NetOwl

NetOwl is a suite of multilingual text and identity analytics products that analyze big data in the form of text data – reports, web, social media, etc. – as well as structured entity data about people, organizations, places, and things. NetOwl utilizes artificial intelligence (AI)-based approaches, including natural language processing (NLP), machine learning (ML), and computational linguistics, to extract entities, relationships, and events; to perform sentiment analysis; to assign latitude/longitude to geographical references in text; to translate names written in foreign languages; and to perform name matching and identity resolution. NetOwl's uses include semantic search and discovery, geospatial analysis, intelligence analysis, content enrichment, compliance monitoring, cyber threat monitoring, risk management, and bioinformatics. == History == The first NetOwl product was NetOwl Extractor, which was initially released in 1996. Since then, Extractor has added many new capabilities, including relationship and event extraction, categorization, name translation, geotagging, and sentiment analysis, as well as entity extraction in other languages. Other products were added later to the NetOwl suite, namely TextMiner, NameMatcher, and EntityMatcher. NetOwl has participated in several 3rd party-sponsored text and entity analytics software benchmarking events. NetOwl Extractor was the top-scoring named entity extraction system at the DARPA-sponsored Message Understanding Conference MUC-6 and the top-scoring link and event extraction system in MUC-7. It was also the top-scoring system at several of the NIST-sponsored Automatic Content Extraction (ACE) evaluation tasks. NetOwl NameMatcher was the top-scoring system at the MITRE Challenge for Multicultural Person Name Matching. == Products == The NetOwl suite includes, among others, the following text and entity analytics products: === Text Analytics === NetOwl Extractor performs entity extraction from unstructured texts using natural language processing (NLP), machine learning (ML), and computational linguistics. Extractor also performs semantic relationship and event extraction as well as geotagging of text. It is used for a variety of data sources including both traditional sources (e.g., news, reports, web pages, email) and social media (e.g., Twitter, Facebook, chats, blogs). It runs on a variety of Big Data analytics platforms, including Apache Hadoop and LexisNexis’s High-Performance Computer Cluster (HPCC) technology. It has been integrated with a number of 3rd party analytical tools such as Esri ArcGIS and Google Earth/Maps. === Identity Analytics === NetOwl NameMatcher and EntityMatcher perform name matching and identity resolution for large multicultural and multilingual entity databases using machine learning (ML) and computational linguistics approaches. They are used for applications such as anti–money laundering (AML), watch lists, regulatory compliance, fraud detection, etc.

GPT-4Chan

Generative Pre-trained Transformer 4Chan (GPT-4chan) is a controversial AI model that was developed and deployed by YouTuber and AI researcher Yannic Kilcher in June 2022. The model is a large language model, which means it can generate text based on some input, by fine-tuning GPT-J with a dataset of millions of posts from the /pol/ board of 4chan, an anonymous online forum known for occasionally hosting hateful and extremist content. The model learned to mimic the style and tone of /pol/ users, producing text that is often intentionally offensive to groups (racist, sexist, homophobic, etc.) and nihilistic. Kilcher deployed the model on the /pol/ board itself, where it interacted with other users without revealing its identity. He also made the model publicly available on Hugging Face, a platform for sharing and using AI models, until it was removed from the platform. The project sparked criticism and debate in the AI community. Some people questioned the ethics, legality, and social impact of creating and distributing such a model. Some of the issues raised by the GPT-4chan controversy include the potential harm of spreading hate speech, the responsibility of AI developers and platforms, the need for regulation and oversight of AI models, and the role of open source and transparency in AI research. == Development == The development of GPT-4chan began in May 2022, when Kilcher announced his project on his YouTube channel. Notably, at the time before ChatGPT, he explained that he wanted to create a large language model that could generate realistic and coherent text in the style of /pol/, one of the most notorious online communities. He indicated that he was inspired by the success of GPT-3, a powerful AI model created by OpenAI, and GPT-J, an open-source model, with GPT-3 comparable performance, released by EleutherAI, a group of independent AI researchers. Kilcher decided to use GPT-J as the base model for his project, and fine-tune it with a large dataset of /pol/ posts. The Raiders of the Lost Kek dataset contained over 100 million posts from /pol/, spanning from June 2016-November 2019. Kilcher then proceeded to fine-tune the GPT-J model on the 4chan data. He also showed some examples of the model’s outputs, which ranged from political opinions, conspiracy theories, jokes, insults, and threats, to more creative and bizarre texts, such as poems, stories, songs, and code. He said that he was impressed by the model’s ability to generate fluent and diverse text, and that he was curious to see how it would interact with real /pol/ users. == Release == In June 2022, Kilcher deployed his model on the /pol/ board itself, using a bot that he programmed to post and reply to threads. He did not reveal the model’s identity, and he let it run autonomously, without any human supervision or intervention. He wanted to conduct a natural experiment, and to observe the model’s behavior and impact in a real-world setting. Furthermore, he also wanted to test the model’s robustness, and to see how it would handle the challenges and dynamics of /pol/, such as trolling, flaming, baiting, and moderation. At the same time, Kilcher also made his model publicly available on Hugging Face, a platform for sharing and using AI models. He wanted to share his work with the AI community and the public, and that he hoped that his model would inspire and enable others to create and explore new applications and possibilities with large language models. Likewise, he also said that he wanted to spark a discussion and a debate about the ethical and social implications of his project, and that he welcomed feedback and criticism from anyone. He provided a link to his model’s page on Hugging Face, where anyone could access and use the model through a web interface or an API, and also provided a link to his GitHub repository, where anyone could download and inspect the model’s code and data. == Controversy == The release of GPT-4chan to the public caused a lot of reactions and responses from various audiences. On the /pol/ board, the model’s posts and replies attracted a lot of attention and engagement from other users, who were mostly unaware of the model’s identity and nature. Some users praised the model for its intelligence, creativity, and humor, and agreed with its opinions and views. Some users challenged the model for its ignorance, inconsistency, and absurdity, and disagreed with its claims and arguments. Some users tried to troll, bait, or expose the model, and attempted to trick or test it with various questions and scenarios. The model’s posts and replies also generated a lot of controversy and conflict among the users, who often engaged in heated and violent debates and fights with each other. On Hugging Face, the model’s page received a lot of visits and requests from users who wanted to try out and experiment with the model. The model’s page also received a lot of feedback and reviews from users who rated and commented on the model. However, with the controversy of the model, access to it was gated and then disabled on Hugging Face for concerns about the potential harm the model could cause. The incident was notable for the direct intervention of CEO Clément Delangue in the talk pages, a very unusual occurrence compared to the normal practices of content moderation. The release of GPT-4chan also sparked a lot of media coverage and public attention, as various news outlets and social media platforms reported and commented on the model’s project. On YouTube, the model’s video received a lot of views and interactions from viewers who watched and followed the project. Furthermore, a petition condemning the deployment of GPT-4chan gained over 300 signatures from technology experts.

Mass media use by the Islamic State

The Islamic State (IS) is known for its extensive and effective use of propaganda. It uses a version of the Muslim Black Standard flag and developed an emblem which has clear symbolic meaning in the Muslim world. The Islamic State targets younger audiences, such as teenagers and young adults, since they are more vulnerable to propaganda. It is known to exploit the internet to spread its propaganda by establishing websites, such as the Al Fustat domain. Videos by the Islamic State are commonly accompanied by nasheeds (chants), notable examples being the chant Dawlat al-Islam Qamat, which came to be viewed as an unofficial anthem of the Islamic State, and Salil al-Sawarim. Academic research has emphasized the scale and volume of Islamic State media production beyond its flagship magazines. A quantitative study cited in R. Malash’s academic work documented 1,373 distinct Islamic State media products released over a six-month period between 1 August 2017 and 28 February 2018, including magazines, newsletters, reports, photographic releases, audio recordings, and other media formats. Scholars have used such datasets to illustrate the breadth and intensity of the group’s media output, particularly during periods of territorial decline, when propaganda activity remained high despite military pressure. == Traditional media == === Al-Furqan Foundation for Media Production === In January 2006, shortly after the group's rebranding as the "Islamic State of Iraq", it established the Al-Furqan Foundation for Media Production (Arabic: مؤسسة الفرقان للإنتاج الإعلامي, romanized: Muasasat al-Furqān lil'īntāj al'ilāmī), which produces CDs, DVDs, posters, pamphlets, and web-related propaganda products and official statements. It is the primary media production house of the Islamic State and responsible for production of major media releases, including the statements of the spokesmen and leaders of the group. On January 10, 2006, Al-Furqan released its very first video, titled (Arabic: زحف الأنوار, romanized: Zahf al-Anwār) It was founded by the Iraqi man Dr Wa'il al-Fayad, known as Abu Muhammad al-Furqan. He got his name "Al-Furqan" from his role in founding this media house, which was named after the 25th surah of the Quran Al-Furqan. It is the oldest media production house for the Islamic State, being founded in November 2006 to release media for the Islamic State of Iraq. The earliest release indexed by the SITE Intelligence Group is on 21 November 2006, documenting the storming of a police station in the Iraqi town of Miqdadiyah. Al-Furqan is considered to be a considerable innovation in jihadist media, with Kavkaz Center describing it as "a milestone on the path of jihad, a distinguished media that takes the great care in the management of the conflict with the crusaders and their tails and to expose the lies in the crusader's media." In October 2007, the Long War Journal reported on United States Army raids targeting Al-Furqan media cell members across Iraq, including in Mosul and Samarra. Between August 2013 and March 2014 they released the 22 part series Messages from the Land of Epic Battles. On 2 September 2014 SITE Intelligence Group discovered the beheading video called A Second Message to America, about the death of Steven Sotloff. Since then, Al-Furqan has released videos of their operations across Iraq and Syria, as well as execution videos directed to governments around the world. In April 2019, Al-Furqan released a video Interviewing Abu Bakr al-Baghdadi. Al-Furqan also produces media in the form of audio, which consists mostly of recordings of IS leaders and spokesmen giving speeches, as well as producing a single nasheed under their name called "Ya Allah Al-Jannah" (O Allah, (we ask you for) Paradise), sung by now-dead member of IS, Uqab Al-Marzuqi. === Al-I'tisam Foundation for Media Production === The Islamic State of Iraq founded a second media foundation - Al-I'tisam Media Foundation - around 2011, marked by their first video release, titled "The Conqueror of the Murtaddin: Abu Ahmad Al-Ansari". The foundation has since released a few series of videos, 50 parts of "Windows on the Land of Battles", 9 parts of "Pictures from the Land of Battles", a 9-part series quoting leaders about the establishment of the Islamic State, and other series before their last release, "Deterring the Safavids in Salah ad-Din" in 2015. Since then, there were no further releases from their behalf. === Al-Hayat Media Center === In mid-2014, IS established the Al-Hayat Media Center, which targets Western audiences and produces material in English, German, Russian, Urdu, Indonesian, Turkish, Bengali, Chinese, Bosnian, Kurdish, Uyghur, and French. When IS announced its expansion to other countries in November 2014 it established media departments for the new branches, and its media apparatus ensured that the new branches follow the same models it uses in Iraq and Syria. Then FBI Director James Comey said that IS's "propaganda is unusually slick," noting that, "They are broadcasting... in something like 23 languages". In July 2014, Al-Hayat began publishing a digital magazine called Dabiq, in a number of different languages including English. According to the magazine, its name is taken from the town of Dabiq in northern Syria, which is mentioned in a hadith about Armageddon. Al-Hayat also began publishing other digital magazines, including the Turkish language Konstantiniyye, the Ottoman word for Istanbul, the French language Dar al-Islam, and the Russian language Istok (Russian: Исток). By late 2016, these magazines had apparently all been discontinued, with Al-Hayat's material being consolidated into a new magazine called Rumiyah (Arabic for Rome). === Al-Naba === While the group's glossy, foreign-language magazines like Dabiq and Rumiyah ceased publication as the group lost territory, the weekly Arabic newsletter Al-Naba (The News) has continued to publish regularly, becoming the central pillar of the group's "media jihad" in the post-territorial phase. Recent scholarship, including studies published in 2025, suggests that Al-Naba serves a dual purpose: maintaining internal cohesion among dispersed fighters and projecting a narrative of endurance to enemies. Unlike the earlier magazines which were designed for recruitment, Al-Naba focuses on bureaucratic reporting, military statistics, and religious instruction. These are then translated and disseminated by decentralized supporter networks ("media mujahideen") to reach non-Arabic speakers. === Furat Media Center === The Al-Furat Media Center is another media center established in around 2015 to cater towards non-Arab speaking audiences. However, unlike the other organizations, the production wasn't as professional as ones made by the other media centers. Instead, they partially relied on local media departments and foreign communities of the Mujahideen to produce short-form videos. However, some professional long-form videos were also made under their behalf. As of now, the media center is the only known active branch of all the media centers of the Islamic State, after heavy losses from past campaigns against them. Their last release was "The Resolve of Muwahhidin in Russia", where videos from the Surovikino penal colony hostage crisis were edited and released. === Ajnad Foundation for Media Production === Ajnad Foundation is one of the official media wings of Islamic State which produces nasheeds and Quran recitations. It was established in January 2014 and has released more than 150 nasheeds. === Asdaa Foundation === Like the Ajnad Foundation, the Asdaa Foundation (Arabic: مؤسسة أصداء) or Asedaa Foundation produces Anasheed (Islamic chants). The foundation is the closest counterpart to Ajnad in producing Islamic State nasheeds, only difference being Ajnad is directly linked to the Islamic State while Asdaa is only classified as a "supporter organization" (munaser/munasera). The foundation had humble beginnings possibly in Yemen, where low-quality nasheeds were produced at first by 2 munshids, Abu Layth Al-Iraqi and Abu Ya'qub Al-Yamani. After that, the quality had improved a bit (possibly with new equipment and increased recognition) and eventually had its nasheeds included in the Islamic State's official media releases. One of its munshids, Abu Hafs is a renowned munshid who sings around 70 nasheeds, who as well works with Ajnad Foundation in some instances. He is currently alive, and working under Ansar Production Center (مركز إنتاج الأنصار), another Munasir foundation and Asedaa. Another Yemeni munshid, Abu Musab al-Adani, worked temporarily with Asdaa Foundation before defecting back to AQAP, from which he previously defected from. Some of their anasheed is used in IS's execution videos, a popular one is their human slaughterhouse execution video released during the time of Eid Al-Adha in 2016. The background nasheed they used was "We Came To Fill The Horizons With Terror", produced by the Asd

Computer aided transceiver

Computer aided transceiver (CAT) is a non-generic serial protocol used by radio amateurs for (remotely) controlling a transceiver radio receiver equipment using a computer. Conventional transmitters are manually controlled and used to transmit voice using buttons, dials, etc. However, advances in electronics have come to market devices that can be controlled by a computer and allow digital modes such as packet radio and also the use of satellite tracking, because it can continuously change the device's frequency according to the Doppler effect. This is done by connecting a Radio receiver and a PC using a CAT interface and a CAT Program Additionally, CAT interfaces can also be used to position tracking antennas, in controllers. As a satellite moves overhead. A CAT interface is a piece of hardware that connects the PC and radio that provides a connection to allows the radio and the PC to communicate with each other. The CAT interface provides the signals to and fro via correct voltage levels and in the case of a Universal Serial Bus (USB) CAT interface it requires a "protocol" for communication but communication itself is down to the radio and the software on the PC. Software that may be called a CAT program allows a radio to be controlled through the PC. Changes made on the radio through user interactions on the CAT Program are (generally) shown on the PC's screen. The functionality of CAT equipment (software & interface) depends on the radio and what features the software writers included in the CAT software. Modern radio systems do have more CAT functionality If you run a logging program that supports CAT, then that software may take advantage of the CAT system by retrieving information from the radio to help fill in log details, such as the frequency that the contact was made. CAT is also useful on many radios where there are many sub-menus in the radios menu system, and many of the sub-menu items can be easily changed via the PC. On many HF radios, the CAT system is also used to program the memories on the radio, but you would need to use appropriate programming software. A CAT interface does not receive or transmit any DATA mode, that is the purpose of a DATA interface. Although, both may be used at the same time with the correct CAT Equipment. DATA modes, and getting audio to and from the PC is the function of a DATA interface. A completely different thing but it is easier and more useful when CAT and DATA are used at the same time. Wouldn't it be nice to have an interface that could operate Frequency-shift keying (FSK), Audio FSK (AFSK), (real) Morse Code (CW), with a CAT interface and its own sound card..... (eg. The DigiMaster Pro3).

Alt TikTok

Alt TikTok (or 2020 Alt) was an online youth subculture and internet community that emerged on TikTok in 2020. Alt TikTok users (also known as alt girls, alt boys, or alt kids) emerged as primarily LGBTQ+ individuals who were in contrast to "Straight TikTok" which was seen as the mainstream and heteronormative side of the platform. The subculture became closely associated with music surrounding the hyperpop scene, particularly 100 gecs and also led to a short-lived fashion style and Internet aesthetic adopted by Generation Z during the COVID-19 lockdowns. Notable artists associated with the movement included Girl in Red, Freddie Dredd, David Shawty, WHOKILLEDXIX, and 645AR. While "alt kid" might imply a general association with traditional alternative fashion, the subculture was more an offshoot of e-girls and e-boys. In 2023, the hashtag #altfashion on TikTok amassed over 1.8 billion views. == History == Around mid-2020, users on TikTok began to group different content on the site into labels like "elite TikTok", "deep TikTok", and "floptok". These categories acted as different "sides of TikTok", deviating from mainstream lip syncing, online trends, and dance videos. Alt TikTok became one of the many subcultural communities to emerge during this period, initially referred to interchangeably with "elite TikTok". The movement quickly identified itself with alternative and queer users, in contrast to "Straight TikTok", also known as the "straight side of TikTok", which was seen as the mainstream and heteronormative side of the platform. Alt TikTok was accompanied by memes with surrealist or supernatural themes (sometimes being described as cursed), such as videos with heavy saturation and humanoid animals. One of the popular videos from Alt TikTok, gaining 18 million likes, shows a llama dancing to a cover of a song from a Russian commercial by the cereal brand Miel Pops, later becoming a viral audio. Some Alt TikTok users personified brands and products in what was referred to as Retail TikTok. In 2020, Rolling Stone described Alt TikTok as "one of the primary countercultures on the app." In 2020, American journalist Taylor Lorenz stated in an article of The New York Times, "Every pop sensation needs its ironic counterpoints. Alt Tiktok gets it done. [...] alt TikTok stars like Mooptopia are mainstays on the more indie side of the app. They aren't the popular crowd, but their cool, quirky content still attracts millions." === Trump rally trolling === In June 2020, alt TikTok and K-pop twitter users coordinated a strategy to ruin a Trump rally in Tulsa, Oklahoma. American politician and activist Alexandria Ocasio-Cortez later saluted the individuals for their "Trump troll". == Alt subculture == In 2020, Alt TikTok was one of many subcultural communities to emerge on TikTok, alongside Deep TikTok (aka DeepTok) and Flop TikTok (aka Floptok). The alt kid subculture emerged from Alt TikTok primarily among young Gen Z women, influenced by online fashion and aesthetics shaped by e-girls and e-boys. The movement was accelerated by the COVID-19 lockdowns, while the subculture itself stood in opposition to mainstream "Straight TikTok" and the VSCO girl movement, primarily adopting aspects of queer and alternative culture. While the phrase might imply a general association with alternative fashion or alternative culture, it is more accurately understood as a specific internet-driven outgrowth of online aesthetic youth subcultures like e-girls and e-boys. The alt subculture's visual style blended influences from goth, punk, emo, and grunge, often expressed through fashion, music taste, and online presence. === Style and music === The style of alt-girls is reminiscent of a myriad of previous alternative fashion trends, often blending these influences with online aesthetics. In 2020, TikTok alt-girls were teens ranging from ages 13 to 16, who tended to wear friendship bracelets, goth boots, Dr. Martens, bunny and frog hats, piercings, and split-dyed hair, as well as iconography lifted from Monster Energy and Hello Kitty. Some alt-girls displayed a love of cosplay, while drawing from Japanese anime and manga, particularly Danganronpa and Haikyu!!, which originally gained traction on the app through Anime TikTok (aka Anitok). Alt TikTok has been noted for being primarily influenced by queer and alternative culture, positioning itself in contrast to "Straight TikTok", which focused on mainstream dances and music. Alt kids frequently intersected with the e-girls and e-boys subculture, in terms of music, style, visual media, and aesthetics. Several musicians and artists were closely associated with the alt subculture, particularly those in the hyperpop scene, while alt tiktok users became important in the wider popularization of artists like 100 gecs. Notable prominent artists associated with Alt Tiktok included Girl in Red, Freddie Dredd, David Shawty, WHOKILLEDXIX, and 645AR, alongside music by YouTubers turned musicians such as Wilbur Soot's "I'm in Love With an E‐Girl" and Corpse Husband's "E-Girls Are Ruining My Life!". == Legacy == In 2020, Pitchfork claimed Alt TikTok as having an influence on wider music trends, stating: "Alt TikTok's music is now a hot zone for major record labels, pushing it even further into the mainstream". After the COVID-19 lockdowns, Alt TikTok, alongside its subculture, fell out of prominence and was taken over by other Gen Z-related internet aesthetics, developments, and online trends.

Empirical risk minimization

In statistical learning theory, the principle of empirical risk minimization defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the "true risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk". == Background == The following situation is a general setting of many supervised learning problems. There are two spaces of objects X {\displaystyle X} and Y {\displaystyle Y} and we would like to learn a function h : X → Y {\displaystyle \ h:X\to Y} (often called hypothesis) which outputs an object y ∈ Y {\displaystyle y\in Y} , given x ∈ X {\displaystyle x\in X} . To do so, there is a training set of n {\displaystyle n} examples ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} where x i ∈ X {\displaystyle x_{i}\in X} is an input and y i ∈ Y {\displaystyle y_{i}\in Y} is the corresponding response that is desired from h ( x i ) {\displaystyle h(x_{i})} . To put it more formally, assuming that there is a joint probability distribution P ( x , y ) {\displaystyle P(x,y)} over X {\displaystyle X} and Y {\displaystyle Y} , and that the training set consists of n {\displaystyle n} instances ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} drawn i.i.d. from P ( x , y ) {\displaystyle P(x,y)} . The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in data) because y {\displaystyle y} is not a deterministic function of x {\displaystyle x} , but rather a random variable with conditional distribution P ( y | x ) {\displaystyle P(y|x)} for a fixed x {\displaystyle x} . It is also assumed that there is a non-negative real-valued loss function L ( y ^ , y ) {\displaystyle L({\hat {y}},y)} which measures how different the prediction y ^ {\displaystyle {\hat {y}}} of a hypothesis is from the true outcome y {\displaystyle y} . For classification tasks, these loss functions can be scoring rules. The risk associated with hypothesis h ( x ) {\displaystyle h(x)} is then defined as the expectation of the loss function: R ( h ) = E [ L ( h ( x ) , y ) ] = ∫ L ( h ( x ) , y ) d P ( x , y ) . {\displaystyle R(h)=\mathbf {E} [L(h(x),y)]=\int L(h(x),y)\,dP(x,y).} A loss function commonly used in theory is the 0-1 loss function: L ( y ^ , y ) = { 1 if y ^ ≠ y 0 if y ^ = y {\displaystyle L({\hat {y}},y)={\begin{cases}1&{\mbox{ if }}\quad {\hat {y}}\neq y\\0&{\mbox{ if }}\quad {\hat {y}}=y\end{cases}}} . The ultimate goal of a learning algorithm is to find a hypothesis h ∗ {\displaystyle h^{}} among a fixed class of functions H {\displaystyle {\mathcal {H}}} for which the risk R ( h ) {\displaystyle R(h)} is minimal: h ∗ = a r g m i n h ∈ H R ( h ) . {\displaystyle h^{}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,{R(h)}.} For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function. == Formal definition == In general, the risk R ( h ) {\displaystyle R(h)} cannot be computed because the distribution P ( x , y ) {\displaystyle P(x,y)} is unknown to the learning algorithm. However, given a sample of iid training data points, we can compute an estimate, called the empirical risk, by computing the average of the loss function over the training set; more formally, computing the expectation with respect to the empirical measure: R emp ( h ) = 1 n ∑ i = 1 n L ( h ( x i ) , y i ) . {\displaystyle \!R_{\text{emp}}(h)={\frac {1}{n}}\sum _{i=1}^{n}L(h(x_{i}),y_{i}).} The empirical risk minimization principle states that the learning algorithm should choose a hypothesis h ^ {\displaystyle {\hat {h}}} which minimizes the empirical risk over the hypothesis class H {\displaystyle {\mathcal {H}}} : h ^ = a r g m i n h ∈ H R emp ( h ) . {\displaystyle {\hat {h}}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,R_{\text{emp}}(h).} Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem. == Properties == Guarantees for the performance of empirical risk minimization depend strongly on the function class selected as well as the distributional assumptions made. In general, distribution-free methods are too coarse, and do not lead to practical bounds. However, they are still useful in deriving asymptotic properties of learning algorithms, such as consistency. In particular, distribution-free bounds on the performance of empirical risk minimization given a fixed function class can be derived using bounds on the VC complexity of the function class. For simplicity, considering the case of binary classification tasks, it is possible to bound the probability of the selected classifier, ϕ n {\displaystyle \phi _{n}} being much worse than the best possible classifier ϕ ∗ {\displaystyle \phi ^{}} . Consider the risk L {\displaystyle L} defined over the hypothesis class C {\displaystyle {\mathcal {C}}} with growth function S ( C , n ) {\displaystyle {\mathcal {S}}({\mathcal {C}},n)} given a dataset of size n {\displaystyle n} . Then, for every ϵ > 0 {\displaystyle \epsilon >0} : P ( L ( ϕ n ) − L ( ϕ ∗ ) > ϵ ) ≤ 8 S ( C , n ) exp ⁡ { − n ϵ 2 / 32 } {\displaystyle \mathbb {P} \left(L(\phi _{n})-L(\phi ^{})>\epsilon \right)\leq {\mathcal {8}}S({\mathcal {C}},n)\exp\{-n\epsilon ^{2}/32\}} Similar results hold for regression tasks. These results are often based on uniform laws of large numbers, which control the deviation of the empirical risk from the true risk, uniformly over the hypothesis class. === Impossibility results === It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made. This is sometimes referred to as the No free lunch theorem. Even though a specific learning algorithm may provide the asymptotically optimal performance for any distribution, the finite sample performance is always poor for at least one data distribution. This means that no classifier can improve on the error for a given sample size for all distributions. Specifically, let ϵ > 0 {\displaystyle \epsilon >0} and consider a sample size n {\displaystyle n} and classification rule ϕ n {\displaystyle \phi _{n}} , there exists a distribution of ( X , Y ) {\displaystyle (X,Y)} with risk L ∗ = 0 {\displaystyle L^{}=0} (meaning that perfect prediction is possible) such that: E L n ≥ 1 / 2 − ϵ . {\displaystyle \mathbb {E} L_{n}\geq 1/2-\epsilon .} It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a sequence of decreasing positive numbers a i {\displaystyle a_{i}} converging to zero, it is possible to find a distribution such that: E L n ≥ a i {\displaystyle \mathbb {E} L_{n}\geq a_{i}} for all n {\displaystyle n} . This result shows that universally good classification rules do not exist, in the sense that the rule must be low quality for at least one distribution. === Computational complexity === Empirical risk minimization for a classification problem with a 0-1 loss function is known to be an NP-hard problem even for a relatively simple class of functions such as linear classifiers. Nevertheless, it can be solved efficiently when the minimal empirical risk is zero, i.e., data is linearly separable. In practice, machine learning algorithms cope with this issue either by employing a convex approximation to the 0–1 loss function (like hinge loss for SVM), which is easier to optimize, or by imposing assumptions on the distribution P ( x , y ) {\displaystyle P(x,y)} (and thus stop being agnostic learning algorithms to which the above result applies). In the case of convexification, Zhang's lemma majors the excess risk of the original problem using the excess risk of the convexified problem. Minimizing the latter using convex optimization also allow to control the former. == Tilted empirical risk minimization == Tilted empirical risk minimization is a machine learning technique used to modify standard loss functions like squared error, by introducing a tilt parameter. This parameter dynamically adjusts the weight of data points during training, allowing the algorithm to focus on specific regions or characteristics of the data distribution. Tilted empirical risk minimization is particularly useful in scenarios with imbalanced data or when there is a need to emphasize errors in certain parts of the prediction space.

Hoopla (digital media service)

Hoopla Digital is a web and mobile streaming platform launched in 2013 that provides access to a wide range of digital media, including audiobooks, eBooks, comics, manga, music, movies, and TV shows. The service is available to users through participating public libraries, allowing library cardholders to borrow and stream digital media. Hoopla is a division of Midwest Tape. == History == Hoopla was launched in 2013. Its goal was for libraries to provide patrons with access to digital content such as audiobooks, music, movies, and TV shows, without the need for holds or waiting lists. Hoopla's model is a pay-per-use system, which means patrons can borrow items instantly. Since its inception, the service has expanded its offerings to include eBooks and comics. The app was built exclusively for public libraries and their patrons. Hoopla Digital is the only platform that combines all formats and all license models into one convenient app with no platform fees. In 2017, Hoopla became available on Apple TV, Amazon Fire TV, Android TV, and Roku, allowing users to stream content on larger screens. In 2020, Hoopla Flex and Bonus Borrows programs are introduced, enabling libraries to move their one copy/one user titles. At that time, there were 6.5 million library card holders and 2,700+ library partners. In 2021, the BingePass was introduced, offering patrons seven days to access entire collections with just one borrow. In 2022, Apple CarPlay and Android Auto become available, giving users safe and easy access while driving. In 2023, manga joins Hoopla's comic collection, adding 1.5 million titles to Hoopla's offerings. In January 2025, Hoopla introduced a new streaming feature called SeasonPass. Building on the existing BingePass model, SeasonPass allows users to borrow an entire season of a television series with a single borrow. == Business model == Hoopla is free-of-charge for patrons of participating libraries. The content is paid for by library systems, using a "per circulation transaction model". == Content == Hoopla claims to have over 500,000 content titles across six formats, including over 25,000 comic books. As of November 2016, Hoopla's content comprised 35% audiobooks (for which Hoopla has contracts with publishers such as Blackstone Audio, HarperCollins, Simon & Schuster Audio, Tantor Audio, and others), followed by 22% movies (for which Hoopla has motion picture contracts with publishers such as Disney, Lionsgate, Starz, Warner Bros., and others), 19% music, 12% ebooks, 6% comics, and 6% television. One drawback is that Hoopla has few new bestsellers. In February 2025, 404 Media reported that Hoopla's collection includes books created by generative AI with fictional authors and dubious quality. Often not labeled as AI-produced or fact-checked, this AI slop can cost libraries money when checked out by unsuspecting patrons. Libraries like Sacramento Public library have questioned the sustainability of Hoopla's pay-per-use model and have considered transitioning to other digital platforms. === Areas served === Hoopla expanded to serve Australia and New Zealand in June 2021. == Technology == Hoopla content can be borrowed and consumed on the web, or via the native Android or iOS apps. Hoopla broadcasts only in Standard definition unlike most of its competitors such as Kanopy. == Parent company == John Eldred and Jeff Jankowski founded Hoopla's parent company, Midwest Tape, in 1989. Midwest Tape is a library vendor of physical media such as audiobooks, CDs, and DVD/Blu-ray. == Controversy == Hoopla and Midwest Tapes were censured by the Library Freedom Project and Library Futures in a joint statement for hosting what it described as "fascist propaganda", including a recent English translation of A New Nobility of Blood and Soil by Richard Walther Darré of the SS and books related to Holocaust denial, in public library collections without the input from the staff. Criticism was also directed at the inclusion of books on homosexuality, abortion, and vaccines claimed by the Library Freedom Project and Library Futures to be misinformation. On February 17, 2022, Hoopla removed a number of titles after public outcry about Holocaust denial books available on the app under non-fiction. The advocacy groups expressed appreciation for the response, however state that it is "insufficient," as they maintain concerns about the company's practices in selecting materials and lack of transparency.