Digital history

Digital history is the use of digital media to further historical analysis, presentation, and research. It is a branch of the digital humanities and an extension of quantitative history, cliometrics, and computing. Digital history is commonly known as digital public history, concerned primarily with engaging online audiences with historical content, or digital research methods, that further academic research. Digital history outputs include: digital archives, online presentations, data and information visualizations, interactive maps, timelines, audio files, and virtual worlds. These outputs are designed to enhance accessibility to users, facilitating engagement with historical content. Recent digital history projects focus on creativity, collaboration, and technical innovation, text mining, corpus linguistics, network analysis, 3D modeling, and big data analysis. By utilizing these resources, the user can rapidly develop new analyses that can link to, extend, and bring to life existing histories. == History == Rooted in earlier social science history work, particularly around the history of enslavement in the United States, early digital history in the 1960s and 70s focused on using computers to conduct quantitative analyses, primarily of demographic and social history data - censuses, election returns, city directories, and other tabular or countable data. - with the aim of producing defensible research findings These early computers could be programmed to conduct statistical analyses of these records, creating tallies, or seeking trends across records. This research into historical demography was rooted in the rise of social history as a field of historical interest. The historians involved in this work sought to quantify past societies, to come to new conclusions about communities and population. Computers proved capable tools for that type of work. By the late 1970s younger historians turned to cultural studies, most of these studies involved online databases that were checked by Professionals in Great Britain about once a year. The outpouring of quantitative studies by established scholars continued. Since then, quantitative history and cliometrics have been used primarily by historically minded economists and political scientists. In the late 1980s quantifiers founded the Association for History and Computing. This movement provided some of the impetus for the rise of digital history in the 1990s. The more recent roots of digital history were in software rather than online networks. In 1982, the Library of Congress embarked on its Optical Disk Pilot Project, which placed text and images from its collection on to laserdiscs and CD-ROMs. The library started offering online exhibits in 1992 when it launched Selected Civil War Photographs. In 1993, Roy Rosenzweig, along with Steve Brier and Josh Brown, produced their award-winning CD-ROM Who Built America? From the Centennial Exposition of 1876 to the Great War of 1914, designed for Apple, Inc. that integrated images, text, film and sound clips, displayed in a visual interface that supported a text narrative. Among the earliest online digital history projects were The Heritage Project of the University of Kansas, and medieval historian Dr. Lynn Nelson's World History Index and History Central Catalogue. Another was The Valley of the Shadow, conceived in 1991 by current University of Richmond professor of humanities and president emeritus, Edward L. Ayers, who was then at the University of Virginia. The Institute for Advanced Technology in the Humanities (IATH) at the University of Virginia adopted the Valley Project and partnered with IBM to collect and transcribe historical sources into digital files. The project collected data related to Augusta County in Virginia and Franklin County in Pennsylvania during the American Civil War. In 1996, William G. Thomas III joined Ayers on the Valley Project. Together, they produced an online article entitled "The Differences Slavery Made: A Close Analysis of Two American Communities," which also appeared in The American Historical Review in 2003. A CD-ROM also accompanied the Valley Project, published by W. W. Norton and Company in 2000. Rosenzweig, who died October 11, 2007, founded the Center for History and New Media (CHNM) at George Mason University in 1994. Today, CHNM boasts several digital tools available to historians, such as Zotero, Omeka or Tropy. In 1997, Ayers and Thomas used the term "digital history" when they proposed and founded the Virginia Center for Digital History (VCDH) at the University of Virginia, the earliest center devoted exclusively to history. Several other institutions promoting digital history include the Center for Humane Arts, Letters, and Social Sciences Online (MATRIX) at Michigan State University, Maryland's Institute for Technology in the Humanities, and the Center for Digital Research in the Humanities at the University of Nebraska. In 2004, Emory University launched Southern Spaces, a "peer-reviewed Internet journal and scholarly forum" examining the history of the South. == Applications == There are many potential benefits to the use of digital history when combined with traditional historical methods. Some of these applications include: Combining traditional historical methods and new research methods in order to come to new conclusions. Using different tools to extract and analyse larger amounts of data that would not be manageable otherwise. Create models and maps of data extracted to create a visualisation of the data. Data extracted and analysed can be placed alongside existing historiography to increase combined historical knowledge. By adding new research methods to existing historical method, historians can benefit greatly from the ability to work with larger amounts of data and develop new interpretations from this. == Notable Projects == The collaborative nature of most digital history endeavors has meant that the discipline has developed primarily at institutions with the resources to sponsor content research and technical innovation. Two of the first centers, George Mason University's Center for History and New Media and the Virginia Center for Digital History at the University of Virginia have been among the leaders in the development of digital history projects and the education of digital historians. Some of the noteworthy projects emerging from these pioneering centers are The Geography of Slavery, The Texas Slavery Project, and The Countryside Transformed at VCDH and Liberty, Equality, Fraternity: Exploring the French Revolution and The Lost Museum at the CHNM. In each of these projects, mediated archives holding multiple types of sources are combined with digital tools to analyze and illuminate an historical question to a varying degree; this integration of content and tools with analysis is one of the hallmarks of digital history—projects move beyond archives or collections and into scholarly analysis and the use of digital tools to develop that analysis. The differences between the ways projects incorporate these integrations are a measure of the development of the field and point to the ongoing debates over what digital history can and should be. While many of the projects at VCDH, CHNM, and other university's centers have been geared towards academics and post-secondary education, the University of Victoria (British Columbia), in conjunction with the Université de Sherbrooke and the Ontario Institute for Studies in Education at the University of Toronto, has created as series of projects for all ages, "Great Unsolved Mysteries in Canadian History." Laden with instructional aids, this site asks teachers to introduce students to historical research methods to help them develop analytical skills and a sense of the complexities of their national history. Issues of race, religion, and gender are addressed in carefully constructed modules that cover incidents in Canadian history from Viking exploration through the 1920s. One of the original co-creators of the project, John Lutz has also developed Victoria's Victoria with the University of Victoria and Malaspina University-College. In addition to Ayers, Thomas, Lutz, and Rosenzweig, numerous other individual scholars work with digital history techniques and have made and/or continue to make important contributions to the field. Robert Darnton's 2000 article, "An Early Information Society: News and the Media in Eighteenth-Century Paris" was supplemented with electronic resources and is an early model of the discussions around digital history and its future in the humanities. One of the first major digital projects to be reviewed by the American Historical Review (AHR) was Philip Ethington's "Los Angeles and the Problem of Urban Historical Knowledge"—a multimedia exploration of changes to Los Angeles' physical profile over the course of several decades. In this essay, he also expresses his beliefs that historians have major power in

Kuaishou

Kuaishou Technology is a Chinese publicly traded partly state-owned holding company based in Haidian District, Beijing, that was founded in 2011 by Hua Su (Chinese: 宿华) and Cheng Yixiao (Chinese: 程一笑). The company, listed on the Hong Kong Stock Exchange, is known for developing a mobile app for sharing users' short videos, a social network, and video special effects editor. The app is known as Kwai in many countries outside of China. It is also known as Snack Video in India, Pakistan and Indonesia. == Ownership and governance == Kuaishou's overseas team is led by the former CEO of the application 99, and staff from Google, Facebook, Netflix, and TikTok were recruited to lead the company's international expansion. The China Internet Investment Fund, a state-owned enterprise controlled by the Cyberspace Administration of China, holds a golden share ownership stake in Kuaishou. == History == Kuaishou is China's first short video platform that was developed in 2011 by engineer Hua Su and Cheng Yixiao. Prior to co-founding Kuaishou, Su Hua had worked for both Google and Baidu as a software engineer. The company is headquartered in Haidian District, Beijing. Kuaishou's predecessor "GIF Kuaishou" was founded in March 2011. GIF Kuaishou was a mobile app with which users could make and share GIF pictures. In 2013, Kuaishou became a short-video social platform. By 2013, the app had reached 100 million daily users. By 2019, it had exceeded 200 million active daily users. In March 2017, Kuaishou closed a US$350 million investment round that was led by Tencent. In January 2018, Forbes estimated the company's valuation to be US$18 billion. In April 2018, Kuaishou's app was briefly banned from Chinese app stores after China Central Television (CCTV) reported on the platform popularizing videos of teenage mothers. In 2019, the company announced a partnership with the People's Daily, an official newspaper of the Central Committee of the Chinese Communist Party, to help it experiment with the use of artificial intelligence in news. In June 2020, following the start of the 2020–2021 China–India skirmishes, the Government of India banned Kwai along with 58 other apps, citing "data and privacy issues". In January 2021, Kuaishou announced it was planning an initial public offering (IPO) to raise approximately US$5 billion. Kuaishou's stock completed its first day of trading at $300 Hong Kong dollars (HKD) (US$38.70), more than doubling its initial offer price, and causing its market value to rise to over $1 trillion HKD (US$159 billion). In February 2021, Kuaishou made a debut on the Hong Kong Stock Exchange, with its shares soaring by 194% at the opening. The company subsequently encountered major setbacks as a result of heightened regulatory restrictions on Chinese internet firms, which contributed to its share price falling by nearly 80% from its post-IPO peak. By December 2021, Kuaishou announced a major reorganization, including the layoff of 30% of its staff, primarily targeting mid-level employees earning an annual salary of $157,000 or more. This restructuring aimed to cut costs and mitigate financial losses. In October 2022, state-owned Beijing Radio and Television Station took a minority ownership stake in Kuaishou. In April 2024, a Financial Times article citing current and former Kuaishou employees stated that the company has been running an ageist redundancy programme known internally as "Limestone", culling workers in their mid-30s. In June 2024, Kuaishou and the Sichuan international communication center launched a branch center in São Paulo, Brazil. In June 2024, Kuaishou released its diffusion transformer text-to-video model, Kling, which they claimed could generate two minutes of video at 30 frames per second and in 1080p resolution. The model has been compared to that of OpenAI's Sora text-to-video model. It is accessible to the public on Kuaishou's video editing app KwaiCut via signing up for a waitlist with a Chinese phone number. In December 2025, Kuaishou came under a cyberattack which led to a temporary influx of violent and pornographic content. == Popularity == As of 2019, it had a worldwide user base of over 200 million, leading the "Most Downloaded" lists of the Google Play and Apple App Store in eight countries, such as Brazil, where it was introduced in 2019. Its main short-video platform competitor was Douyin, which is known as TikTok outside China. Compared to Douyin, Kuaishou is more popular with older users living outside China's Tier 1 cities. Its initial popularity came from videos of Chinese rural life. The app is particularly well known for its "rustic" aesthetic and is popular among rural people. Kuaishou also relied more on e-commerce revenue than on advertising revenue compared to its main competitor. == Reception == Kwai (as the app is called outside of China) was banned in India in 2020 along with other short video apps like TikTok. Kuaishou then released the clone SnackVideo, which was subsequently also banned. The app is one of the most popular social media platforms in Brazil, where Kuaishou partnered with creators to make telenovela style content, and appeals to football fans by working with football teams CR Flamengo and Santos FC and sponsoring the tournament Copa América. Kwai was notable in Brazil for spreading information (and misinformation) about the COVID-19 vaccine and political misinformation. === Manjiao Wenhua === "Manjiao wenhua" (慢脚文化) is a sarcasm term on Chinese internet on the unethical or illegal contents on Kuaishou. State broadcaster China Central Television (CCTV) reported that many contents are about child pregnancy. "Dating, pregnancy, bearing a child...these are strictly prohibited in the real time by a minor, but these contents can easily shown to audiences here." In addition, many students from primary or secondary schools make a pose of smoking. Wang Zhenhui (王贞会) from CUPSL stated that these kinds of bad values will give negative effects to the minors.

CLAWS (linguistics)

The Constituent Likelihood Automatic Word-tagging System (CLAWS) is a program that performs part-of-speech tagging. It was developed in the 1980s at Lancaster University by the University Centre for Computer Corpus Research on Language. It has an overall accuracy rate of 96–97% with the latest version (CLAWS4) tagging around 100 million words of the British National Corpus. == History == A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'. Developed in the early 1980s, CLAWS was built to fill the ever-growing gap created by always-changing POS necessities. Originally created to add part-of-speech tags to the LOB corpus of British English, the CLAWS tagset has since been adapted to other languages as well, including Urdu and Arabic. Since its inception, CLAWS has been hailed for its functionality and adaptability. Still, it is not without flaws, and though it boasts an error-rate of only 1.5% when judged in major categories, CLAWS still remains with c.3.3% ambiguities unresolved. Ambiguity arises in cases such as with the word flies, and whether it should be classified as a noun or a verb. It's these ambiguities that will require the various upgrades and tagsets that CLAWS will endure. == Rules and processing == CLAWS uses a Hidden Markov model to determine the likelihood of sequences of words in anticipating each part-of-speech label. === Sample output === This excerpt from Bram Stoker's Dracula (1897) has been tagged using both the CLAWS C5 and C7 tagsets. This is what a CLAWS output will generally look like, with the most likely part-of-speech tag following each word. == Tagsets == === CLAWS1 tagset === The first tagset developed in CLAWS, CLAWS1 tagset, has 132 word tags. In terms of form and application, C1 tagset is similar to Brown Corpus tags. See Table of tags in C1 tagset here. === CLAWS2 tagset === From 1983 to 1986, updated versions leading to CLAWS2 were part of a larger attempt to deal with aspects such as recognizing sentence breaks, in order to avoid the need for manual pre-processing of a text before the tags were applied, moving instead to optional manual post-editing to adjust the output of the automatic annotation, if needed. The CLAWS2 tagset has 166 word tags. See Table of tags in C2 tagset here. === CLAWS4 tagset === The CLAWS4 was used for the 100-million-word British National Corpus (BNC). A general-purpose grammatical tagger, it is a successor of the CLAWS1 tagger. In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent from the tagsets. For example, the BNC project used two tagset versions: "a main tagset (C5) with 62 tags with which the whole of the corpus has been tagged, and a larger (C7) tagset with 152 tags, which has been used to make a selected 'core' sample corpus of two million words." The latest version of CLAWS4 is offered by UCREL, a research center of Lancaster University. === CLAWS5 tagset === The CLAWS5 tagset, which was used for BNC, has over 60 tags. See Table of tags in C5 tagset here. === CLAWS6 tagset === The CLAWS6 tagset was used for the BNC sampler corpus and the COLT corpus. It has over 160 tags, including 13 determiner subtypes. See Table of tags in C6 tagset here. === CLAWS7 tagset === The standard CLAWS7 tagset is used currently. It is only different in the punctuation tags when compared to the CLAWS6 tagset. See Table of tags in C7 tagset here. === CLAWS8 tagset === CLAWS8 tagset was extended from C7 tagset with further distinctions in the determiner and pronoun categories, as well as 37 new auxiliary tags for forms of be, do, and have. See Table of tags in C8 tagset here

Night Sky (app)

Night Sky (app) is an application developed and published by indie studio iCandi Apps Ltd. from the UK. Night Sky is a stargazing reference app, where the user can explore a virtual representation of the night sky to identify stars, planets, constellations and satellites. The app is developed specifically for iOS, tvOS and watchOS devices. Night Sky was first released on November 1, 2011 for iOS, and has had multiple updates since launch. Night Sky was mentioned in the September 2016 Apple Keynote during the Apple Watch Series 2 announcement. In October 2016, Night Sky was featured as the Free App of The Week on the Apple App Store. == Reception == Night Sky was featured in Apple's 'Best of 2012' and has also been pre-installed onto iPads in Apple retail stores worldwide.

Weak supervision

Weak supervision (also known as semi-supervised learning) is a paradigm in machine learning, the relevance and notability of which increased with the advent of large language models due to the large amount of data required to train them. It is characterized by using a combination of a small amount of human-labeled data (exclusively used in more expensive and time-consuming supervised learning paradigm), followed by a large amount of unlabeled data (used exclusively in unsupervised learning paradigm). In other words, the desired output values are provided only for a subset of the training data. The remaining data is unlabeled or imprecisely labeled. Intuitively, it can be seen as an exam and labeled data as sample problems that the teacher solves for the class as an aid in solving another set of problems. In the transductive setting, these unsolved problems act as exam questions. In the inductive setting, they become practice problems of the sort that will make up the exam. == Problem == The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost associated with the labeling process thus may render large, fully labeled training sets infeasible, whereas acquisition of unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning. == Technique == More formally, semi-supervised learning assumes a set of l {\displaystyle l} independently identically distributed examples x 1 , … , x l ∈ X {\displaystyle x_{1},\dots ,x_{l}\in X} with corresponding labels y 1 , … , y l ∈ Y {\displaystyle y_{1},\dots ,y_{l}\in Y} and u {\displaystyle u} unlabeled examples x l + 1 , … , x l + u ∈ X {\displaystyle x_{l+1},\dots ,x_{l+u}\in X} are processed. Semi-supervised learning combines this information to surpass the classification performance that can be obtained either by discarding the unlabeled data and doing supervised learning or by discarding the labels and doing unsupervised learning. Semi-supervised learning may refer to either transductive learning or inductive learning. The goal of transductive learning is to infer the correct labels for the given unlabeled data x l + 1 , … , x l + u {\displaystyle x_{l+1},\dots ,x_{l+u}} only. The goal of inductive learning is to infer the correct mapping from X {\displaystyle X} to Y {\displaystyle Y} . It is unnecessary (and, according to Vapnik's principle, imprudent) to perform transductive learning by way of inferring a classification rule over the entire input space; however, in practice, algorithms formally designed for transduction or induction are often used interchangeably. == Assumptions == In order to make any use of unlabeled data, some relationship to the underlying distribution of data must exist. Semi-supervised learning algorithms make use of at least one of the following assumptions: === Continuity / smoothness assumption === Points that are close to each other are more likely to share a label. This is also generally assumed in supervised learning and yields a preference for geometrically simple decision boundaries. In the case of semi-supervised learning, the smoothness assumption additionally yields a preference for decision boundaries in low-density regions, so few points are close to each other but in different classes. === Cluster assumption === The data tend to form discrete clusters, and points in the same cluster are more likely to share a label (although data that shares a label may spread across multiple clusters). This is a special case of the smoothness assumption and gives rise to feature learning with clustering algorithms. === Manifold assumption === The data lie approximately on a manifold of much lower dimension than the input space. In this case learning the manifold using both the labeled and unlabeled data can avoid the curse of dimensionality. Then learning can proceed using distances and densities defined on the manifold. The manifold assumption is practical when high-dimensional data are generated by some process that may be hard to model directly, but which has only a few degrees of freedom. For instance, human voice is controlled by a few vocal folds, and images of various facial expressions are controlled by a few muscles. In these cases, it is better to consider distances and smoothness in the natural space of the generating problem, rather than in the space of all possible acoustic waves or images, respectively. == History == The heuristic approach of self-training (also known as self-learning or self-labeling) is historically the oldest approach to semi-supervised learning, with examples of applications starting in the 1960s. The transductive learning framework was formally introduced by Vladimir Vapnik in the 1970s. Interest in inductive learning using generative models also began in the 1970s. A probably approximately correct learning bound for semi-supervised learning of a Gaussian mixture was demonstrated by Ratsaby and Venkatesh in 1995. == Methods == === Generative models === Generative approaches to statistical learning first seek to estimate p ( x | y ) {\displaystyle p(x|y)} , the distribution of data points belonging to each class. The probability p ( y | x ) {\displaystyle p(y|x)} that a given point x {\displaystyle x} has label y {\displaystyle y} is then proportional to p ( x | y ) p ( y ) {\displaystyle p(x|y)p(y)} by Bayes' rule. Semi-supervised learning with generative models can be viewed either as an extension of supervised learning (classification plus information about p ( x ) {\displaystyle p(x)} ) or as an extension of unsupervised learning (clustering plus some labels). Generative models assume that the distributions take some particular form p ( x | y , θ ) {\displaystyle p(x|y,\theta )} parameterized by the vector θ {\displaystyle \theta } . If these assumptions are incorrect, the unlabeled data may actually decrease the accuracy of the solution relative to what would have been obtained from labeled data alone. However, if the assumptions are correct, then the unlabeled data necessarily improves performance. The unlabeled data are distributed according to a mixture of individual-class distributions. In order to learn the mixture distribution from the unlabeled data, it must be identifiable, that is, different parameters must yield different summed distributions. Gaussian mixture distributions are identifiable and commonly used for generative models. The parameterized joint distribution can be written as p ( x , y | θ ) = p ( y | θ ) p ( x | y , θ ) {\displaystyle p(x,y|\theta )=p(y|\theta )p(x|y,\theta )} by using the chain rule. Each parameter vector θ {\displaystyle \theta } is associated with a decision function f θ ( x ) = argmax y p ( y | x , θ ) {\displaystyle f_{\theta }(x)={\underset {y}{\operatorname {argmax} }}\ p(y|x,\theta )} . The parameter is then chosen based on fit to both the labeled and unlabeled data, weighted by λ {\displaystyle \lambda } : argmax Θ ( log ⁡ p ( { x i , y i } i = 1 l | θ ) + λ log ⁡ p ( { x i } i = l + 1 l + u | θ ) ) {\displaystyle {\underset {\Theta }{\operatorname {argmax} }}\left(\log p(\{x_{i},y_{i}\}_{i=1}^{l}|\theta )+\lambda \log p(\{x_{i}\}_{i=l+1}^{l+u}|\theta )\right)} === Low-density separation === Another major class of methods attempts to place boundaries in regions with few data points (labeled or unlabeled). One of the most commonly used algorithms is the transductive support vector machine, or TSVM (which, despite its name, may be used for inductive learning as well). Whereas support vector machines for supervised learning seek a decision boundary with maximal margin over the labeled data, the goal of TSVM is a labeling of the unlabeled data such that the decision boundary has maximal margin over all of the data. In addition to the standard hinge loss ( 1 − y f ( x ) ) + {\displaystyle (1-yf(x))_{+}} for labeled data, a loss function ( 1 − | f ( x ) | ) + {\displaystyle (1-|f(x)|)_{+}} is introduced over the unlabeled data by letting y = sign ⁡ f ( x ) {\displaystyle y=\operatorname {sign} {f(x)}} . TSVM then selects f ∗ ( x ) = h ∗ ( x ) + b {\displaystyle f^{}(x)=h^{}(x)+b} from a reproducing kernel Hilbert space H {\displaystyle {\mathcal {H}}} by minimizing the regularized empirical risk: f ∗ = argmin f ( ∑ i = 1 l ( 1 − y i f ( x i ) ) + + λ 1 ‖ h ‖ H 2 + λ 2 ∑ i = l + 1 l + u ( 1 − | f ( x i ) | ) + ) {\displaystyle f^{}={\underset {f}{\operatorname {argmin} }}\left(\displaystyle \sum _{i=1}^{l}(1-y_{i}f(x_{i}))_{+}+\lambda _{1}\|h\|_{\mathcal {H}}^{2}+\lambda _{2}\sum _{i=l+1}^{l+u}(1-|f(x_{i})|)_{+}\right)} An exact solution is intractable due to the non-convex term ( 1 − | f ( x ) | ) + {\displayst

You Only Look Once

You Only Look Once (YOLO) is a series of real-time object detection systems based on convolutional neural networks. First introduced by Joseph Redmon et al. in 2015, YOLO has undergone several iterations and improvements, becoming one of the most popular object detection frameworks. The name "You Only Look Once" refers to the fact that the algorithm requires only one forward propagation pass through the neural network to make predictions, unlike previous region proposal-based techniques like R-CNN that require thousands for a single image. == Overview == Compared to previous methods like R-CNN and OverFeat, instead of applying the model to an image at multiple locations and scales, YOLO applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities. === OverFeat === OverFeat was an early influential model for simultaneous object classification and localization. Its architecture is as follows: Train a neural network for image classification only ("classification-trained network"). This could be one like the AlexNet. The last layer of the trained network is removed, and for every possible object class, initialize a network module at the last layer ("regression network"). The base network has its parameters frozen. The regression network is trained to predict the ( x , y ) {\displaystyle (x,y)} coordinates of two corners of the object's bounding box. During inference time, the classification-trained network is run over the same image over many different zoom levels and croppings. For each, it outputs a class label and a probability for that class label. Each output is then processed by the regression network of the corresponding class. This results in thousands of bounding boxes with class labels and probability. These boxes are merged until only one single box with a single class label remains. == Versions == There are two parts to the YOLO series. The original part contained YOLOv1, v2, and v3, all released on a website maintained by Joseph Redmon. === YOLOv1 === The original YOLO algorithm, introduced in 2015, divides the image into an S × S {\displaystyle S\times S} grid of cells. If the center of an object's bounding box falls into a grid cell, that cell is said to "contain" that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes. These confidence scores reflect how confident the model is that the box contains an object and how accurate it thinks the box is that it predicts. In more detail, the network performs the same convolutional operation over each of the S 2 {\displaystyle S^{2}} patches. The output of the network on each patch is a tuple as follows: ( p 1 , … , p C , c 1 , x 1 , y 1 , w 1 , h 1 , … , c B , x B , y B , w B , h B ) {\displaystyle (p_{1},\dots ,p_{C},c_{1},x_{1},y_{1},w_{1},h_{1},\dots ,c_{B},x_{B},y_{B},w_{B},h_{B})} where p i {\displaystyle p_{i}} is the conditional probability that the cell contains an object of class i {\displaystyle i} , conditional on the cell containing at least one object. x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are the center coordinates, width, and height of the j {\displaystyle j} -th predicted bounding box that is centered in the cell. Multiple bounding boxes are predicted to allow each prediction to specialize in one kind of bounding box. For example, slender objects might be predicted by j = 2 {\displaystyle j=2} while stout objects might be predicted by j = 1 {\displaystyle j=1} . c j {\displaystyle c_{j}} is the predicted intersection over union (IoU) of each bounding box with its corresponding ground truth. The network architecture has 24 convolutional layers followed by 2 fully connected layers. During training, for each cell, if it contains a ground truth bounding box, then only the predicted bounding boxes with the highest IoU with the ground truth bounding boxes is used for gradient descent. Concretely, let j {\displaystyle j} be that predicted bounding box, and let i {\displaystyle i} be the ground truth class label, then x j , y j , w j , h j {\displaystyle x_{j},y_{j},w_{j},h_{j}} are trained by gradient descent to approach the ground truth, p i {\displaystyle p_{i}} is trained towards 1 {\displaystyle 1} , other p i ′ {\displaystyle p_{i'}} are trained towards zero. If a cell contains no ground truth, then only c 1 , c 2 , … , c B {\displaystyle c_{1},c_{2},\dots ,c_{B}} are trained by gradient descent to approach zero. === YOLOv2 === Released in 2016, YOLOv2 (also known as YOLO9000) improved upon the original model by incorporating batch normalization, a higher resolution classifier, and using anchor boxes to predict bounding boxes. It could detect over 9000 object categories. It was also released on GitHub under the Apache 2.0 license. === YOLOv3 === YOLOv3, introduced in 2018, contained only "incremental" improvements, including the use of a more complex backbone network, multiple scales for detection, and a more sophisticated loss function. === YOLOv4 and beyond === Subsequent versions of YOLO (v4, v5, etc.) have been developed by different researchers, further improving performance and introducing new features. These versions are not officially associated with the original YOLO authors but build upon their work. As of 2026, versions up to YOLO26 have been released..

Deep image prior

Deep image prior is a type of convolutional neural network used to enhance a given image with no prior training data other than the image itself. A neural network is randomly initialized and used as prior to solve inverse problems such as noise reduction, super-resolution, and inpainting. Image statistics are captured by the structure of a convolutional image generator rather than by any previously learned capabilities. == Method == === Background === Inverse problems such as noise reduction, super-resolution, and inpainting can be formulated as the optimization task x ∗ = m i n x E ( x ; x 0 ) + R ( x ) {\displaystyle x^{}=min_{x}E(x;x_{0})+R(x)} , where x {\displaystyle x} is an image, x 0 {\displaystyle x_{0}} a corrupted representation of that image, E ( x ; x 0 ) {\displaystyle E(x;x_{0})} is a task-dependent data term, and R(x) is the regularizer. Deep neural networks learn a generator/decoder x = f θ ( z ) {\displaystyle x=f_{\theta }(z)} which maps a random code vector z {\displaystyle z} to an image x {\displaystyle x} . The image corruption method used to generate x 0 {\displaystyle x_{0}} is selected for the specific application. === Specifics === In this approach, the R ( x ) {\displaystyle R(x)} prior is replaced with the implicit prior captured by the neural network (where R ( x ) = 0 {\displaystyle R(x)=0} for images that can be produced by a deep neural networks and R ( x ) = + ∞ {\displaystyle R(x)=+\infty } otherwise). This yields the equation for the minimizer θ ∗ = a r g m i n θ E ( f θ ( z ) ; x 0 ) {\displaystyle \theta ^{}=argmin_{\theta }E(f_{\theta }(z);x_{0})} and the result of the optimization process x ∗ = f θ ∗ ( z ) {\displaystyle x^{}=f_{\theta ^{}}(z)} . The minimizer θ ∗ {\displaystyle \theta ^{}} (typically a gradient descent) starts from a randomly initialized parameters and descends into a local best result to yield the x ∗ {\displaystyle x^{}} restoration function. ==== Overfitting ==== A parameter θ may be used to recover any image, including its noise. However, the network is reluctant to pick up noise because it contains high impedance while useful signal offers low impedance. This results in the θ parameter approaching a good-looking local optimum so long as the number of iterations in the optimization process remains low enough not to overfit data. === Deep Neural Network Model === Typically, the deep neural network model for deep image prior uses a U-Net like model without the skip connections that connect the encoder blocks with the decoder blocks. The authors in their paper mention that "Our findings here (and in other similar comparisons) seem to suggest that having deeper architecture is beneficial, and that having skip-connections that work so well for recognition tasks (such as semantic segmentation) is highly detrimental." == Applications == === Denoising === The principle of denoising is to recover an image x {\displaystyle x} from a noisy observation x 0 {\displaystyle x_{0}} , where x 0 = x + ϵ {\displaystyle x_{0}=x+\epsilon } . The distribution ϵ {\displaystyle \epsilon } is sometimes known (e.g.: profiling sensor and photon noise) and may optionally be incorporated into the model, though this process works well in blind denoising. The quadratic energy function E ( x , x 0 ) = | | x − x 0 | | 2 {\displaystyle E(x,x_{0})=||x-x_{0}||^{2}} is used as the data term, plugging it into the equation for θ ∗ {\displaystyle \theta ^{}} yields the optimization problem m i n θ | | f θ ( z ) − x 0 | | 2 {\displaystyle min_{\theta }||f_{\theta }(z)-x_{0}||^{2}} . === Super-resolution === Super-resolution is used to generate a higher resolution version of image x. The data term is set to E ( x ; x 0 ) = | | d ( x ) − x 0 | | 2 {\displaystyle E(x;x_{0})=||d(x)-x_{0}||^{2}} where d(·) is a downsampling operator such as Lanczos that decimates the image by a factor t. === Inpainting === Inpainting is used to reconstruct a missing area in an image x 0 {\displaystyle x_{0}} . These missing pixels are defined as the binary mask m ∈ { 0 , 1 } H × V {\displaystyle m\in \{0,1\}^{H\times V}} . The data term is defined as E ( x ; x 0 ) = | | ( x − x 0 ) ⊙ m | | 2 {\displaystyle E(x;x_{0})=||(x-x_{0})\odot m||^{2}} (where ⊙ {\displaystyle \odot } is the Hadamard product). The intuition behind this is that the loss is computed only on the known pixels in the image, and the network is going to learn enough about the image to fill in unknown parts of the image even though the computed loss doesn't include those pixels. This strategy is used to remove image watermarks by treating the watermark as missing pixels in the image. === Flash–no-flash reconstruction === This approach may be extended to multiple images. A straightforward example mentioned by the author is the reconstruction of an image to obtain natural light and clarity from a flash–no-flash pair. Video reconstruction is possible but it requires optimizations to take into account the spatial differences. == Implementations == A reference implementation rewritten in Python 3.6 with the PyTorch 0.4.0 library was released by the author under the Apache 2.0 license: deep-image-prior A TensorFlow-based implementation written in Python 2 and released under the CC-SA 3.0 license: deep-image-prior-tensorflow A Keras-based implementation written in Python 2 and released under the GPLv3: machine_learning_denoising == Example == See Astronomy Picture of the Day (APOD) of 2024-02-18