Meta-Labeling

Meta-labeling, also known as corrective AI, is a machine learning (ML) technique used in quantitative finance to enhance the performance of investment and trading strategies. It was developed in 2017 by Marcos López de Prado at Guggenheim Partners and Cornell University. The core idea is to separate the decision about trade direction (side) from the decision about trade sizing, avoiding the inefficiencies of learning side and size predictions simultaneously. The side decision involves forecasting market movements (long, short, or neutral), while the size decision focuses on risk management and profitability. Meta-labeling serves as a secondary decision-making layer that evaluates the signals generated by a primary predictive model. By assessing the confidence and likely profitability of those signals, it allows investors and algorithms to size positions dynamically and to suppress false positives.

== Motivation ==
Meta-labeling is designed to improve precision without sacrificing recall. As noted by López de Prado, attempting to model both the direction and the magnitude of a trade with a single algorithm can result in poor generalization. By separating these tasks, meta-labeling enables greater flexibility and robustness:
- It enhances control over capital allocation.
- It reduces overfitting by limiting model complexity.
- It allows the use of interpretability tools and tailored thresholds to manage risk.
- It enables dynamic trade suppression in unfavorable regimes.

== Applications ==
Meta-labeling has been applied in a variety of financial ML contexts, including:
- Algorithmic trading: filtering and sizing trades to reduce false positives.
- Portfolio optimization: scaling exposure across multiple signals with differing confidence levels.
- Risk management: dynamically disabling strategies in adverse market conditions.
- Model validation: interpreting when and why a model may be underperforming due to regime shifts.
== General architecture ==
Meta-labeling decouples two core components of systematic trading strategies: directional prediction and position sizing. A primary model is trained to generate trade signals (e.g., buy, sell, or hold), and a secondary model is then trained to determine whether each signal is likely to lead to a profitable trade. The secondary model outputs a probability that is interpreted as the confidence in the forecast, which can be used to adjust the position size or to filter out unreliable trades. Meta-labeling is typically implemented as a three-stage process:
- Primary model (M1): predicts the direction or label of a financial outcome using features such as market prices, returns, or volatility indicators. A typical output is directional, e.g., Y ∈ {−1, 0, 1}, representing short, neutral, or long positions.
- Secondary model (M2): a binary classifier trained to predict whether the primary model's prediction will be profitable. The target variable is a binary meta-label F ∈ {0, 1}. Inputs can include features used in the primary model, performance diagnostics, or market regime data.
- Position sizing algorithm (M3): translates the output probability of the secondary model into a position size. Higher confidence scores result in larger allocations, while lower confidence leads to reduced or zero exposure.

=== Stage 1: Forecasting side ===
[Figure 1: Primary model architecture]
Figure 1 presents the architecture of a primary model, which focuses on forecasting the side of the trade. In this example, the model (M1) takes input data, such as open-high-low-close data, and determines the side of the position to take: a negative number is a short position and a positive number is a long position. The output is bounded between −1 and 1; the closer it is to −1 or 1, the stronger the model's conviction.
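The meta-label F can be built from the primary model's side forecasts and realized forward returns. The sketch below is a minimal illustration, not López de Prado's reference implementation: it assumes one fixed-horizon forward return per bar and labels a trade profitable when the side and the return share a sign. The function name `meta_labels` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def meta_labels(sides: Sequence[int], fwd_rets: Sequence[float]) -> List[Tuple[int, int]]:
    """Build (bar index, F) pairs for training the secondary model (M2).

    sides[i]    -- M1 side forecast in {-1, 0, 1} (hypothetical input)
    fwd_rets[i] -- forward return over the chosen horizon (hypothetical input)

    F = 1 when the side would have been profitable (side * return > 0),
    otherwise F = 0. Bars where M1 forecasts no side (0) are skipped,
    because M2 only judges trades that M1 actually proposes.
    """
    if len(sides) != len(fwd_rets):
        raise ValueError("sides and fwd_rets must have the same length")
    labels = []
    for i, (side, ret) in enumerate(zip(sides, fwd_rets)):
        if side == 0:
            continue
        labels.append((i, 1 if side * ret > 0 else 0))
    return labels


if __name__ == "__main__":
    sides = [1, -1, 0, 1, -1]
    fwd_rets = [0.02, 0.01, -0.03, -0.005, -0.02]
    print(meta_labels(sides, fwd_rets))  # [(0, 1), (1, 0), (3, 0), (4, 1)]
```

Training M2 on these labels, rather than on raw direction, is what lets it specialize in separating the primary model's true positives from its false positives.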
When training the model, the labels are −1 and 1, based on the direction of forward returns over a predefined investment horizon. The researcher may apply a recall check by setting a minimum threshold τ (tau) that the magnitude of the initial output must reach to qualify as a short or long position. If the threshold is not met, no side is forecast, which leads to the closing of any open positions. The primary model's output is therefore one of three possible side forecasts: −1, 0, or 1. The primary model also generates evaluation data that the secondary model can use to improve the performance of size forecasts; examples include rolling accuracy, F1, recall, precision, and AUC scores.

=== Stage 2: Filtering out false positives ===
[Figure 2: General meta-labeling architecture]
Next comes the phase of filtering out false positives by applying a secondary machine learning model (M2), a binary classifier trained to determine whether the trade will be profitable. The model takes as input four general groupings of data:
- General input data that is predictive of a false positive, for example the 30-day rolling volatility of the underlying asset.
- Evaluation data from the primary model.
- Market state and regime data. Macroeconomic data, or clustering the market into regimes, may help because specific trading strategies are known to perform better in particular regimes; for example, momentum-based strategies perform best in periods with low volatility and strong directional moves.
- The primary model's initial output, a value between −1 and 1 that reflects the strength of the primary model's conviction.
The output of the secondary model is a value between −1 and 1 if a tanh function is used, indicating the strength of the conviction that a short or long position is profitable; alternatively, it can lie between 0 and 1 (using a sigmoid function) if one only wants to know whether the trade made money.
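The recall check of Stage 1 and the filtering step of Stage 2 can be sketched together. This is a hedged illustration under stated assumptions: the raw M1 output is a float in [−1, 1], τ is compared against its magnitude, and M2 has already produced a probability of profit for each bar. The 0.5 cutoff and the function names are hypothetical choices, not part of the method's definition.

```python
from typing import List, Sequence


def side_from_raw(raw: float, tau: float) -> int:
    """Recall check: map a raw M1 output in [-1, 1] to a side in {-1, 0, 1}.

    Outputs whose magnitude falls below tau (or that are exactly 0)
    yield 0, i.e. no side forecast.
    """
    if raw == 0 or abs(raw) < tau:
        return 0
    return 1 if raw > 0 else -1


def filter_trades(raw_outputs: Sequence[float], m2_probs: Sequence[float],
                  tau: float, cutoff: float = 0.5) -> List[int]:
    """Keep M1's side only where M2's probability of profit reaches cutoff."""
    filtered = []
    for raw, p in zip(raw_outputs, m2_probs):
        side = side_from_raw(raw, tau)
        # A likely false positive is suppressed by setting its side to 0.
        filtered.append(side if side != 0 and p >= cutoff else 0)
    return filtered


if __name__ == "__main__":
    raw = [0.8, -0.6, 0.1, -0.9]
    probs = [0.7, 0.4, 0.9, 0.55]
    print(filter_trades(raw, probs, tau=0.3))  # [1, 0, 0, -1]
```

The third bar shows why the two stages are separate: M2 is confident, but the trade never reaches it because M1's output fails the τ check.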
The secondary model's output allows trades that are likely to lead to losses to be filtered out. One could stop at this point, or use the outputs of the secondary model as inputs to a position sizing algorithm (M3), which can further improve strategy performance metrics by translating the secondary model's output probability into a position size.

=== Stage 3: Optimizing position sizes ===
==== Position sizing methods (M3) ====
Various algorithms have been proposed for transforming predicted probabilities into trade sizes:
- All-or-nothing: allocate 100% of capital if the probability exceeds a predefined threshold (e.g., 0.5); otherwise, do not trade.
- Model confidence: use the probability score directly as the fraction of capital allocated.
- Linear scaling: rescale the model's probabilities using min-max normalization based on the training data.
- Normal CDF (NCDF): apply a normal cumulative distribution function to a z-statistic derived from the predicted probability.
- Empirical CDF (ECDF): rank probabilities by their percentile in the training data to ensure relative allocation.
- Sigmoid Optimal Position Sizing (SOPS): apply a smooth, non-linear sigmoid transformation optimized to maximize risk-adjusted returns (Sharpe ratio).

==== Model calibration ====
Each machine learning algorithm used in meta-labeling tends to produce outputs with a characteristic distribution: some are approximately normally distributed, whereas others exhibit a pronounced U-shape, concentrating probabilities near the extremes. Because of these varying distributions, simply summing the outputs of different models can inadvertently weight signals unevenly, biasing trade decisions.
To address this, model calibration techniques are essential for adjusting the predicted probabilities toward frequentist probabilities, so that model outputs more accurately reflect true likelihoods. Two common calibration techniques are:
- Platt scaling (sigmoid scaling): suitable for correcting the S-shaped calibration plots typically produced by models such as support vector machines (SVMs).
- Isotonic regression: fits a non-decreasing step function to the probabilities; it is particularly effective with larger datasets, though it can sometimes lead to overfitting.
Transforming predictions into frequentist probabilities is crucial because it yields outputs that are directly interpretable as the actual likelihood of an event occurring. Such calibration significantly enhances the effectiveness of fixed position sizing methods, reducing maximum drawdowns and increasing risk-adjusted returns. However, calibration has less impact on position sizing methods that estimate parameters directly from the training data, such as ECDF and SOPS, suggesting that calibration is a critical step mainly for fixed methods that rely heavily on raw model outputs.
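Several of the sizing rules from Stage 3, together with isotonic calibration, can be sketched in plain Python. This is a minimal illustration under assumptions, not a reference implementation. The NCDF rule uses one common binary-case form, following López de Prado's bet-sizing approach: z = (p − 0.5)/√(p(1 − p)) and size = 2Φ(z) − 1, with negative sizes clipped to zero on the assumption that M1 has already fixed the side. Isotonic regression is fitted with the pool-adjacent-violators algorithm and evaluated as a step function. SOPS and Platt scaling are omitted because they need an optimizer. All function names are hypothetical.

```python
import bisect
import math
from itertools import groupby
from typing import List, Sequence, Tuple

# --- Sizing rules: map a probability p in [0, 1] to a size in [0, 1] ---

def all_or_nothing(p: float, threshold: float = 0.5) -> float:
    """Full allocation if p exceeds the threshold, otherwise no trade."""
    return 1.0 if p > threshold else 0.0


def model_confidence(p: float) -> float:
    """Use the probability itself as the fraction of capital."""
    return p


def linear_scaling(p: float, train_probs: Sequence[float]) -> float:
    """Min-max rescale p using the range seen in training, clipped to [0, 1]."""
    lo, hi = min(train_probs), max(train_probs)
    if hi == lo:
        return 0.0
    return min(max((p - lo) / (hi - lo), 0.0), 1.0)


def ncdf_size(p: float, eps: float = 1e-9) -> float:
    """Assumed NCDF rule: z = (p - 0.5) / sqrt(p(1 - p)), size = 2*Phi(z) - 1, clipped at 0."""
    p = min(max(p, eps), 1.0 - eps)  # avoid division by zero at p = 0 or 1
    z = (p - 0.5) / math.sqrt(p * (1.0 - p))
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return max(2.0 * phi - 1.0, 0.0)


def ecdf_size(p: float, train_probs: Sequence[float]) -> float:
    """Fraction of training probabilities less than or equal to p."""
    ranked = sorted(train_probs)
    return bisect.bisect_right(ranked, p) / len(ranked)

# --- Isotonic calibration via pool-adjacent-violators (PAV) ---

def fit_isotonic(scores: Sequence[float],
                 outcomes: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing step function from raw scores to observed hit rates."""
    blocks = []  # each block: [smallest score, sum of outcomes, count]
    for s, grp in groupby(sorted(zip(scores, outcomes)), key=lambda t: t[0]):
        ys = [y for _, y in grp]  # tied scores share one block
        blocks.append([s, float(sum(ys)), len(ys)])
        # Pool backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]:
            _, y_sum, n = blocks.pop()
            blocks[-1][1] += y_sum
            blocks[-1][2] += n
    return [b[0] for b in blocks], [b[1] / b[2] for b in blocks]


def calibrate(model: Tuple[List[float], List[float]], score: float) -> float:
    """Evaluate the fitted step function at a new raw score."""
    starts, values = model
    i = bisect.bisect_right(starts, score) - 1
    return values[max(i, 0)]


if __name__ == "__main__":
    iso = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
    p = calibrate(iso, 0.35)  # 0.5 after pooling the 0.2 and 0.3 observations
    print(p, all_or_nothing(p), ncdf_size(p), ecdf_size(p, [0.1, 0.3, 0.5, 0.7]))
```

Feeding calibrated rather than raw scores into the fixed rules (all-or-nothing, model confidence, NCDF) is the combination the text describes as benefiting most from calibration; ECDF already ranks against the training distribution, so it is less sensitive to the raw output shape.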

Percept (artificial intelligence)

A percept is the input that an intelligent agent is perceiving at any given moment. It is essentially the same concept as a percept in psychology, except that it is perceived not by a brain but by the agent. A percept is detected by a sensor, often a camera, processed accordingly, and acted upon by an actuator. Each percept is added to a "percept sequence", a complete history of every percept ever detected. The agent's action at any instant may depend on the entire percept sequence up to that instant, so an intelligent agent chooses how to act based not only on the current percept but on the percept sequence. The next action is chosen by the agent function, which maps every percept sequence to an action. For example, if a camera were to record a gesture, the agent would process the percepts, calculate the corresponding spatial vectors, examine its percept history, and use the agent program (the implementation of the agent function) to act accordingly.

== Examples ==
Examples of percepts include inputs from touch sensors, cameras, infrared sensors, sonar, microphones, mice, and keyboards. A percept can also be a higher-level feature of the data, such as lines, depth, objects, faces, or gestures.
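The difference between a single percept and the percept sequence can be made concrete with the simplest textbook construction, a table-driven agent, whose agent function is literally a lookup table keyed on the whole percept sequence. The sketch is illustrative only: the class name, the gesture percepts, and the actions are hypothetical, and practical agents compute the agent function rather than tabulating it, because the number of possible sequences grows exponentially with their length.

```python
from typing import Dict, Hashable, List, Tuple


class TableDrivenAgent:
    """Agent whose agent function is a table from percept sequences to actions."""

    def __init__(self, table: Dict[Tuple[Hashable, ...], str], default: str = "noop"):
        self.table = table                  # hypothetical mapping: percept sequence -> action
        self.default = default              # action when the sequence is not in the table
        self.percepts: List[Hashable] = []  # the percept sequence so far

    def __call__(self, percept: Hashable) -> str:
        self.percepts.append(percept)  # every new percept extends the history
        return self.table.get(tuple(self.percepts), self.default)


if __name__ == "__main__":
    agent = TableDrivenAgent({("wave",): "greet", ("wave", "wave"): "ignore"})
    print(agent("wave"))  # greet
    print(agent("wave"))  # ignore: same percept, different history
```

The second call receives the same percept as the first but returns a different action, because the lookup depends on the full sequence rather than on the latest input alone.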

The Dodo (website)

The Dodo is an American online publisher focused on animals. The website was launched in January 2014 by Izzie Lerer, the daughter of media executive Kenneth Lerer, and journalist Kerry Lauerman. The Dodo has become one of the most popular Facebook publishers, garnering 1 billion video views from the social network in November 2015. The Dodo is headquartered in New York City.

== History ==
The company, named after the first recorded species that humans drove to extinction, was founded by Lerer out of "a personal passion for the subject matter". Lerer, who has a PhD in animal studies from Columbia University with a focus on animal ethics and human relationships, launched the website after noticing the viral success of animal videos online while seeing that no one "really owned the space." The Dodo's editorial and video production staff unionized with the Writers Guild of America, East in April 2018.

Scalable Video Coding

Scalable Video Coding (SVC) is a video compression standard developed jointly by the ITU-T and the ISO/IEC. The two organizations formed the Joint Video Team (JVT) to create the H.264/MPEG-4 AVC standard (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), and SVC was later standardized as an amendment to it. SVC aims to provide adaptable, or scalable, content, allowing a single encoded video stream to be decoded at various bitrates, resolutions, and quality levels, thus catering to diverse devices and network conditions.

== History ==
In October 2003, the Moving Picture Experts Group (MPEG) issued a Call for Proposals on SVC technology. Fourteen proposals were submitted, twelve of which used wavelet compression, while the remaining two were extensions of H.264/MPEG-4 AVC. MPEG selected the proposal from the Heinrich-Hertz-Institut (HHI) as the foundation for the SVC standardization project. In January 2005, MPEG and the Video Coding Experts Group (VCEG) agreed to finalize SVC as an amendment to the H.264/MPEG-4 AVC standard. In November 2008, Google launched Gmail Video Chat, which employed an H.264/SVC codec, marking the first consumer application of the standard. The service was succeeded by Google+ Hangouts in 2012. In 2011, Google Code highlighted SVC as the successor to the open-source RVC video chat engine, noting its prominence in 2010.

== Principles of scalability ==
=== Overview ===
Scalability refers to the ability to represent a video signal at multiple levels of detail within a single encoded bitstream. A decoder can decode a base layer for basic quality and additional enhancement layers for progressively higher quality. SVC defines three types of scalability:
- Spatial scalability: supports multiple resolution levels.
- Temporal scalability: enables varying frame rates.
- Quality scalability: provides different image quality levels.

=== Spatial scalability ===
Spatial scalability allows the reconstruction of video at different resolutions, such as QCIF, CIF, or SD.
This is achieved through a pyramidal decomposition into multiple spatial layers.

=== Temporal scalability ===
Temporal scalability adjusts the frame rate of the decoded video stream. Various frame rates are supported using a hierarchical structure of video frames.

=== Quality scalability ===
Quality scalability, or signal-to-noise ratio (SNR) scalability, improves the signal-to-noise ratio of a layer by reducing the quantization distortion between the original and reconstructed images. SVC supports two approaches: Fine Grain Scalability (FGS) and Coarse Grain Scalability (CGS).

==== Coarse Grain Scalability (CGS) ====
CGS provides quality scalability across spatial resolutions. Each spatial resolution is encoded as a separate layer that refines texture and motion data. For a given resolution, quality scalability is achieved by encoding multiple quality layers with progressively finer quantization steps, starting from a base layer of minimal quality.

==== Fine Grain Scalability (FGS) ====
FGS enables progressive refinement of transform coefficients within a single spatial layer. The base quality layer is encoded using the AVC standard with an initial quantization parameter (QP) that ensures minimal acceptable quality. Each subsequent refinement layer reduces the QP by six, which halves the quantization step. The refinement data stream can be truncated at any point, allowing fine-grained quality scalability.
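The arithmetic behind temporal scalability and FGS refinement can be sketched briefly. Two assumptions are made that the text does not state: the hierarchical frame structure is taken to be dyadic (a group of pictures whose size is a power of two, as in hierarchical-B coding), and quantization step sizes follow the H.264/AVC relation in which the step doubles for every increase of 6 in QP, starting from 0.625 at QP 0. All function names are hypothetical.

```python
from typing import List

# H.264/AVC quantization step sizes for QP 0..5; the step doubles every 6 QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Quantization step size for an H.264/AVC QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


def fgs_layer_qps(base_qp: int, refinements: int) -> List[int]:
    """QP of the base quality layer followed by each refinement layer (QP - 6 per layer)."""
    return [base_qp - 6 * i for i in range(refinements + 1) if base_qp - 6 * i >= 0]


def temporal_layer(frame_idx: int, gop_size: int) -> int:
    """Temporal layer of a frame in an assumed dyadic hierarchical GOP (0 = base layer)."""
    levels = gop_size.bit_length() - 1
    if gop_size != 1 << levels:
        raise ValueError("gop_size must be a power of two")
    pos = frame_idx % gop_size
    if pos == 0:
        return 0  # key pictures form the base temporal layer
    trailing_zeros = (pos & -pos).bit_length() - 1
    return levels - trailing_zeros


def frame_rate(full_rate: float, gop_size: int, max_layer: int) -> float:
    """Frame rate obtained by decoding only temporal layers 0..max_layer."""
    levels = gop_size.bit_length() - 1
    if not 0 <= max_layer <= levels:
        raise ValueError("max_layer out of range")
    return full_rate / 2 ** (levels - max_layer)


if __name__ == "__main__":
    print([temporal_layer(n, 8) for n in range(9)])    # [0, 3, 2, 3, 1, 3, 2, 3, 0]
    print([frame_rate(30.0, 8, t) for t in range(4)])  # [3.75, 7.5, 15.0, 30.0]
    print([qstep(qp) for qp in fgs_layer_qps(36, 2)])  # [40.0, 20.0, 10.0]
```

Dropping the highest temporal layer halves the frame rate, and each FGS refinement layer halves the quantization step, which is the "reduce the QP by six" rule expressed numerically.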

News ticker

A news ticker (sometimes called a crawler, crawl, slide, zipper, ticker tape, or chyron) is a horizontal or vertical (depending on the language's writing system) text-based display dedicated to presenting headlines or minor pieces of news. It takes the form either of a graphic that typically resides in the lower third of the screen on a television station or network (usually during news programming), or of a long, thin scoreboard-style display around the facades of some offices or public buildings. It is an evolution of ticker tape, a continuous paper print-out of stock quotes from a printing telegraph, which was mainly used to transmit companies' share price information over telegraph lines before technology advanced in the 1960s. News tickers have been used for some years in European countries such as the United Kingdom, Germany, and Ireland; they are also used in several Asian countries and Australia. In the United States, broadcast television stations long used tickers on a special-event basis to disseminate weather warnings, school closings, and election results. Before the expansion of cable news networks and the internet as sources of news content, sports telecasts occasionally used a ticker to give updates on other contests in progress. In addition, some ticker displays are used to relay continuous business and financial information.
Most tickers traditionally display scrolling text running from right to left across the screen or building display (or in the opposite direction for right-to-left writing systems such as Arabic and Hebrew script), allowing for headlines of varying degrees of detail. Some used by television broadcasters, however, display stories in a static manner (allowing seamless switching between individually programmed stories) or use a "flipping" effect, in which each headline is shown for a few seconds before transitioning to the next instead of scrolling across the screen, usually resulting in a quicker run-through of all the information programmed into the ticker. With the growth of the World Wide Web, some news tickers have syndicated news stories posted largely on the websites of broadcasters or other independent news agencies.

== Current uses ==
=== Television ===
The presentation of headlines or other information in a news ticker has become a common element of many news networks, though its use differs between channels. News networks and local newscasts commonly scroll news headlines across an area near the bottom of the screen, though some variations have formed, such as showing one headline at a time with a scrolling or "flipper" effect. Financial news channels use two or more tickers displaying company share prices and business headlines. Networks with a focus on sports often use a slightly different system, in which scores and statuses of ongoing and finished games are displayed one by one, along with minor sports highlights, statistics, and sports news headlines. These are typically divided into categories devoted to specific leagues and events (with college basketball and football usually focusing on the top 25 teams in the AP Poll, occasionally supplemented by sections for specific conferences).
Some programs, including news-based programs that emphasize viewer interactivity, as well as special events, may also use tickers to display messages and reactions from viewers and others relating to the program. These comments are often sourced from social networking services such as Facebook and Twitter, typically curated from a specific page or hashtag. Because of their prevalence, tickers have occasionally been made targets of pranks and vandalism. In one example, News 14 Carolina allowed viewers to submit relevant information such as school closings or traffic delays via telephone or the Internet for incorporation into the ticker; in February 2004 the system was exploited to display humorous and crude messages, including the infamous "All your base are belong to us". Messages intended for training occasionally end up on the live ticker, as happened on BBC News in 2022, when "Weather rain everywhere" and "Manchester United are rubbish" appeared on the live news ticker. Some businesses and organizations have used tickers intended for relaying weather-related closings as a surreptitious source of free guerrilla marketing, proclaiming that they were open rather than closed and giving their phone number where possible, allowing them to 'advertise' on a television station all day for free. Since then, many stations have required businesses and organizations to pre-register, with an authorized representative and a signed affidavit on company letterhead affirming their authenticity, and have filtered out unfamiliar businesses and organizations before displaying their closing announcements. Stations also confirm all closings involving school districts with authorized officials, to prevent students either showing up to canceled classes in dangerous conditions or missing school because of an erroneous, prank-submitted, or false listing.
=== On personal computers ===
Various applications have been developed over time to install news tickers on personal computer desktops using RSS feeds from news organizations. These are displayed in a fashion similar to those used by television channels but let the user access the underlying news stories, a feature not offered by traditional television channels. The Bloomberg Terminal and other financial information-tracking programs and devices also use tickers. Businesses may also use a ticker as an unobtrusive way of delivering important information to their staff. The ticker can be set to reappear, stay on screen, or be put into a retractable mode (where a small tab is left visible on-screen). In the United Kingdom, broadcasters have stopped using this technology as other forms of communication have become available and increased in popularity. BBC News and Sky News discontinued their respective desktop tickers in March 2011 and 2012 to focus on other products, such as smartphone applications, for delivering updated information on breaking news and sport stories.

=== News tickers on buildings ===
Since the advent of the telegraph, newspapers have commonly used their buildings to share the latest headlines. At first, simple chalkboard signs were used for bulletins, but limelight illumination, electric lights, magic lantern projections, and other novel techniques were later employed. The method of using electric lights to spell out moving letters was invented by Frank C. Reilly (August 20, 1888 – April 10, 1947) and patented in 1923. Reilly called his invention the Motograph News Bulletin. In 1928, The New York Times installed a Motograph News Bulletin to display news headlines on the sides of the Times Tower. The display was 388 feet (118 m) long and 5 feet (1.5 m) high, and used over 14,800 light bulbs. Popularly known as the "Zipper", the sign remained in use until the building was sold in 1961.
The sign was darkened during World War II to comply with wartime lighting restrictions. The Motograph operated until 1994 and was replaced by an electronic version in 1995, which was in turn removed in 2017 when all the individual screens on the front of One Times Square were replaced with a 350-foot (110 m)-tall LED billboard in 2018. Ticker displays appear today on the exterior of the News Corp Building, which houses the headquarters of Fox News Channel and News Corp in the west extension of Manhattan's Rockefeller Center; another, displaying delayed stock market data, is located in Times Square. NASDAQ itself features a large display screen on the facade of the NASDAQ MarketSite building in Times Square. The Reuters buildings at Canary Wharf and in Toronto have news and stock tickers; the stock tickers feature market data for the New York Stock Exchange, NASDAQ, and London Stock Exchange, and the Toronto building's ticker also includes quotes from the Toronto Stock Exchange. A red-LED ticker was added to the perimeter of 10 Rockefeller Center in 1994, as the building was being renovated to accommodate the studios for NBC's Today. Placed at the juncture of the first and second floors, the ticker is visible to spectators in Rockefeller Plaza and passersby on West 49th Street, and it updates continuously, even when Today is not being produced and broadcast. As of 2015, the ticker strip is only a small part of a large two-floor LCD video display placed within the window of the studio, showing promotional information. The Martin Place headquarters of Seven News, the news division of Australian television broadcaster Seven Network, also incorporates a ticker that wraps around the building.

== In popular culture ==
The use of new

Hallucination (artificial intelligence)

In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting, confabulation, or delusion) is a response generated by AI that contains false or misleading information presented as fact. This term draws a loose analogy with human psychology, where a hallucination typically involves false percepts. For example, a chatbot powered by large language models (LLMs), like ChatGPT, may embed plausible-sounding random falsehoods within its generated content. Detecting and mitigating errors and hallucinations pose significant challenges for practical deployment and reliability of LLMs in high-stakes scenarios, such as chip design, supply chain logistics, and medical diagnostics. Some software engineers and statisticians have criticized the specific term "AI hallucination" for unreasonably anthropomorphizing computers. Symbolic artificial intelligence models generally do not produce hallucinations, unlike large language models.

== Term ==
=== Origin ===
Since the 1980s, the term "hallucination" has been used in computer vision with a positive connotation to describe the process of adding detail to an image. For example, the task of generating high-resolution face images from low-resolution inputs is called face hallucination. The first documented use of the term "hallucination" in this sense is in the PhD thesis of Eric Mjolsness in 1986. A notable work is the face hallucination algorithm by Simon Baker and Takeo Kanade, published in 1999. In the 2000s, hallucinations were described in statistical machine translation as a failure mode. Since the 2010s, the term has undergone a semantic shift to signify the generation of factually incorrect or misleading outputs by AI systems in tasks like machine translation and object detection. In 2015, hallucinations were identified in visual semantic role labeling tasks by Saurabh Gupta and Jitendra Malik.
Also in 2015, computer scientist Andrej Karpathy used the term "hallucinated" in a blog post to describe his recurrent neural network (RNN) language model generating an incorrect citation link. In 2017, Google researchers used the term to describe responses generated by neural machine translation (NMT) models that are unrelated to the source text, and in 2018 the term was used in computer vision to describe instances where non-existent objects are erroneously detected because of adversarial attacks. In July 2021, Meta warned during its release of BlenderBot 2 that the system is prone to "hallucinations", which Meta defined as "confident statements that are not true". Following the beta release of OpenAI's ChatGPT in November 2022, some users complained that such chatbots often seem to pointlessly embed plausible-sounding random falsehoods within their generated content. Many news outlets, including The New York Times, began using the term "hallucinations" to describe these models' frequently incorrect or inconsistent responses. In 2023, the Cambridge Dictionary updated its definition of hallucination to include this new sense specific to the field of AI. Some researchers have highlighted a lack of consistency in how the term is used, and have also identified several alternative terms in the literature, such as confabulations, fabrications, and factual errors.
=== Definitions and alternatives ===
Uses, definitions, and characterizations of the term "hallucination" in the context of LLMs include:
- "a tendency to invent facts in moments of uncertainty" (OpenAI, May 2023)
- "a model's logical mistakes" (OpenAI, May 2023)
- "fabricating information entirely, but behaving as if spouting facts" (CNBC, May 2023)
- "making up information" (The Verge, February 2023)
- "probability distributions" (in scientific contexts)
Journalist Benj Edwards, in Ars Technica, writes that the term "hallucination" is controversial, but that some form of metaphor remains necessary; Edwards suggests "confabulation" as an analogy for processes that involve "creative gap-filling". In July 2024, a White House report on fostering public trust in AI research mentioned hallucinations only in the context of reducing them. Notably, when acknowledging David Baker's Nobel Prize-winning work with AI-generated proteins, the Nobel committee avoided the term entirely, instead referring to "imaginative protein creation". Hicks, Humphries, and Slater, in their article in Ethics and Information Technology, argue that the output of LLMs is "bullshit" under Harry Frankfurt's definition of the term, and that the models are "in an important way indifferent to the truth of their outputs", with true statements only accidentally true, and false ones accidentally false. Some researchers also use the derogatory term "botshit", often referring to uncritical use of AI.

=== Criticism ===
In the scientific community, some researchers avoid the term "hallucination", seeing it as potentially misleading. It has been criticized by Usama Fayyad, executive director of the Institute for Experimental Artificial Intelligence at Northeastern University, on the grounds that it misleadingly personifies large language models and is vague. Mary Shaw said, "The current fashion for calling generative AI's errors 'hallucinations' is appalling.
It anthropomorphizes the software, and it spins actual errors as somehow being idiosyncratic quirks of the system even when they're objectively incorrect." In Salon, statistician Gary Smith argues that LLMs "do not understand what words mean" and consequently that the term "hallucination" unreasonably anthropomorphizes the machine. Murray Shanahan argues that anthropomorphic framing of LLM capabilities, including terms like "hallucination", encourages users and researchers to attribute cognitive processes to systems that operate through statistical pattern completion, and advocates for more careful linguistic practices when discussing LLM behavior. Kristina Šekrst argues that applying psychological vocabulary to LLM outputs obscures the difference between the appearance of mental properties and their genuine presence. Förster & Skop assert that tech companies use the hallucination metaphor to anthropomorphize models and deflect responsibility for non-factual outputs. Some see the AI outputs not as illusory but as prospective, that is, having some chance of being true, similar to early-stage scientific conjectures. The term has also been criticized for its association with psychedelic drug experiences.

== In natural language generation ==
In natural language generation, there are several reasons why natural language models hallucinate:

=== Hallucination from data ===
Hallucinations can stem from incomplete, inaccurate or unrepresentative data sets.

=== Modeling-related causes ===
The pre-training of generative pretrained transformers (GPT) involves predicting the next word. It incentivizes GPT models to "give a guess" about what the next word is, even when they lack information. Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Amabile and Pratt define human creativity as the production of novel and useful ideas.
By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses (that is, falsehoods), whereas a focus on usefulness may result in memorized content lacking originality. By 2022, newspapers such as The New York Times had expressed concern that, as the adoption of bots based on large language models continued to grow, unwarranted user confidence in bot output could lead to problems.

=== Interpretability research ===
In 2025, interpretability research by Anthropic on the LLM Claude identified internal circuits that cause it to decline to answer questions unless it knows the answer. By default, these circuits are active and the LLM does not answer. When the LLM has sufficient information, the circuits are inhibited and the LLM answers the question. Hallucinations were found to occur when this inhibition happens incorrectly, such as when Claude recognizes a name but lacks sufficient information about that person, causing it to generate plausible but untrue responses.

=== Examples ===
On 15 November 2022, researchers from Meta AI published Galactica, designed to "store, combine and reason about scientific knowledge". Content generated by Galactica came with the warning: "Outputs may be unreliable! Language Models are prone to hallucinate text." In one case, when asked to draft a paper on creating avatars, Galactica cited a fictitious paper from a real author who works in the relevant area. Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy. OpenAI's ChatGPT, released in beta to the public on November 30, 2022, was based on the foundation model GPT-3.5 (a revision of GPT-3). Professor Ethan Mollick of Wharton called it an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kuba

Digital history

Digital history is the use of digital media to further historical analysis, presentation, and research. It is a branch of the digital humanities and an extension of quantitative history, cliometrics, and computing. Digital history commonly takes one of two forms: digital public history, concerned primarily with engaging online audiences with historical content, or digital research methods that further academic research. Digital history outputs include digital archives, online presentations, data and information visualizations, interactive maps, timelines, audio files, and virtual worlds. These outputs are designed to make historical content more accessible and to encourage users to engage with it. Recent digital history projects emphasize creativity, collaboration, and technical innovation, drawing on methods such as text mining, corpus linguistics, network analysis, 3D modeling, and big data analysis. By using these resources, users can rapidly develop new analyses that link to, extend, and bring to life existing histories.

== History ==
Rooted in earlier social science history work, particularly around the history of enslavement in the United States, early digital history in the 1960s and 1970s focused on using computers to conduct quantitative analyses, primarily of demographic and social history data (censuses, election returns, city directories, and other tabular or countable data), with the aim of producing defensible research findings. These early computers could be programmed to conduct statistical analyses of such records, creating tallies or seeking trends across records. This research into historical demography was rooted in the rise of social history as a field of historical interest. The historians involved in this work sought to quantify past societies and to reach new conclusions about communities and populations. Computers proved capable tools for that type of work.
By the late 1970s, younger historians had turned to cultural studies; most of these studies involved online databases that were checked by professionals in Great Britain about once a year. Meanwhile, the outpouring of quantitative studies by established scholars continued. Since then, quantitative history and cliometrics have been used primarily by historically minded economists and political scientists. In the late 1980s, quantifiers founded the Association for History and Computing. This movement provided some of the impetus for the rise of digital history in the 1990s. The more recent roots of digital history were in software rather than online networks. In 1982, the Library of Congress embarked on its Optical Disk Pilot Project, which placed text and images from its collection onto laserdiscs and CD-ROMs. The library started offering online exhibits in 1992, when it launched Selected Civil War Photographs. In 1993, Roy Rosenzweig, along with Steve Brier and Josh Brown, produced the award-winning CD-ROM Who Built America? From the Centennial Exposition of 1876 to the Great War of 1914. Designed for Apple, Inc., it integrated images, text, film, and sound clips in a visual interface that supported a text narrative. Among the earliest online digital history projects were The Heritage Project of the University of Kansas and medieval historian Dr. Lynn Nelson's World History Index and History Central Catalogue. Another was The Valley of the Shadow, conceived in 1991 by Edward L. Ayers, now professor of humanities and president emeritus at the University of Richmond, who was then at the University of Virginia. The Institute for Advanced Technology in the Humanities (IATH) at the University of Virginia adopted the Valley Project and partnered with IBM to collect and transcribe historical sources into digital files. The project collected data related to Augusta County, Virginia, and Franklin County, Pennsylvania, during the American Civil War. In 1996, William G. Thomas III joined Ayers on the Valley Project. Together, they produced an online article entitled "The Differences Slavery Made: A Close Analysis of Two American Communities," which also appeared in The American Historical Review in 2003. A CD-ROM accompanying the Valley Project was published by W. W. Norton and Company in 2000. Rosenzweig, who died on October 11, 2007, founded the Center for History and New Media (CHNM) at George Mason University in 1994. CHNM has since produced several digital tools for historians, such as Zotero, Omeka, and Tropy. In 1997, Ayers and Thomas used the term "digital history" when they proposed and founded the Virginia Center for Digital History (VCDH) at the University of Virginia, the earliest center devoted exclusively to digital history. Other institutions promoting digital history include the Center for Humane Arts, Letters, and Social Sciences Online (MATRIX) at Michigan State University, the Maryland Institute for Technology in the Humanities, and the Center for Digital Research in the Humanities at the University of Nebraska. In 2004, Emory University launched Southern Spaces, a "peer-reviewed Internet journal and scholarly forum" examining the history of the South.

== Applications ==
Combining digital history with traditional historical methods offers many potential benefits. These applications include: combining traditional historical methods with new research methods to reach new conclusions; using digital tools to extract and analyse amounts of data that would otherwise be unmanageable; creating models and maps from the extracted data to visualise it; and placing the extracted and analysed data alongside existing historiography to extend historical knowledge. By adding new research methods to existing historical methods, historians can benefit greatly from the ability to work with larger amounts of data and to develop new interpretations from it.
== Notable Projects ==
The collaborative nature of most digital history endeavors has meant that the discipline has developed primarily at institutions with the resources to sponsor content research and technical innovation. Two of the first centers, George Mason University's Center for History and New Media and the Virginia Center for Digital History at the University of Virginia, have been among the leaders in developing digital history projects and educating digital historians. Noteworthy projects from these pioneering centers include The Geography of Slavery, The Texas Slavery Project, and The Countryside Transformed at VCDH, as well as Liberty, Equality, Fraternity: Exploring the French Revolution and The Lost Museum at CHNM. In each of these projects, mediated archives holding multiple types of sources are combined, to varying degrees, with digital tools to analyze and illuminate a historical question. This integration of content and tools with analysis is one of the hallmarks of digital history: projects move beyond archives or collections into scholarly analysis, using digital tools to develop that analysis. The differences in how projects achieve this integration are a measure of the field's development and point to ongoing debates over what digital history can and should be. While many of the projects at VCDH, CHNM, and other universities' centers have been geared towards academics and post-secondary education, the University of Victoria (British Columbia), in conjunction with the Université de Sherbrooke and the Ontario Institute for Studies in Education at the University of Toronto, has created a series of projects for all ages, "Great Unsolved Mysteries in Canadian History." Laden with instructional aids, the site asks teachers to introduce students to historical research methods to help them develop analytical skills and a sense of the complexities of their national history.
Issues of race, religion, and gender are addressed in carefully constructed modules that cover incidents in Canadian history from Viking exploration through the 1920s. John Lutz, one of the project's original co-creators, has also developed Victoria's Victoria with the University of Victoria and Malaspina University-College. In addition to Ayers, Thomas, Lutz, and Rosenzweig, numerous other scholars work with digital history techniques and have made, or continue to make, important contributions to the field. Robert Darnton's 2000 article, "An Early Information Society: News and the Media in Eighteenth-Century Paris," was supplemented with electronic resources and is an early model of the discussions around digital history and its future in the humanities. One of the first major digital projects to be reviewed by the American Historical Review (AHR) was Philip Ethington's "Los Angeles and the Problem of Urban Historical Knowledge," a multimedia exploration of changes to Los Angeles' physical profile over the course of several decades. In this essay, he also expresses his belief that historians have major power in