The Czekanowski distance (sometimes shortened as CZD) is a per-pixel quality metric that estimates quality or similarity by measuring differences between pixels. Because it compares vectors with strictly non-negative elements, it is often used to compare colored images, as color values cannot be negative. This different approach has a better correlation with subjective quality assessment than PSNR. == Definition == Androutsos et al. give the Czekanowski coefficient as follows: d z ( i , j ) = 1 − 2 ∑ k = 1 p min ( x i k , x j k ) ∑ k = 1 p ( x i k + x j k ) {\displaystyle d_{z}(i,j)=1-{\frac {2\sum _{k=1}^{p}{\text{min}}(x_{ik},\ x_{jk})}{\sum _{k=1}^{p}(x_{ik}+x_{jk})}}} Where a pixel x i {\displaystyle x_{i}} is being compared to a pixel x j {\displaystyle x_{j}} on the k-th band of color – usually one for each of red, green and blue. For a pixel matrix of size M × N {\displaystyle M\times N} , the Czekanowski coefficient can be used in an arithmetic mean spanning all pixels to calculate the Czekanowski distance as follows: 1 M N ∑ i = 0 M − 1 ∑ j = 0 N − 1 ( 1 − 2 ∑ k = 1 3 min ( A k ( i , j ) , B k ( i , j ) ) ∑ k = 1 3 ( A k ( i , j ) + B k ( i , j ) ) ) {\displaystyle {\frac {1}{MN}}\sum _{i=0}^{M-1}\sum _{j=0}^{N-1}{\begin{pmatrix}1-{\frac {2\sum _{k=1}^{3}{\text{min}}(A_{k}(i,j),\ B_{k}(i,j))}{\sum _{k=1}^{3}(A_{k}(i,j)+B_{k}(i,j))}}\end{pmatrix}}} Where A k ( i , j ) {\displaystyle A_{k}(i,j)} is the (i, j)-th pixel of the k-th band of a color image and, similarly, B k ( i , j ) {\displaystyle B_{k}(i,j)} is the pixel that it is being compared to. == Uses == In the context of image forensics – for example, detecting if an image has been manipulated –, Rocha et al. report the Czekanowski distance is a popular choice for Color Filter Array (CFA) identification.
GPT-5
GPT-5 is a multimodal large language model developed by OpenAI and the fifth in its series of generative pre-trained transformer (GPT) foundation models. Preceded in the series by GPT-4, it was launched on August 7, 2025. It is publicly accessible to users of the chatbot products ChatGPT and Microsoft Copilot as well as to developers through the OpenAI API. == Background == On April 14, 2023, Sam Altman, the chief executive officer of OpenAI, spoke at an event at the Massachusetts Institute of Technology and said that the company was not training GPT-5 at that time. He stated that OpenAI was "prioritizing GPT-4 development" and that "we are not and won't for some time" release GPT-5. On July 18, OpenAI filed for a "GPT-5" trademark in the United States. On November 13, Altman confirmed to the Financial Times that the company was working to develop GPT-5. According to The Information, "[f]or much of the second half of 2024, OpenAI was developing a model known internally as Orion and intended to become GPT-5", "[b]ut the Orion effort failed to produce a better model, and the company instead released it as GPT-4.5 in February [2025]." By late July 2025, OpenAI was widely anticipated as planning to release GPT-5 in early August. On July 30, The Verge reported that "Microsoft is getting ready for GPT-5" as "sources familiar with Microsoft's AI plans" told an editor that the company was testing a new mode for its Copilot chatbot that would offer a model that "thinks deeply or quickly based on the task". On August 5, in the leadup to the release of GPT-5, OpenAI released GPT-OSS, a set of two open-weight models that have reasoning capabilities. GPT-5 was then unveiled during a livestream event on August 7. == Capabilities == At the time of its release, GPT-5 had state-of-the-art performance on benchmarks that test mathematics, programming, finance, and multimodal understanding. According to OpenAI, improvements over its predecessor models include faster response times, better coding and writing skills, more accurate answers to health questions, and lower levels of hallucination. Also, compared to previous models, GPT-5 aims to give safe, high-level responses to potentially harmful queries rather than outright declining them, an approach that OpenAI refers to as "safe completions", aiming to result "in GPT-5 being able to refuse more unsafe questions, while offering fewer rejections to users seeking harmless information." In addition, GPT-5 was trained to give more critical, "less effusively agreeable" answers compared to its predecessor models. Days before the launch of GPT-5, two early testers of the model stated that they were "impressed" by its ability to code and to solve mathematical and scientific problems. They suggested that the model shows great improvement from GPT-4, but not as large of a gain as from GPT-3 to GPT-4. A day prior to the release of GPT-5, during a press briefing, Sam Altman, the chief executive officer of OpenAI, called GPT-5 "a significant step along the path to AGI", referring to artificial general intelligence, the hypothetical level of intelligence that OpenAI defines as the ability to perform any economically valuable task that a human can. According to Altman, GPT-5 is "significantly better" than its predecessors, offering "PhD-level" abilities across a wide range of tasks. The exact energy consumption of GPT-5 use has not been disclosed by OpenAI. Researchers at the University of Rhode Island estimated that a medium-length response consumes slightly over 18 watt-hours, equivalent to using an incandescent bulb for 18 minutes. === Architecture === GPT-5 is a system that contains a fast, high-throughput model, a deeper reasoning model, and a real-time router that decides which model to use based on conversation type, complexity, tool needs, and explicit user intent. Altman had previously criticized the manual model picker for being overly complex, suggesting a need for unification. GPT-5 also includes agentic functionality through which it can set up its own desktop and can use its browser to search autonomously for sources that relate to its task. The GPT-5 system card defines two fast, high-throughput models – gpt-5-main and gpt-5-main-mini – and two thinking models – gpt-5-thinking and gpt-5-thinking-mini. In the OpenAI API, developers can access the thinking model, its mini version, and gpt-5-thinking-nano, an even smaller and faster nano version of the thinking model. The version of GPT-5 that is accessible via the API has adjustable reasoning effort (low, medium, high, or minimal) and verbosity (low, medium, or high). Additionally, ChatGPT provides access to gpt-5-thinking with a setting that makes use of parallel test-time compute, referred to as gpt-5-thinking-pro. == Limitations == === Safety === Neuraltrust, a security research company, claimed to have successfully compromised GPT-5 within its first day of testing the model. According to its report, it enabled GPT-5 to generate detailed instructions for manufacturing explosive devices. SPLX, another company, conducted similar tests and came to similar conclusions about GPT-5's security. Their assessments suggest that GPT-5 has significant security gaps, potentially rendering it as being unsafe for use in a corporate environment. == Training == According to AIMultiple, GPT-5 is natively multimodal, meaning that it was trained from scratch on multiple modalities (like text and images) at once without relying on already-trained language or vision models. Its training process involved three stages: unsupervised pretraining, supervised fine-tuning, and reinforcement learning from human feedback. Pretraining used a large-scale multilingual dataset of books, articles, web pages, academic papers, and licensed sources. GPT-5's visual and text capabilities were described as having been developed alongside each other throughout training, unlike with GPT-4. == Use == GPT-5 is used in ChatGPT. Although GPT-5 is free for all ChatGPT users, Plus users get higher use limits while Pro users get unlimited access to GPT-5 as well as limited access to GPT-5 Pro. Standard limits for lower-tier users on responses per hour still apply. Additionally, with the introduction of GPT-5, ChatGPT's "Advanced Voice Mode" was replaced by "ChatGPT Voice", which is supposed to enable more natural-sounding conversations. OpenAI stated that "Standard Voice Mode retires on September 9, 2025, unifying all users on ChatGPT Voice". On November 24, 2025, the feature of shopping research was added to ChatGPT, claimed to be a mini model post-trained on gpt-5-thinking-mini. GPT-5 is also available in Microsoft Copilot, and Microsoft stated that it will incorporate GPT-5 into a wide variety of its products. According to 9to5Mac, Apple Inc. is planning to integrate the model into the Apple Intelligence feature in its iOS 26, iPadOS 26, and macOS Tahoe operating systems. It is also accessible via the OpenAI API. A number of American companies were reported as having received access to GPT-5 ahead of its launch. OpenAI stated that the private health insurance company Oscar Health was checking applications from its policyholders with the model. In addition, Uber was using GPT-5 for its customer support system; GitLab, Windsurf, and Cursor were using the model for software development; and the Spanish bank BBVA was using it for financial analysis. Other companies that OpenAI listed as having used GPT-5 pre-release include Amgen, Lowe's, and Notion. == Reception == === Critical reviews === Grace Huckins in MIT Technology Review found that, "[w]hereas o1 was a major technological advancement, GPT-5 is, above all else, a refined product." In response to claims that Sam Altman, the chief executive officer of OpenAI, had made about the model, she stated that "GPT-5 will furnish a more pleasant and seamless user experience. That's not nothing, but it falls far short of the transformative AI future that Altman has spent much of the past year hyping." In response to Altman's claim that GPT-5 is "a significant step along the path" to artificial general intelligence, she noted: "[M]aybe he's right—but if so, it's a very small step." In The Information, Stephanie Palazzolo praised GPT-5's coding capabilities. According to Matteo Wong in The Atlantic, GPT-5 "is intuitive, fast, and efficient; adapts to human preferences and intentions; and is easy to personalize." He stated: "At this stage of the AI boom, when every major chatbot is legitimately helpful in numerous ways, benchmarks, science, and rigor feel almost insignificant. What matters is how the chatbot feels [...]". John Herrman from the New York magazine wrote: "Casual users who encounter GPT-5 through ChatGPT aren't likely to feel like they're using a completely different product [...] while people who use it for software development or in a corporate context are more likely to notice a major change." Mashable's Christian de Looper found that "GPT-5
Discovery system (artificial intelligence)
A discovery system is an artificial intelligence system that attempts to discover new scientific concepts or laws. The aim of discovery systems is to automate scientific data analysis and the scientific discovery process. Ideally, an artificial intelligence system should be able to search systematically through the space of all possible hypotheses and yield the hypothesis - or set of equally likely hypotheses - that best describes the complex patterns in data. During the era known as the second AI summer (approximately 1978–1987), various systems akin to the era's dominant expert systems were developed to tackle the problem of extracting scientific hypotheses from data, with or without interacting with a human scientist. These systems included Autoclass, Automated Mathematician, Eurisko, which aimed at general-purpose hypothesis discovery, and more specific systems such as Dalton, which uncovers molecular properties from data. The dream of building systems that discover scientific hypotheses was pushed to the background with the second AI winter and the subsequent resurgence of subsymbolic methods such as neural networks. Subsymbolic methods emphasize prediction over explanation, and yield models which works well but are difficult or impossible to explain which has earned them the name black box AI. A black-box model cannot be considered a scientific hypothesis, and this development has even led some researchers to suggest that the traditional aim of science - to uncover hypotheses and theories about the structure of reality - is obsolete. Other researchers disagree and argue that subsymbolic methods are useful in many cases, just not for generating scientific theories. == Discovery systems from the 1970s and 1980s == Autoclass was a Bayesian Classification System written in 1986 Automated Mathematician was one of the earliest successful discovery systems. It was written in 1977 and worked by generating a modifying small Lisp programs Eurisko was a Sequel to Automated Mathematician written in 1984 Dalton is a still maintained program capable of calculating various molecular properties initially launched in 1983 and available in open source since 2017 Glauber is a scientific discovery method written in the context of computational philosophy of science launched in 1983 == Modern discovery systems (2009–present) == After a couple of decades with little interest in discovery systems, the interest in using AI to uncover natural laws and scientific explanations was renewed by the work of Michael Schmidt, then a PhD student in Computational Biology at Cornell University. Schmidt and his advisor, Hod Lipson, invented Eureqa, which they described as a symbolic regression approach to "distilling free-form natural laws from experimental data". This work effectively demonstrated that symbolic regression was a promising way forward for AI-driven scientific discovery. Since 2009, symbolic regression has matured further, and today, various commercial and open source systems are actively used in scientific research. Notable examples include Eureqa, now a part of DataRobot AI Cloud Platform, AI Feynman, and QLattice.
Dataset shift
Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels is different in the training phase and the deployment or test phase (i.e., P t r a i n ( X , Y ) ≠ P t e s t ( X , Y ) {\displaystyle P_{train}(X,Y)\neq P_{test}(X,Y)} ). This happens when the statistical properties of data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability. Dataset shift is a generic term for a number of particular types of distributional change. Covariate shift is when the distribution of the input features changes, but the conditional relationship between inputs and outputs remains constant . Prior probability shift (or label shift) happens when the distribution of target labels changes, but the conditional distribution of inputs given labels stays the same. Concept shift (also known as concept drift) is the change of the conditional relationship between inputs and outputs that renders previously learned patterns invalid over time. A key challenge for deploying machine learning systems is dataset shift, in particular in dynamic environments where the data distributions change over time. Detecting and mitigating such shifts is an active area of research, e.g., drift detection, domain adaptation, continual learning.
Sequence labeling
In machine learning, sequence labeling is a type of pattern recognition task that involves the algorithmic assignment of a categorical label to each member of a sequence of observed values. A common example of a sequence labeling task is part of speech tagging, which seeks to assign a part of speech to each word in an input sentence or document. Sequence labeling can be treated as a set of independent classification tasks, one per member of the sequence. However, accuracy is generally improved by making the optimal label for a given element dependent on the choices of nearby elements, using special algorithms to choose the globally best set of labels for the entire sequence at once. As an example of why finding the globally best label sequence might produce better results than labeling one item at a time, consider the part-of-speech tagging task just described. Frequently, many words are members of multiple parts of speech, and the correct label of such a word can often be deduced from the correct label of the word to the immediate left or right. For example, the word "sets" can be either a noun or verb. In a phrase like "he sets the books down", the word "he" is unambiguously a pronoun, and "the" unambiguously a determiner, and using either of these labels, "sets" can be deduced to be a verb, since nouns very rarely follow pronouns and are less likely to precede determiners than verbs are. But in other cases, only one of the adjacent words is similarly helpful. In "he sets and then knocks over the table", only the word "he" to the left is helpful (cf. "...picks up the sets and then knocks over..."). Conversely, in "... and also sets the table" only the word "the" to the right is helpful (cf. "... and also sets of books were ..."). An algorithm that proceeds from left to right, labeling one word at a time, can only use the tags of left-adjacent words and might fail in the second example above; vice versa for an algorithm that proceeds from right to left. Most sequence labeling algorithms are probabilistic in nature, relying on statistical inference to find the best sequence. The most common statistical models in use for sequence labeling make a Markov assumption, i.e. that the choice of label for a particular word is directly dependent only on the immediately adjacent labels; hence the set of labels forms a Markov chain. This leads naturally to the hidden Markov model (HMM), one of the most common statistical models used for sequence labeling. Other common models in use are the maximum entropy Markov model and conditional random field.
Croissant (metadata format)
Croissant is a metadata format design to support sharing of datasets for machine learning applications. It is a platform-agnostic schema used to standardize metadata in data repositories like Hugging Face, kaggle, Dataverse and OpenML. == Structure == Croissant builds upon schema.org, uses primarily JSON-LD, and divides metadata in four "layers": Dataset Metadata, Resource, Structure and Semantic: The Dataset Metadata layer constrains which schema.org properties should be used, including additional properties, linking together the resources (files) of the dataset with general metadata, like licensing and citation information. The Resource layer describes the individual files and sets of those using two new classes, FileObject and FileSet. A FileSet may be a collection of related images. The Structure layer specifies how the files are organized in the dataset. A RecordSet class describes how resources are present, configurations that may very a lot between modality. This specification facilitates interoperability of the datasets. Finally, the Semantic layer adds information for practical reuse of the dataset, such as splits for train, test and validation subsets. It also provides a default extension for metadata related to responsible AI. The use of a standard machine-readable structure increases, for example, the discoverability of datasets in search engines such as Google Dataset Search. == History == Croissant was shared in arXiv in March 2024 and published in the proceedings of NeurIPS 2024. It started as community driven as a MLCommons Croissant Working Group, including stakeholders organizations from academia and industry, including Google, the open data institute, Sage Bionetworks and King's College London. Variations of Croissant are developed to support datasets in different areas of research, such as Geo-Croissant for geospatial datasets. Other technical extensions, such as support for RDF, soon followed.
AI browser
An AI browser is a web browser with integrated artificial intelligence capabilities, such as automatically summarizing web page content or answering questions about it. A more specialized type is an agentic browser, based on the concept of agentic AI, which can take actions – such as navigating webpages or filling out forms – on behalf of the user. Several agentic browsers emerged in 2025, including ChatGPT Atlas (macOS only), Comet, and Dia. As of 2025, this is a recent development in the browser market, including new entrants from OpenAI, Opera and Perplexity. The designation of 'AI browser' also includes established browsers that later added non-agentic AI features, such as Microsoft Edge with the Copilot chatbot, Google Chrome with the Gemini chatbot (for Windows desktop users in the US with their language set to English), and Firefox with multiple chatbot providers (such as ChatGPT, Claude, Copilot, Gemini, and Le Chat). AI browsers have been noted to be susceptible to prompt injection attacks. == Browser extensions and integrations == Rather than creating entirely new browsers, some AI browsing solutions integrate with existing browsers through extensions or companion applications. These tools add agentic capabilities to established browsers without requiring users to switch platforms. Examples include Composite, which functions as a cross-browser agent that works with Chrome, Edge, and other browsers to automate web-based tasks for workers. == Cloud-based implementations == Cloud-based implementations of AI browsers allow users to run automated browsing agents without local installation. These systems operate on remote servers using frameworks such as Puppeteer or Playwright. Examples include Browserbase, Browser-use and AI Browser. The AI typically parses the Document Object Model (DOM) to locate and interact with page elements, and may also analyze browser screenshots to interpret layout and structure. == Criticisms and dangers == AI browsers have been noted to be susceptible to being vulnerable to prompt injection attacks, in which the content of websites can be used to hijack the control of the browser. Multiple organisations have argued against using AI browsers due to this vulnerability. The United Kingdom national cyber security centre and Gartner consider them to be too risky for adoption by most organisations. A study by the CISPA Helmholtz Center and Saarland University concluded that this vulnerability makes them easy targets for malware, fraud, automated defamation, disinformation and biased outputs.