AI Data House (smc-pvt) Ltd

AI Data House (smc-pvt) Ltd — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Hallucination (artificial intelligence)

    Hallucination (artificial intelligence)

    In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting, confabulation, or delusion) is a response generated by AI that contains false or misleading information presented as fact. This term draws a loose analogy with human psychology, where a hallucination typically involves false percepts. For example, a chatbot powered by large language models (LLMs), like ChatGPT, may embed plausible-sounding random falsehoods within its generated content. Detecting and mitigating errors and hallucinations pose significant challenges for practical deployment and reliability of LLMs in high-stakes scenarios, such as chip design, supply chain logistics, and medical diagnostics. Some software engineers and statisticians have criticized the specific term "AI hallucination" for unreasonably anthropomorphizing computers. Symbolic artificial intelligence models generally do not produce hallucinations, unlike large language models. == Term == === Origin === Since the 1980s, the term "hallucination" has been used in computer vision with a positive connotation to describe the process of adding detail to an image. For example, the task of generating high-resolution face images from low-resolution inputs is called face hallucination. The first documented use of the term "hallucination" in this sense is in the PhD thesis of Eric Mjolsness in 1986. A notable work is the face hallucination algorithm by Simon Baker and Takeo Kanade published in 1999. In the 2000s, hallucinations were described in statistical machine translation as a failure mode. Since the 2010s, the term has undergone a semantic shift to signify the generation of factually incorrect or misleading outputs by AI systems in tasks like machine translation and object detection. In 2015, hallucinations were identified in visual semantic role labeling tasks by Saurabh Gupta and Jitendra Malik. In 2015, computer scientist Andrej Karpathy used the term "hallucinated" in a blog post to describe his recurrent neural network (RNN) language model generating an incorrect citation link. In 2017, Google researchers used the term to describe the responses generated by neural machine translation (NMT) models when they are not related to the source text, and in 2018, the term was used in computer vision to describe instances where non-existent objects are erroneously detected because of adversarial attacks. In July 2021, Meta warned during its release of BlenderBot 2 that the system is prone to "hallucinations", which Meta defined as "confident statements that are not true". Following OpenAI's ChatGPT release in beta version in November 2022, some users complained that such chatbots often seem to pointlessly embed plausible-sounding random falsehoods within their generated content. Many news outlets, including The New York Times, started to use the term "hallucinations" to describe these models' frequently incorrect or inconsistent responses. In 2023, the Cambridge dictionary updated its definition of hallucination to include this new sense specific to the field of AI. Some researchers have highlighted a lack of consistency in how the term is used, but also identified several alternative terms in the literature, such as confabulations, fabrications, and factual errors. === Definitions and alternatives === Uses, definitions and characterizations of the term "hallucination" in the context of LLMs include: "a tendency to invent facts in moments of uncertainty" (OpenAI, May 2023) "a model's logical mistakes" (OpenAI, May 2023) "fabricating information entirely, but behaving as if spouting facts" (CNBC, May 2023) "making up information" (The Verge, February 2023) "probability distributions" (in scientific contexts) Journalist Benj Edwards, in Ars Technica, writes that the term "hallucination" is controversial, but that some form of metaphor remains necessary; Edwards suggests "confabulation" as an analogy for processes that involve "creative gap-filling". In July 2024, a White House report on fostering public trust in AI research mentioned hallucinations only in the context of reducing them. Notably, when acknowledging David Baker's Nobel Prize-winning work with AI-generated proteins, the Nobel committee avoided the term entirely, instead referring to "imaginative protein creation". Hicks, Humphries, and Slater, in their article in Ethics and Information Technology, argue that the output of LLMs is "bullshit" under Harry Frankfurt's definition of the term, and that the models are "in an important way indifferent to the truth of their outputs", with true statements only accidentally true, and false ones accidentally false. Some researchers also use the derogatory term "botshit", often referring to uncritical use of AI. === Criticism === In the scientific community, some researchers avoid the term "hallucination", seeing it as potentially misleading. It has been criticized by Usama Fayyad, executive director of the Institute for Experimental Artificial Intelligence at Northeastern University, on the grounds that it misleadingly personifies large language models and is vague. Mary Shaw said, "The current fashion for calling generative AI's errors 'hallucinations' is appalling. It anthropomorphizes the software, and it spins actual errors as somehow being idiosyncratic quirks of the system even when they're objectively incorrect." In Salon, statistician Gary Smith argues that LLMs "do not understand what words mean" and consequently that the term "hallucination" unreasonably anthropomorphizes the machine. Murray Shanahan argues that anthropomorphic framing of LLM capabilities, including terms like "hallucination", encourages users and researchers to attribute cognitive processes to systems that operate through statistical pattern completion, and advocates for more careful linguistic practices when discussing LLM behavior. Kristina Šekrst argues that applying psychological vocabulary to LLM outputs obscures the difference between the appearance of mental properties and their genuine presence. Förster & Skop assert that tech companies use the hallucination metaphor to anthropomorphize models and deflect responsibility for non-factual outputs. Some see the AI outputs not as illusory but as prospective—that is, having some chance of being true, similar to early-stage scientific conjectures. The term has also been criticized for its association with psychedelic drug experiences. == In natural language generation == In natural language generation, there are several reasons why natural language models hallucinate: === Hallucination from data === Hallucinations can stem from incomplete, inaccurate or unrepresentative data sets. === Modeling-related causes === The pre-training of generative pretrained transformers (GPT) involves predicting the next word. It incentivizes GPT models to "give a guess" about what the next word is, even when they lack information. Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Amabile and Pratt define human creativity as the production of novel and useful ideas. By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses—that is, falsehoods—whereas a focus on usefulness may result in memorized content lacking originality. By 2022, newspapers such as The New York Times expressed concern that, as the adoption of bots based on large language models continued to grow, unwarranted user confidence in bot output could lead to problems. === Interpretability research === In 2025, interpretability research by Anthropic on the LLM Claude identified internal circuits that cause it to decline to answer questions unless it knows the answer. By default, the circuit is active and the LLM doesn't answer. When the LLM has sufficient information, these circuits are inhibited and the LLM answers the question. Hallucinations were found to occur when this inhibition happens incorrectly, such as when Claude recognizes a name but lacks sufficient information about that person, causing it to generate plausible but untrue responses. === Examples === On 15 November 2022, researchers from Meta AI published Galactica, designed to "store, combine and reason about scientific knowledge". Content generated by Galactica came with the warning: "Outputs may be unreliable! Language Models are prone to hallucinate text." In one case, when asked to draft a paper on creating avatars, Galactica cited a fictitious paper from a real author who works in the relevant area. Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy. OpenAI's ChatGPT, released in beta version to the public on November 30, 2022, was based on the foundation model GPT-3.5 (a revision of GPT-3). Professor Ethan Mollick of Wharton called it an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kuba

    Read more →
  • Information space analysis

    Information space analysis

    Within the field of information science, information space analysis is a deterministic method, enhanced by machine intelligence, for locating and assessing resources for team-centric efforts. Organizations need to be able to quickly assemble teams backed by the support services, information, and material to do the job. To do so, these teams need to find and assess sources of services that are potential participants in the team effort. To support this initial team and resource development, information needs to be developed via analysis tools that help make sense of sets of data sources in an Intranet or Internet. Part of the process is to characterize them, partition them, and sort and filter them. These tools focus on three key issues in forming a collaborative team: Help individuals responsible for forming the team understand what is available. Assist team members in identifying the structure and categorize the information available to them in a manner specifically suited to the task at hand. Aid team members to understand the mappings of their information between their organization and that used by others who might participate. Information space analysis tools combine multiple methods to assist in this task. This causes the tools to be particularly well-suited to integrating additional technologies in order to create specialized systems.

    Read more →
  • Label noise

    Label noise

    Label noise refers to errors or inaccuracies in the class labels of data instances. This is a widespread issue in machine learning datasets, arising from human annotator mistakes, unclear labeling instructions, automated labeling methods, or adversarial attacks in supervised learning. Label noise can be roughly divided into random noise, where labels are flipped independently of input features, and systematic noise, where mislabeling is dependent on certain patterns or biases in the data. Label noise can be damaging to model performance, especially for complex models that may overfit to noisy labels rather than generalizable patterns. Many approaches have been proposed to deal with the effects of label noise, including robust loss functions, noise-tolerant algorithms, data cleaning methods, and semi-supervised learning approaches. To reduce the impact of wrong labels during training, techniques like label smoothing, sample reweighting and using trusted validation sets are used. The role of noise-robust training paradigms and curriculum learning strategies to improve resilience against mislabeled data is also explored in recent research.

    Read more →
  • Artificial brain

    Artificial brain

    An artificial brain (or artificial mind) is software and hardware with cognitive abilities similar to those of the animal or human brain. Research investigating "artificial brains" and brain emulation plays three important roles in science: An ongoing attempt by neuroscientists to understand how the human brain works, known as cognitive neuroscience. A thought experiment in the philosophy of artificial intelligence, demonstrating that it is possible, at least in theory, to create a machine that has all the capabilities of a human being. A long-term project to create machines exhibiting behavior comparable to those of animals with complex central nervous system such as mammals and most particularly humans. The ultimate goal of creating a machine exhibiting human-like behavior or intelligence is sometimes called strong AI. An example of the first objective is the project reported by Aston University in Birmingham, England where researchers are using biological cells to create "neurospheres" (small clusters of neurons) in order to develop new treatments for diseases including Alzheimer's, motor neurone and Parkinson's disease. The second objective is a reply to arguments such as John Searle's Chinese room argument, Hubert Dreyfus's critique of AI or Roger Penrose's argument in The Emperor's New Mind. These critics argued that there are aspects of human consciousness or expertise that can not be simulated by machines. One reply to their arguments is that the biological processes inside the brain can be simulated to any degree of accuracy. This reply was made as early as 1950, by Alan Turing in his classic paper "Computing Machinery and Intelligence". The third objective is generally called artificial general intelligence by researchers. However, Ray Kurzweil prefers the term "strong AI". In his book The Singularity is Near, he focuses on whole brain emulation using conventional computing machines as an approach to implementing artificial brains, and claims (on grounds of computer power continuing an exponential growth trend) that this could be done by 2025. Henry Markram, director of the Blue Brain project (which is attempting brain emulation), made a similar claim (2020) at the Oxford TED conference in 2009. == Approaches to brain simulation == W. Ross Ashby's pioneering work in cybernetics provided an early mathematical framework for understanding adaptive brain-like systems. In his 1952 book Design for a Brain, Ashby proposed that the brain could be modeled as an ultrastable system that maintains equilibrium through continuous adaptation to environmental perturbations. His approach used differential equations and state-space models to describe how neural systems could exhibit purposeful behavior through feedback mechanisms. Ashby's homeostat, a physical machine built in 1948, demonstrated these principles through an electromechanical device with four interconnected units that automatically adjusted their parameters to maintain stability when disturbed. The homeostat represented one of the first attempts to build an artificial system exhibiting brain-like adaptive behavior, influencing subsequent work in adaptive systems, neural networks, and artificial intelligence. Although direct human brain emulation using artificial neural networks on a high-performance computing engine is a commonly discussed approach, there are other approaches. An alternative artificial brain implementation could be based on Holographic Neural Technology (HNeT) non linear phase coherence/decoherence principles. The analogy has been made to quantum processes through the core synaptic algorithm which has strong similarities to the quantum mechanical wave equation. EvBrain is a form of evolutionary software that can evolve "brainlike" neural networks, such as the network immediately behind the retina. In November 2008, IBM received a US$4.9 million grant from the Pentagon for research into creating intelligent computers. The Blue Brain project is being conducted with the assistance of IBM in Lausanne. The project is based on the premise that it is possible to artificially link the neurons "in the computer" by placing thirty million synapses in their proper three-dimensional position. Some proponents of strong AI speculated in 2009 that computers in connection with Blue Brain and Soul Catcher may exceed human intellectual capacity by around 2015, and that it is likely that we will be able to download the human brain at some time around 2050. While Blue Brain is able to represent complex neural connections on the large scale, the project does not achieve the link between brain activity and behaviors executed by the brain. In 2012, project Spaun (Semantic Pointer Architecture Unified Network) attempted to model multiple parts of the human brain through large-scale representations of neural connections that generate complex behaviors in addition to mapping. Spaun's design recreates elements of human brain anatomy. The model, consisting of approximately 2.5 million neurons, includes features of the visual and motor cortices, GABAergic and dopaminergic connections, the ventral tegmental area (VTA), substantia nigra, and others. The design allows for several functions in response to eight tasks, using visual inputs of typed or handwritten characters and outputs carried out by a mechanical arm. Spaun's functions include copying a drawing, recognizing images, and counting. There are good reasons to believe that, regardless of implementation strategy, the predictions of realising artificial brains in the near future are optimistic. In particular brains (including the human brain) and cognition are not currently well understood, and the scale of computation required is unknown. Another near term limitation is that all current approaches for brain simulation require orders of magnitude larger power consumption compared with a human brain. The human brain consumes about 20 W of power, whereas current supercomputers may use as much as 1 MW—i.e., an order of 100,000 more. == Artificial brain thought experiment == Some critics of brain simulation believe that it is simpler to create general intelligent action directly without imitating nature. Some commentators have used the analogy that early attempts to construct flying machines modeled them after birds, but that modern aircraft do not look like birds.

    Read more →
  • Azure Maps

    Azure Maps

    Azure Maps is a suite of cloud-based, location-based services provided by Microsoft as part of the company's Azure platform. The platform provides geospatial and location-based services via REST APIs and software development kits (SDKs). The service is typically used to integrate maps or geospatial data into applications. Azure Maps differs from Microsoft's other enterprise mapping service, Bing Maps, in its pricing model, focus on privacy, and its level of integration into the broader Azure cloud ecosystem. == History == Azure Maps was first introduced in public preview mode under the name "Azure Location Based Services" in 2017, primarily as an enterprise solution. The services was intended to add mapping and location-based functionality onto the existing Azure cloud services suite, seen as a critical part of Microsoft's broader Internet-of-Things (IoT) strategy. The preview version included APIs which could be used to develop location aware apps for use cases such as logistics and mobility. In 2018, the software was renamed "Azure Maps," and became generally available to the public, and a number of new functions were added, including route calculation, travel time calculation, and incorporation of real-time traffic data and incident information. Azure Maps was integrated with Azure IoT Central in 2018, which added tracking, monitoring, and geofencing capabilities. A set of mobility APIs on were added in 2019, with applications such as use in public transport apps and shared bicycle fleet management. “Azure Maps Creator,” which converts private facility floor plans into indoor map data, was also introduced in 2019. Some commentators linked these services to Microsoft's broader development of augmented reality products. In 2020, Azure Maps Visual for Power BI was released, integrating location-based features and mapping capabilities into Microsoft's business intelligence software. An elevation API (which was later retired), geolocation services, and an iOS and Android software development kit were introduced in 2021. In 2022, support for historical weather, air quality, and tropical storm data was made generally available and custom styling for indoor maps was also introduced. In 2023, Azure Maps was certified as HIPAA compliant in a move to target healthcare and health insurance companies. == Functionality == === Geocoding === Geocoding is one of the core functionalities of Azure Maps, converting addresses or place names into geographic coordinates. Batch geocoding is used to process large amounts of address data, a function used for route optimization and spatial analysis. === Reverse geocoding === Reverse geocoding derives human-readable information from geographic coordinates like longitude and latitude, used in navigation and by geographic information systems. === Routing === Azure Maps uses map data and routing algorithms to calculate the shortest or fastest routes between locations based on factors like vehicle size and type, traffic conditions, and distance. Routing also supports multi-modal routing, which include multiple modes of transport in a single trip, including cycling, walking, and ferries. This functionality is used for location-based searches and route optimization in applications like fleet management, proximity marketing, and emergency services as well as logistics and delivery, urban planning, ride sharing apps, and outdoor activities. === Map visualization === The platform supports map visualizations that can be modified to reflect real-time data (including from IoT sensors) as well as historical data patterns. Visualizations include heat maps, street maps, satellite imagery and other custom data layers. Maps are rendered using raster or vector tiles which reduce the load of displaying large data sets or complex maps. This can be used in various applications in areas like transportation, smart cities, retail and marketing, public health, and environmental monitoring. For example, it can be used for tracking the spread of diseases or measuring the impact of changing climatic patterns. === Geofencing and spatial analytics === Azure Maps supports polygonal geofencing, which enables the definition of custom geographic boundaries. Geofenced areas can be monitored in real-time for events of interest. For example, an application could send an alert when equipment or persons enter or leave a defined area. Tools for analyzing historical geofencing data are also available via the APIs for optimization purposes. == Industry usage == Azure Maps' geofencing function has seen usage in the construction industry, designating hazardous areas for safety purposes and sending alerts if anyone enters the area. Private facility maps are used by construction companies for monitoring large construction sites to increase productivity and prevent accidents or damage. In emergency management, New Zealand based company Beca has used Azure Maps to provide analysis on the impact of earthquakes to users, including information on the severity and location of an earthquake and the impact on affected properties. Alaska's Department of Transportation uses Azure Maps as part of an information system providing weather-related warnings and analytics to road crews. Airmap, an airspace management platform for drones, uses Azure Maps. Azure Maps has also been used in conjunction with Azure Monitor for risk monitoring by an insurance company. Other companies that use or have used Azure Maps include BMW, Banco Santander, Jvion, MV Transportation, C.H. Robertson, Wise Skulls, Tata Consultancy Services, Providence Health and Services, Gas Brasiliano Distribuidora S.A., Shell plc, Persistent Systems, Phase 2 Dining and Entertainment, Symbio, HID, Globant, and Insight Enterprises. == Partnerships == Azure Maps and TomTom have been partners since 2016, and TomTom provides location data to Azure Maps and can process data from Azure Maps for mapping purposes. In 2021, Azure Maps partnered with AccuWeather to make climatic data available via its APIs, making weather data along all parts of calculated routes available for mobility and logistics purposes. Microsoft has partnered with Esri, the developer of ArcGIS, and there is cross-compatibility between Azure and ArcGIS so that data from Azure Maps can be integrated into ArcGIS and vice versa. Azure Maps partnered with Moovit in 2019, a startup providing software that interfaces with public transport data. Moovit's database on global public transit networks, including information on which stations and facilities are wheelchair accessible, was linked to Azure Maps. This service was noted for its use increasing accessibility to public transport for the visually impaired by means of voice activated route planning assistance. NORAD has used some Azure Maps functions for their NORAD Tracks Santa website during Christmas holidays. == Components == === REST APIs === Various APIs cover the major functionalities across Azure Maps: Data registry API Geolocation API Render API Route API Search API Spatial API Time zone API Traffic API Weather API === SDKs === Azure Maps SDKs uses MapLibre-style specifications and open source MapLibre GL-based libraries as a rendering engine. The Web SDK is used for developing web apps with maps and location-based data and functionality. It includes a map control module as well as modules with drawing tools. It also supports Azure Maps Creator and various spatial data formats. The platform also includes a set of REST SDKs for developers integrating Azure Maps REST APIs into Python, C#, Java or JavaScript applications. Azure Maps also includes Android and iOS SDKs used for developing applications for Android and Apple devices. === Azure Maps Creator === Azure Maps Creator is a tool for generating custom maps for locations like large office complexes, construction sites, or university campuses. These maps can then be integrated into applications and used with other Azure Maps functions for purposes such as wayfinding and maintenance and security in building automation contexts. === Azure Maps Visual for Power BI === Azure Maps is integrated with Microsoft Power BI, a graphical tool for producing data visualizations. Since July 2020, Power BI can be used in conjunction with Azure Maps for developing map-based data visualizations. This functionality entered general availability in May 2023.

    Read more →
  • Hallucination (artificial intelligence)

    Hallucination (artificial intelligence)

    In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting, confabulation, or delusion) is a response generated by AI that contains false or misleading information presented as fact. This term draws a loose analogy with human psychology, where a hallucination typically involves false percepts. For example, a chatbot powered by large language models (LLMs), like ChatGPT, may embed plausible-sounding random falsehoods within its generated content. Detecting and mitigating errors and hallucinations pose significant challenges for practical deployment and reliability of LLMs in high-stakes scenarios, such as chip design, supply chain logistics, and medical diagnostics. Some software engineers and statisticians have criticized the specific term "AI hallucination" for unreasonably anthropomorphizing computers. Symbolic artificial intelligence models generally do not produce hallucinations, unlike large language models. == Term == === Origin === Since the 1980s, the term "hallucination" has been used in computer vision with a positive connotation to describe the process of adding detail to an image. For example, the task of generating high-resolution face images from low-resolution inputs is called face hallucination. The first documented use of the term "hallucination" in this sense is in the PhD thesis of Eric Mjolsness in 1986. A notable work is the face hallucination algorithm by Simon Baker and Takeo Kanade published in 1999. In the 2000s, hallucinations were described in statistical machine translation as a failure mode. Since the 2010s, the term has undergone a semantic shift to signify the generation of factually incorrect or misleading outputs by AI systems in tasks like machine translation and object detection. In 2015, hallucinations were identified in visual semantic role labeling tasks by Saurabh Gupta and Jitendra Malik. In 2015, computer scientist Andrej Karpathy used the term "hallucinated" in a blog post to describe his recurrent neural network (RNN) language model generating an incorrect citation link. In 2017, Google researchers used the term to describe the responses generated by neural machine translation (NMT) models when they are not related to the source text, and in 2018, the term was used in computer vision to describe instances where non-existent objects are erroneously detected because of adversarial attacks. In July 2021, Meta warned during its release of BlenderBot 2 that the system is prone to "hallucinations", which Meta defined as "confident statements that are not true". Following OpenAI's ChatGPT release in beta version in November 2022, some users complained that such chatbots often seem to pointlessly embed plausible-sounding random falsehoods within their generated content. Many news outlets, including The New York Times, started to use the term "hallucinations" to describe these models' frequently incorrect or inconsistent responses. In 2023, the Cambridge dictionary updated its definition of hallucination to include this new sense specific to the field of AI. Some researchers have highlighted a lack of consistency in how the term is used, but also identified several alternative terms in the literature, such as confabulations, fabrications, and factual errors. === Definitions and alternatives === Uses, definitions and characterizations of the term "hallucination" in the context of LLMs include: "a tendency to invent facts in moments of uncertainty" (OpenAI, May 2023) "a model's logical mistakes" (OpenAI, May 2023) "fabricating information entirely, but behaving as if spouting facts" (CNBC, May 2023) "making up information" (The Verge, February 2023) "probability distributions" (in scientific contexts) Journalist Benj Edwards, in Ars Technica, writes that the term "hallucination" is controversial, but that some form of metaphor remains necessary; Edwards suggests "confabulation" as an analogy for processes that involve "creative gap-filling". In July 2024, a White House report on fostering public trust in AI research mentioned hallucinations only in the context of reducing them. Notably, when acknowledging David Baker's Nobel Prize-winning work with AI-generated proteins, the Nobel committee avoided the term entirely, instead referring to "imaginative protein creation". Hicks, Humphries, and Slater, in their article in Ethics and Information Technology, argue that the output of LLMs is "bullshit" under Harry Frankfurt's definition of the term, and that the models are "in an important way indifferent to the truth of their outputs", with true statements only accidentally true, and false ones accidentally false. Some researchers also use the derogatory term "botshit", often referring to uncritical use of AI. === Criticism === In the scientific community, some researchers avoid the term "hallucination", seeing it as potentially misleading. It has been criticized by Usama Fayyad, executive director of the Institute for Experimental Artificial Intelligence at Northeastern University, on the grounds that it misleadingly personifies large language models and is vague. Mary Shaw said, "The current fashion for calling generative AI's errors 'hallucinations' is appalling. It anthropomorphizes the software, and it spins actual errors as somehow being idiosyncratic quirks of the system even when they're objectively incorrect." In Salon, statistician Gary Smith argues that LLMs "do not understand what words mean" and consequently that the term "hallucination" unreasonably anthropomorphizes the machine. Murray Shanahan argues that anthropomorphic framing of LLM capabilities, including terms like "hallucination", encourages users and researchers to attribute cognitive processes to systems that operate through statistical pattern completion, and advocates for more careful linguistic practices when discussing LLM behavior. Kristina Šekrst argues that applying psychological vocabulary to LLM outputs obscures the difference between the appearance of mental properties and their genuine presence. Förster & Skop assert that tech companies use the hallucination metaphor to anthropomorphize models and deflect responsibility for non-factual outputs. Some see the AI outputs not as illusory but as prospective—that is, having some chance of being true, similar to early-stage scientific conjectures. The term has also been criticized for its association with psychedelic drug experiences. == In natural language generation == In natural language generation, there are several reasons why natural language models hallucinate: === Hallucination from data === Hallucinations can stem from incomplete, inaccurate or unrepresentative data sets. === Modeling-related causes === The pre-training of generative pretrained transformers (GPT) involves predicting the next word. It incentivizes GPT models to "give a guess" about what the next word is, even when they lack information. Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Amabile and Pratt define human creativity as the production of novel and useful ideas. By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses—that is, falsehoods—whereas a focus on usefulness may result in memorized content lacking originality. By 2022, newspapers such as The New York Times expressed concern that, as the adoption of bots based on large language models continued to grow, unwarranted user confidence in bot output could lead to problems. === Interpretability research === In 2025, interpretability research by Anthropic on the LLM Claude identified internal circuits that cause it to decline to answer questions unless it knows the answer. By default, the circuit is active and the LLM doesn't answer. When the LLM has sufficient information, these circuits are inhibited and the LLM answers the question. Hallucinations were found to occur when this inhibition happens incorrectly, such as when Claude recognizes a name but lacks sufficient information about that person, causing it to generate plausible but untrue responses. === Examples === On 15 November 2022, researchers from Meta AI published Galactica, designed to "store, combine and reason about scientific knowledge". Content generated by Galactica came with the warning: "Outputs may be unreliable! Language Models are prone to hallucinate text." In one case, when asked to draft a paper on creating avatars, Galactica cited a fictitious paper from a real author who works in the relevant area. Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy. OpenAI's ChatGPT, released in beta version to the public on November 30, 2022, was based on the foundation model GPT-3.5 (a revision of GPT-3). Professor Ethan Mollick of Wharton called it an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kuba

    Read more →
  • Empirical risk minimization

    Empirical risk minimization

    In statistical learning theory, the principle of empirical risk minimization defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the "true risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk". == Background == The following situation is a general setting of many supervised learning problems. There are two spaces of objects X {\displaystyle X} and Y {\displaystyle Y} and we would like to learn a function h : X → Y {\displaystyle \ h:X\to Y} (often called hypothesis) which outputs an object y ∈ Y {\displaystyle y\in Y} , given x ∈ X {\displaystyle x\in X} . To do so, there is a training set of n {\displaystyle n} examples ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} where x i ∈ X {\displaystyle x_{i}\in X} is an input and y i ∈ Y {\displaystyle y_{i}\in Y} is the corresponding response that is desired from h ( x i ) {\displaystyle h(x_{i})} . To put it more formally, assuming that there is a joint probability distribution P ( x , y ) {\displaystyle P(x,y)} over X {\displaystyle X} and Y {\displaystyle Y} , and that the training set consists of n {\displaystyle n} instances ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} drawn i.i.d. from P ( x , y ) {\displaystyle P(x,y)} . The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in data) because y {\displaystyle y} is not a deterministic function of x {\displaystyle x} , but rather a random variable with conditional distribution P ( y | x ) {\displaystyle P(y|x)} for a fixed x {\displaystyle x} . It is also assumed that there is a non-negative real-valued loss function L ( y ^ , y ) {\displaystyle L({\hat {y}},y)} which measures how different the prediction y ^ {\displaystyle {\hat {y}}} of a hypothesis is from the true outcome y {\displaystyle y} . For classification tasks, these loss functions can be scoring rules. The risk associated with hypothesis h ( x ) {\displaystyle h(x)} is then defined as the expectation of the loss function: R ( h ) = E [ L ( h ( x ) , y ) ] = ∫ L ( h ( x ) , y ) d P ( x , y ) . {\displaystyle R(h)=\mathbf {E} [L(h(x),y)]=\int L(h(x),y)\,dP(x,y).} A loss function commonly used in theory is the 0-1 loss function: L ( y ^ , y ) = { 1 if y ^ ≠ y 0 if y ^ = y {\displaystyle L({\hat {y}},y)={\begin{cases}1&{\mbox{ if }}\quad {\hat {y}}\neq y\\0&{\mbox{ if }}\quad {\hat {y}}=y\end{cases}}} . The ultimate goal of a learning algorithm is to find a hypothesis h ∗ {\displaystyle h^{}} among a fixed class of functions H {\displaystyle {\mathcal {H}}} for which the risk R ( h ) {\displaystyle R(h)} is minimal: h ∗ = a r g m i n h ∈ H R ( h ) . {\displaystyle h^{}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,{R(h)}.} For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function. == Formal definition == In general, the risk R ( h ) {\displaystyle R(h)} cannot be computed because the distribution P ( x , y ) {\displaystyle P(x,y)} is unknown to the learning algorithm. However, given a sample of iid training data points, we can compute an estimate, called the empirical risk, by computing the average of the loss function over the training set; more formally, computing the expectation with respect to the empirical measure: R emp ( h ) = 1 n ∑ i = 1 n L ( h ( x i ) , y i ) . {\displaystyle \!R_{\text{emp}}(h)={\frac {1}{n}}\sum _{i=1}^{n}L(h(x_{i}),y_{i}).} The empirical risk minimization principle states that the learning algorithm should choose a hypothesis h ^ {\displaystyle {\hat {h}}} which minimizes the empirical risk over the hypothesis class H {\displaystyle {\mathcal {H}}} : h ^ = a r g m i n h ∈ H R emp ( h ) . {\displaystyle {\hat {h}}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,R_{\text{emp}}(h).} Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem. == Properties == Guarantees for the performance of empirical risk minimization depend strongly on the function class selected as well as the distributional assumptions made. In general, distribution-free methods are too coarse, and do not lead to practical bounds. However, they are still useful in deriving asymptotic properties of learning algorithms, such as consistency. In particular, distribution-free bounds on the performance of empirical risk minimization given a fixed function class can be derived using bounds on the VC complexity of the function class. For simplicity, considering the case of binary classification tasks, it is possible to bound the probability of the selected classifier, ϕ n {\displaystyle \phi _{n}} being much worse than the best possible classifier ϕ ∗ {\displaystyle \phi ^{}} . Consider the risk L {\displaystyle L} defined over the hypothesis class C {\displaystyle {\mathcal {C}}} with growth function S ( C , n ) {\displaystyle {\mathcal {S}}({\mathcal {C}},n)} given a dataset of size n {\displaystyle n} . Then, for every ϵ > 0 {\displaystyle \epsilon >0} : P ( L ( ϕ n ) − L ( ϕ ∗ ) > ϵ ) ≤ 8 S ( C , n ) exp ⁡ { − n ϵ 2 / 32 } {\displaystyle \mathbb {P} \left(L(\phi _{n})-L(\phi ^{})>\epsilon \right)\leq {\mathcal {8}}S({\mathcal {C}},n)\exp\{-n\epsilon ^{2}/32\}} Similar results hold for regression tasks. These results are often based on uniform laws of large numbers, which control the deviation of the empirical risk from the true risk, uniformly over the hypothesis class. === Impossibility results === It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made. This is sometimes referred to as the No free lunch theorem. Even though a specific learning algorithm may provide the asymptotically optimal performance for any distribution, the finite sample performance is always poor for at least one data distribution. This means that no classifier can improve on the error for a given sample size for all distributions. Specifically, let ϵ > 0 {\displaystyle \epsilon >0} and consider a sample size n {\displaystyle n} and classification rule ϕ n {\displaystyle \phi _{n}} , there exists a distribution of ( X , Y ) {\displaystyle (X,Y)} with risk L ∗ = 0 {\displaystyle L^{}=0} (meaning that perfect prediction is possible) such that: E L n ≥ 1 / 2 − ϵ . {\displaystyle \mathbb {E} L_{n}\geq 1/2-\epsilon .} It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a sequence of decreasing positive numbers a i {\displaystyle a_{i}} converging to zero, it is possible to find a distribution such that: E L n ≥ a i {\displaystyle \mathbb {E} L_{n}\geq a_{i}} for all n {\displaystyle n} . This result shows that universally good classification rules do not exist, in the sense that the rule must be low quality for at least one distribution. === Computational complexity === Empirical risk minimization for a classification problem with a 0-1 loss function is known to be an NP-hard problem even for a relatively simple class of functions such as linear classifiers. Nevertheless, it can be solved efficiently when the minimal empirical risk is zero, i.e., data is linearly separable. In practice, machine learning algorithms cope with this issue either by employing a convex approximation to the 0–1 loss function (like hinge loss for SVM), which is easier to optimize, or by imposing assumptions on the distribution P ( x , y ) {\displaystyle P(x,y)} (and thus stop being agnostic learning algorithms to which the above result applies). In the case of convexification, Zhang's lemma majors the excess risk of the original problem using the excess risk of the convexified problem. Minimizing the latter using convex optimization also allow to control the former. == Tilted empirical risk minimization == Tilted empirical risk minimization is a machine learning technique used to modify standard loss functions like squared error, by introducing a tilt parameter. This parameter dynamically adjusts the weight of data points during training, allowing the algorithm to focus on specific regions or characteristics of the data distribution. Tilted empirical risk minimization is particularly useful in scenarios with imbalanced data or when there is a need to emphasize errors in certain parts of the prediction space.

    Read more →
  • Ameca (robot)

    Ameca (robot)

    Ameca is a robotic humanoid created in 2021 by Engineered Arts, headquarters in Falmouth, Cornwall, United Kingdom. The project commenced in February 2021, and the first public demonstration was at the CES 2022 show in Las Vegas. Ameca's appearance features grey rubber skin on the face and hands, and is specifically designed to appear genderless. In 2024, an Ameca unit was installed in Edinburgh in the UK to reside at the National Robotarium. Ameca generation 3 has been released and showcased at ICRA 2025 along with Ami. == History == The first generation of Ameca was developed at Engineered Arts headquarters in Falmouth, Cornwall, United Kingdom. The project started in February 2021, with the first video revealed publicly on 1 December 2021. Ameca gained widespread attention on Twitter and TikTok ahead of its first public demonstration at the Consumer Electronics Show 2022, where it was covered by CNET and other news outlets. In 2022, Ameca presented an Alternative Christmas message by British TV Channel 4 for Christmas Day. Ameca was associated with the Museum of the Future's robotic family, where it could interact with visitors. In 2024, an Ameca unit was installed in Edinburgh in the UK to reside at the National Robotarium. In January 2026, Ameca served as an ambassador for the European Space Agency (ESA) at the 18th European Space Conference. == Features == It is designed as a platform for further developing robotics technologies involving human-robot interaction. utilizes embedded microphones, binocular eye mounted cameras, a chest camera and facial recognition software to interact with the public. Interactions can be governed by either OpenAI's GPT-3 or human telepresence. It also features articulated motorized arms, fingers, neck and facial features. Ameca's appearance features grey rubber skin on the face and hands, and is specifically designed to appear genderless. == Public appearances == Computer History Museum, California Heinz Nixdorf MuseumsForum, Paderborn, Germany Copernicus Science Center, Warsaw, Poland Museum of the Future, Dubai Consumer Electronics Show 2022 Deutsches Museum Nuremberg OMR Festival 2022 Hosted by Vodafone GITEX 2022 International Conference on Robotics and Automation 2023 International Telecommunication Union AI for Good Global Summit 2023 Sphere (Not Ameca, Custom humanoid named Aura built on Ameca technology)

    Read more →
  • Comparison of user features of messaging platforms

    Comparison of user features of messaging platforms

    Comparison of user features of messaging platforms refers to a comparison of all the various user features of various electronic instant messaging platforms. This includes a wide variety of resources; it includes standalone apps, platforms within websites, computer software, and various internal functions available on specific devices, such as iMessage for iPhones. This entry includes only the features and functions that shape the user experience for such apps. A comparison of the underlying system components, programming aspects, and other internal technical information, is outside the scope of this entry. == Overview and background == Instant messaging technology is a type of online chat that offers real-time text transmission over the Internet. A LAN messenger operates in a similar way over a local area network. Short messages are typically transmitted between two parties when each user chooses to complete a thought and select "send". Some IM applications can use push technology to provide real-time text, which transmits messages character by character, as they are composed. More advanced instant messaging can add file transfer, clickable hyperlinks, Voice over IP, or video chat. Non-IM types of chat include multicast transmission, usually referred to as "chat rooms", where participants might be anonymous or might be previously known to each other (for example collaborators on a project that is using chat to facilitate communication). Instant messaging systems tend to facilitate connections between specified known users (often using a contact list also known as a "buddy list" or "friend list"). Depending on the IM protocol, the technical architecture can be peer-to-peer (direct point-to-point transmission) or client-server (an Instant message service center retransmits messages from the sender to the communication device). By 2010, instant messaging over the Web was in sharp decline, in favor of messaging features on social networks. The most popular IM platforms were terminated, such as AIM which closed down and Windows Live Messenger which merged into Skype. Instant messaging has since seen a revival in popularity in the form of "messaging apps" (usually on mobile devices) which by 2014 had more users than social networks. As of 2010, social networking providers often offer IM abilities. Facebook Chat is a form of instant messaging, and Twitter can be thought of as a Web 2.0 instant messaging system. Similar server-side chat features are part of most dating websites, such as OkCupid or PlentyofFish. The spread of smartphones and similar devices in the late 2000s also caused increased competition with conventional instant messaging, by making text messaging services still more ubiquitous. Many instant messaging services offer video calling features, voice over IP and web conferencing services. Web conferencing services can integrate both video calling and instant messaging abilities. Some instant messaging companies are also offering desktop sharing, IP radio, and IPTV to the voice and video features. The term "Instant Messenger" is a service mark of Time Warner and may not be used in software not affiliated with AOL in the United States. For this reason, in April 2007, the instant messaging client formerly named Gaim (or gaim) announced that they would be renamed "Pidgin". In the 2010s, more people started to use messaging apps on modern computers and devices like WhatsApp, WeChat, Viber, Facebook Messenger, Telegram, Signal and Line rather than instant messaging on computers like AIM and Windows Live Messenger. For example, WhatsApp was founded in 2009, and Facebook acquired in 2014, by which time it already had half a billion users. === Concepts === ==== Backchannel ==== Backchannel is the practice of using networked computers to maintain a real-time online conversation alongside the primary group activity or live spoken remarks. The term was coined in the field of linguistics to describe listeners' behaviours during verbal communication. (See Backchannel (linguistics).) The term "backchannel" generally refers to online conversation about the conference topic or speaker. Occasionally backchannel provides audience members a chance to fact-check the presentation. First growing in popularity at technology conferences, backchannel is increasingly a factor in education where WiFi connections and laptop computers allow participants to use ordinary chat like IRC or AIM to actively communicate during presentation. More recent research include works where the backchannel is brought publicly visible, such as the ClassCommons, backchan.nl and Fragmented Social Mirror. Twitter is also widely used today by audiences to create backchannels during broadcasting of content or at conferences. For example, television drama, other forms of entertainment and magazine programs. This practice is often also called live tweeting. Many conferences nowadays also have a hashtag that can be used by the participants to share notes and experiences; furthermore such hashtags can be user generated. == Features == Various platforms and apps are distinguished by their strengths and features in regards to specific functions. === Group messaging === === Official channels === Some apps include a feature known as "official channels" which allows companies, especially news media outlets, publications, and other mass media companies, to offer an official channel, which users can join, and thereby receive regular updates, published articles, or news updates from companies or news outlets. Two apps which have a large amount of such channels available are Line and Telegram. === Video group calls === == Basic default platforms == Basic platforms which are common across entire categories of mobile devices, computers, or operating systems. === SMS === SMS (short message service) is a text messaging service component of most telephone, Internet, and mobile device systems. It uses standardized communication protocols to enable mobile devices to exchange short text messages. An intermediary service can facilitate a text-to-voice conversion to be sent to landlines. SMS, as used on modern devices, originated from radio telegraphy in radio memo pagers that used standardized phone protocols. These were defined in 1985 as part of the Global System for Mobile Communications (GSM) series of standards. The first test SMS message was sent on December 3, 1992, when Neil Papwort, a test engineer for Sema Group, used a personal computer to send "Merry Christmas" to the phone of colleague Richard Jarvis. It commercially rolled out to many cellular networks that decade. SMS became hugely popular worldwide as a way of text communication. By the end of 2010, SMS was the most widely used data application, with an estimated 3.5 billion active users, or about 80% of all mobile phone subscribers. The protocols allowed users to send and receive messages of up to 160 characters (when entirely alpha-numeric) to and from GSM mobiles. Although most SMS messages are sent from one mobile phone to another, support for the service has expanded to include other mobile technologies, such as ANSI CDMA networks and Digital AMPS. Mobile marketing, a type of direct marketing, uses SMS. According to a 2018 market research report the global SMS messaging business was estimated to be worth over US$100 billion, accounting for almost 50 percent of all the revenue generated by mobile messaging. A Flash SMS is a type of SMS that appears directly on the main screen without user interaction and is not automatically stored in the inbox. It can be useful in emergencies, such as a fire alarm or cases of confidentiality, as in delivering one-time passwords. ==== Threaded SMS format ==== Threaded SMS is a visual styling orientation of SMS message history that arranges messages to and from a contact in chronological order on a single screen. It was first invented by a developer working to implement the SMS client for the BlackBerry, who was looking to make use of the blank screen left below the message on a device with a larger screen capable of displaying far more than the usual 160 characters, and was inspired by threaded Reply conversations in email. Visually, this style of representation provides a back-and-forth chat-like history for each individual contact. Hierarchical-threading at the conversation-level (as typical in blogs and on-line messaging boards) is not widely supported by SMS messaging clients. This limitation is due to the fact that there is no session identifier or subject-line passed back and forth between sent and received messages in the header data (as specified by SMS protocol) from which the client device can properly thread an incoming message to a specific dialogue, or even to a specific message within a dialogue. Most smart phone text-messaging-clients are able to create some contextual threading of "group messages" which narrows the context of the thread around the common interests shared by

    Read more →
  • Belief–desire–intention model

    Belief–desire–intention model

    For popular psychology, the belief–desire–intention (BDI) model of human practical reasoning was developed by Michael Bratman as a way of explaining future-directed intention. BDI is fundamentally reliant on folk psychology (the 'theory theory'), which is the notion that our mental models of the world are theories. It was used as a basis for developing the belief–desire–intention software model. == Applications == BDI was part of the inspiration behind the BDI software architecture, which Bratman was also involved in developing. Here, the notion of intention was seen as a way of limiting time spent on deliberating about what to do, by eliminating choices inconsistent with current intentions. BDI has also aroused some interest in psychology. BDI formed the basis for a computational model of childlike reasoning CRIBB.

    Read more →
  • Inauthentic text

    Inauthentic text

    An inauthentic text is a computer-generated expository document meant to appear as genuine, but which is actually meaningless. Frequently they are created in order to be intermixed with genuine documents and thus manipulate the results of search engines, as with Spam blogs. They are also carried along in email in order to fool spam filters by giving the spam the superficial characteristics of legitimate text. Sometimes nonsensical documents are created with computer assistance for humorous effect, as with Dissociated press or Flarf poetry. They have also been used to challenge the veracity of a publication—MIT students submitted papers generated by a computer program called SCIgen to a conference, where they were initially accepted. This led the students to claim that the bar for submissions was too low. With the amount of computer generated text outpacing the ability of people to humans to curate it, there needs some means of distinguishing between the two. Yet automated approaches to determining absolutely whether a text is authentic or not face intrinsic challenges of semantics. Noam Chomsky coined the phrase "Colorless green ideas sleep furiously" giving an example of grammatically correct, but semantically incoherent sentence; some will point out that in certain contexts one could give this sentence (or any phrase) meaning. The first group to use the expression in this regard can be found below from Indiana University. Their work explains in detail an attempt to detect inauthentic texts and identify pernicious problems of inauthentic texts in cyberspace. The site has a means of submitting text that assesses, based on supervised learning, whether a corpus is inauthentic or not. Many users have submitted incorrect types of data and have correspondingly commented on the scores. This application is meant for a specific kind of data; therefore, submitting, say, an email, will not return a meaningful score.

    Read more →
  • Computer audition

    Computer audition

    Computer audition (CA) or machine listening is the general field of study of algorithms and systems for audio interpretation by machines. Since the notion of what it means for a machine to "hear" is very broad and somewhat vague, computer audition attempts to bring together several disciplines that originally dealt with specific problems or had a concrete application in mind. The engineer Paris Smaragdis, interviewed in Technology Review, talks about these systems — "software that uses sound to locate people moving through rooms, monitor machinery for impending breakdowns, or activate traffic cameras to record accidents." Inspired by models of human audition, CA deals with questions of representation, transduction, grouping, use of musical knowledge and general sound semantics for the purpose of performing intelligent operations on audio and music signals by the computer. Technically this requires a combination of methods from the fields of signal processing, auditory modelling, music perception and cognition, pattern recognition, and machine learning, as well as more traditional methods of artificial intelligence for musical knowledge representation. == Applications == Like computer vision versus image processing, computer audition versus audio engineering deals with understanding of audio rather than processing. It also differs from problems of speech understanding by machine since it deals with general audio signals, such as natural sounds and musical recordings. Applications of computer audition are widely varying, and include search for sounds, genre recognition, acoustic monitoring, music transcription, score following, audio texture, music improvisation, emotion in audio and so on. == Related disciplines == Computer Audition overlaps with the following disciplines: Music information retrieval: methods for search and analysis of similarity between music signals. Auditory scene analysis: understanding and description of audio sources and events. Computational musicology and mathematical music theory: use of algorithms that employ musical knowledge for analysis of music data. Computer music: use of computers in creative musical applications. Machine musicianship: audition driven interactive music systems. == Areas of study == Since audio signals are interpreted by the human ear–brain system, that complex perceptual mechanism should be simulated somehow in software for "machine listening". In other words, to perform on par with humans, the computer should hear and understand audio content much as humans do. Analyzing audio accurately involves several fields: electrical engineering (spectrum analysis, filtering, and audio transforms); artificial intelligence (machine learning and sound classification); psychoacoustics (sound perception); cognitive sciences (neuroscience and artificial intelligence); acoustics (physics of sound production); and music (harmony, rhythm, and timbre). Furthermore, audio transformations such as pitch shifting, time stretching, and sound object filtering, should be perceptually and musically meaningful. For best results, these transformations require perceptual understanding of spectral models, high-level feature extraction, and sound analysis/synthesis. Finally, structuring and coding the content of an audio file (sound and metadata) could benefit from efficient compression schemes, which discard inaudible information in the sound. Computational models of music and sound perception and cognition can lead to a more meaningful representation, a more intuitive digital manipulation and generation of sound and music in musical human-machine interfaces. The study of CA could be roughly divided into the following sub-problems: Representation: signal and symbolic. This aspect deals with time-frequency representations, both in terms of notes and spectral models, including pattern playback and audio texture. Feature extraction: sound descriptors, segmentation, onset, pitch and envelope detection, chroma, and auditory representations. Musical knowledge structures: analysis of tonality, rhythm, and harmonies. Sound similarity: methods for comparison between sounds, sound identification, novelty detection, segmentation, and clustering. Sequence modeling: matching and alignment between signals and note sequences. Source separation: methods of grouping of simultaneous sounds, such as multiple pitch detection and time-frequency clustering methods. Auditory cognition: modeling of emotions, anticipation and familiarity, auditory surprise, and analysis of musical structure. Multi-modal analysis: finding correspondences between textual, visual, and audio signals. === Representation issues === Computer audition deals with audio signals that can be represented in a variety of fashions, from direct encoding of digital audio in two or more channels to symbolically represented synthesis instructions. Audio signals are usually represented in terms of analogue or digital recordings. Digital recordings are samples of acoustic waveform or parameters of audio compression algorithms. One of the unique properties of musical signals is that they often combine different types of representations, such as graphical scores and sequences of performance actions that are encoded as MIDI files. Since audio signals usually comprise multiple sound sources, then unlike speech signals that can be efficiently described in terms of specific models (such as source-filter model), it is hard to devise a parametric representation for general audio. Parametric audio representations usually use filter banks or sinusoidal models to capture multiple sound parameters, sometimes increasing the representation size in order to capture internal structure in the signal. Additional types of data that are relevant for computer audition are textual descriptions of audio contents, such as annotations, reviews, and visual information in the case of audio-visual recordings. === Features === Description of contents of general audio signals usually requires extraction of features that capture specific aspects of the audio signal. Generally speaking, one could divide the features into signal or mathematical descriptors such as energy, description of spectral shape etc., statistical characterization such as change or novelty detection, special representations that are better adapted to the nature of musical signals or the auditory system, such as logarithmic growth of sensitivity (bandwidth) in frequency or octave invariance (chroma). Since parametric models in audio usually require very many parameters, the features are used to summarize properties of multiple parameters in a more compact or salient representation. === Musical knowledge === Finding specific musical structures is possible by using musical knowledge as well as supervised and unsupervised machine learning methods. Examples of this include detection of tonality according to distribution of frequencies that correspond to patterns of occurrence of notes in musical scales, distribution of note onset times for detection of beat structure, distribution of energies in different frequencies to detect musical chords and so on. === Sound similarity and sequence modeling === Comparison of sounds can be done by comparison of features with or without reference to time. In some cases an overall similarity can be assessed by close values of features between two sounds. In other cases when temporal structure is important, methods of dynamic time warping need to be applied to "correct" for different temporal scales of acoustic events. Finding repetitions and similar sub-sequences of sonic events is important for tasks such as texture synthesis and machine improvisation. === Source separation === Since one of the basic characteristics of general audio is that it comprises multiple simultaneously sounding sources, such as multiple musical instruments, people talking, machine noises or animal vocalization, the ability to identify and separate individual sources is very desirable. Unfortunately, there are no methods that can solve this problem in a robust fashion. Existing methods of source separation rely sometimes on correlation between different audio channels in multi-channel recordings. The ability to separate sources from stereo signals requires different techniques than those usually applied in communications where multiple sensors are available. Other source separation methods rely on training or clustering of features in mono recording, such as tracking harmonically related partials for multiple pitch detection. Some methods, before explicit recognition, rely on revealing structures in data without knowing the structures (like recognizing objects in abstract pictures without attributing them meaningful labels) by finding the least complex data representations, for instance describing audio scenes as generated by a few tone patterns and their trajectories (polyphonic voices) and acoustical contours drawn by a tone (c

    Read more →
  • Integrated test facility

    Integrated test facility

    An integrated test facility (ITF) creates a fictitious entity in a database to process test transactions simultaneously with live input. ITF can be used to incorporate test transactions into a normal production run of a system. Its advantage is that periodic testing does not require separate test processes. However, careful planning is necessary, and test data must be isolated from production data. Moreover, ITF validates the correct operation of a transaction in an application, but it does not ensure that a system is being operated correctly. Integrated test facility is considered a useful audit tool during an IT audit because it uses the same programs to compare processing using independently calculated data. This involves setting up dummy entities on an application system and processing test or production data against the entity as a means of verifying processing accuracy.

    Read more →
  • Empowerment (artificial intelligence)

    Empowerment (artificial intelligence)

    Empowerment in the field of artificial intelligence formalises and quantifies (via information theory) the potential an agent perceives that it has to influence its environment. An agent which follows an empowerment maximising policy, acts to maximise future options (typically up to some limited horizon). Empowerment can be used as a (pseudo) utility function that depends only on information gathered from the local environment to guide action, rather than seeking an externally imposed goal, thus is a form of intrinsic motivation. The empowerment formalism depends on a probabilistic model commonly used in artificial intelligence. An autonomous agent operates in the world by taking in sensory information and acting to change its state, or that of the environment, in a cycle of perceiving and acting known as the perception-action loop. Agent state and actions are modelled by random variables ( S : s ∈ S , A : a ∈ A {\displaystyle S:s\in {\mathcal {S}},A:a\in {\mathcal {A}}} ) and time ( t {\displaystyle t} ). The choice of action depends on the current state, and the future state depends on the choice of action, thus the perception-action loop unrolled in time forms a causal bayesian network. == Definition == Empowerment ( E {\displaystyle {\mathfrak {E}}} ) is defined as the channel capacity ( C {\displaystyle C} ) of the actuation channel of the agent, and is formalised as the maximal possible information flow between the actions of the agent and the effect of those actions some time later. Empowerment can be thought of as the future potential of the agent to affect its environment, as measured by its sensors. E := C ( A t ⟶ S t + 1 ) ≡ max p ( a t ) I ( A t ; S t + 1 ) {\displaystyle {\mathfrak {E}}:=C(A_{t}\longrightarrow S_{t+1})\equiv \max _{p(a_{t})}I(A_{t};S_{t+1})} In a discrete time model, Empowerment can be computed for a given number of cycles into the future, which is referred to in the literature as 'n-step' empowerment. E ( A t n ⟶ S t + n ) = max p ( a t , . . . , a t + n − 1 ) I ( A t , . . . , A t + n − 1 ; S t + n ) {\displaystyle {\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n})=\max _{p(a_{t},...,a_{t+n-1})}I(A_{t},...,A_{t+n-1};S_{t+n})} The unit of empowerment depends on the logarithm base. Base 2 is commonly used in which case the unit is bits. === Contextual Empowerment === In general the choice of action (action distribution) that maximises empowerment varies from state to state. Knowing the empowerment of an agent in a specific state is useful, for example to construct an empowerment maximising policy. State-specific empowerment can be found using the more general formalism for 'contextual empowerment'. C {\displaystyle C} is a random variable describing the context (e.g. state). E ( A t n ⟶ S t + n ∣ C ) = ∑ c ∈ C p ( c ) E ( A t n ⟶ S t + n ∣ C = c ) {\displaystyle {\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C)=\sum _{c{\in }C}p(c){\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C=c)} == Application == Empowerment maximisation can be used as a pseudo-utility function to enable agents to exhibit intelligent behaviour without requiring the definition of external goals, for example balancing a pole in a cart-pole balancing scenario where no indication of the task is provided to the agent. Empowerment has been applied in studies of collective behaviour and in continuous domains. As is the case with Bayesian methods in general, computation of empowerment becomes computationally expensive as the number of actions and time horizon extends, but approaches to improve efficiency have led to usage in real-time control. Empowerment has been used for intrinsically motivated reinforcement learning agents playing video games, and in the control of underwater vehicles.

    Read more →
  • And–or tree

    And–or tree

    An and–or tree is a graphical representation of the reduction of problems (or goals) to conjunctions and disjunctions of subproblems (or subgoals). == Example == The and–or tree: represents the search space for solving the problem P, using the goal-reduction methods: P if Q and R P if S Q if T Q if U == Definitions == Given an initial problem P0 and set of problem solving methods of the form: P if P1 and … and Pn the associated and–or tree is a set of labelled nodes such that: The root of the tree is a node labelled by P0. For every node N labelled by a problem or sub-problem P and for every method of the form P if P1 and ... and Pn, there exists a set of children nodes N1, ..., Nn of the node N, such that each node Ni is labelled by Pi. The nodes are conjoined by an arc, to distinguish them from children of N that might be associated with other methods. A node N, labelled by a problem P, is a success node if there is a method of the form P if nothing (i.e., P is a "fact"). The node is a failure node if there is no method for solving P. If all of the children of a node N, conjoined by the same arc, are success nodes, then the node N is also a success node. Otherwise the node is a failure node. == Search strategies == An and–or tree specifies only the search space for solving a problem. Different search strategies for searching the space are possible. These include searching the tree depth-first, breadth-first, or best-first using some measure of desirability of solutions. The search strategy can be sequential, searching or generating one node at a time, or parallel, searching or generating several nodes in parallel. == Relationship with logic programming == The methods used for generating and–or trees are propositional logic programs (without variables). In the case of logic programs containing variables, the solutions of conjoint sub-problems must be compatible. Subject to this complication, sequential and parallel search strategies for and–or trees provide a computational model for executing logic programs. == Relationship with two-player games == And–or trees can also be used to represent the search spaces for two-person games. The root node of such a tree represents the problem of one of the players winning the game, starting from the initial state of the game. Given a node N, labelled by the problem P of the player winning the game from a particular state of play, there exists a single set of conjoint children nodes, corresponding to all of the opponents responding moves. For each of these children nodes, there exists a set of non-conjoint children nodes, corresponding to all of the player's defending moves. For solving game trees with proof-number search family of algorithms, game trees are to be mapped to and–or trees. MAX-nodes (i.e. maximizing player to move) are represented as OR nodes, MIN-nodes map to AND nodes. The mapping is possible, when the search is done with only a binary goal, which usually is "player to move wins the game".

    Read more →