AI Assistant Quest 3

AI Assistant Quest 3 — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Couch to 5K

    Couch to 5K

    Couch to 5K, abbreviated C25K, is an exercise plan that gradually progresses from beginner running toward a 5 kilometre (3.1 mile) run over nine weeks. == Operations == The Couch to 5K running plan, also known as C25K, created by Josh Clark in 1996, was developed with the expectation of creating a plan for new runners to start running. The plan is aimed to have users work out for 20 to 30 minutes, three days a week. Within the program, users can be expected to perform different tasks such as intervals of running with period of short walks in between to help build endurance in the weeks up to the final goal of a 5K run. During the nine weeks leading up to the race, the runner will learn to set their own pace and where their strengths and weaknesses are within running. Often, the daily workouts start with a five-minute warm-up walk and works up to running five kilometres without a walking break within nine weeks. Users are not expected to have any experience in running and can be some of the first running that they ever do. The main goal is to turn that unexperienced runner into someone who can run a 5K. Clark started the website Kick and featured C25K on the site. In 2001, Kick merged with Cool Running, a New England–based running site. Clark later sold his stake in Cool Running and the Couch to 5K program. Cool Running was absorbed into Active.com, operated by Active Network, LLC. Active Network provides mobile apps for Couch to 5K, as well as 5K to 10K, a follow-up program. The NHS in the UK provides downloadable podcasts and a smartphone app (Android and iOS) for the plan. A mobile app, created by Zen Labs, has training plans that are based on the Couch to 5K running plan from CoolRunning.com. It is one of the highest-rated health and fitness apps available on Android and iOS. As of 2016, the C25K app has been used by over 5 million people.

    Read more →
  • AIOps

    AIOps

    AIOps (Artificial Intelligence for IT Operations) refers to the use of artificial intelligence, machine learning, and big data analytics to automate and enhance data center management. It helps organizations manage complex IT environments by detecting, diagnosing, and resolving issues more efficiently than traditional methods. == History == AIOps was first defined by Gartner in 2016, combining "artificial intelligence" and "IT operations" to describe the application of AI and machine learning to enhance IT operations. This concept was introduced to address the increasing complexity and data volume in IT environments, aiming to automate processes such as event correlation, anomaly detection, and causality determination. == Definition == AIOps refers to multi-layered, complex technology platforms that enhance and automate IT operations by using machine learning and analytics to analyze the large amounts of data collected from various DevOps devices and tools, automatically identifying and responding to issues in real-time. AIOps represents a shift from isolated IT data to aggregated observational data (e.g., job logs and monitoring systems) and interaction data (such as ticketing, events, or incident records) within a big data platform. AIOps applies machine learning and analytics to this data, resulting in continuous visibility that, when combined with automation, can lead to ongoing improvements. AIOps connects three IT disciplines (automation, service management, and performance management) to achieve continuous visibility and improvement. This new approach in modern, accelerated, and hyper-scaled IT environments leverages advances in machine learning and big data to overcome previous limitations. == Components == AIOps includes, but is not limited to, the following processes and techniques: Anomaly Detection Log Analysis Root Cause Analysis Cohort Analysis Event Correlation Predictive Analytics Hardware Failure Prediction Automated Remediation Performance Prediction Incident Management Causality Determination Queue Management Resource Scheduling and Optimization Predictive Capacity Management Resource Allocation Service Quality Monitoring Deployment and Integration Testing System Configuration Auto-diagnosis and Problem Localization Efficient ML Training and Inferencing Using LLMs for Cloud Ops Auto Service Healing Data Center Management Customer Support Security and Privacy in Cloud Operations == Comparison with DevOps == AIOps is increasingly compared with DevOps in terms of impact on operational efficiency. While DevOps focuses on collaboration between development and operations teams to accelerate software delivery, AIOps integrates artificial intelligence to enhance monitoring, automation, and predictive capabilities. Various industry analyses have explored the similarities and differences between the two approaches, including discussions on how organizations can combine them to improve incident management and resource optimization. == Results == AI optimizes IT operations in five ways: First, intelligent monitoring powered by AI helps identify potential issues before they cause outages, improving metrics like Mean Time to Detect (MTTD) by 15-20%. Second, performance data analysis and insights enable quick decision-making by ingesting and analyzing large data sets in real time. Third, AI-driven automated infrastructure optimization efficiently allocates resources and thereby reducing cloud costs. Fourth, enhanced IT service management reduces critical incidents by over 50% through AI-driven end-to-end service management. Lastly, intelligent task automation accelerates problem resolution and automates remedial actions with minimal human intervention. In 2025, Atera Networks was identified as a leader in AIOps by the software review platform G2. == AIOps vs. MLOps == AIOps tools use big data analytics, machine learning algorithms, and predictive analytics to detect anomalies, correlate events, and provide proactive insights. This automation reduces the burden on IT teams, allowing them to focus on strategic tasks rather than routine operational issues. AIOps is widely used by IT operations teams, DevOps, network administrators, and IT service management (ITSM) teams to enhance visibility and enable quicker incident resolution in hybrid cloud environments, data centers, and other IT infrastructures. In contrast to MLOps (Machine Learning Operations), which focuses on the lifecycle management and operational aspects of machine learning models, AIOps focuses on optimizing IT operations using a variety of analytics and AI-driven techniques. While both disciplines rely on AI and data-driven methods, AIOps primarily targets IT operations, whereas MLOps is concerned with the deployment, monitoring, and maintenance of ML models. == Conferences == There are several conferences that are specific to AIOps: AIOps Summit AI Dev Summit IBM Think conference

    Read more →
  • AI-complete

    AI-complete

    In the field of artificial intelligence (AI), tasks that are hypothesized to require artificial general intelligence to solve are informally known as AI-complete or AI-hard. Calling a problem AI-complete reflects the belief that it cannot be solved by a simple specific algorithm. Prior to 2013, problems supposed to be AI-complete included computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real-world problem. AI-complete tasks were notably considered useful for distinguishing humans from automated agents, as CAPTCHAs aim to do. == History == The term was coined by Fanya Montalvo by analogy with NP-complete and NP-hard in complexity theory, which formally describes the most famous class of difficult problems. Early uses of the term are in Erik Mueller's 1987 PhD dissertation and in Eric Raymond's 1991 Jargon File. Expert systems, that were popular in the 1980s, were able to solve very simple and/or restricted versions of AI-complete problems, but never in their full generality. When AI researchers attempted to "scale up" their systems to handle more complicated, real-world situations, the programs tended to become excessively brittle without commonsense knowledge or a rudimentary understanding of the situation: they would fail as unexpected circumstances outside of its original problem context would begin to appear. When human beings are dealing with new situations in the world, they are helped by their awareness of the general context: they know what the things around them are, why they are there, what they are likely to do and so on. They can recognize unusual situations and adjust accordingly. Expert systems lacked this adaptability and were brittle when facing new situations. DeepMind published a work in May 2022 in which they trained a single model to do several things at the same time. The model, named Gato, can "play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens." Similarly, some tasks once considered to be AI-complete, like machine translation, are among the capabilities of large language models. == AI-complete problems == AI-complete problems have been hypothesized to include: AI peer review (composite natural language understanding, automated reasoning, automated theorem proving, formalized logic expert system) Bongard problems Computer vision (and subproblems such as object recognition) Natural language understanding (and subproblems such as text mining, machine translation, and word-sense disambiguation) Autonomous driving Dealing with unexpected circumstances while solving any real world problem, whether navigation, planning, or even the kind of reasoning done by expert systems. == Formalization == Computational complexity theory deals with the relative computational difficulty of computable functions. By definition, it does not cover problems whose solution is unknown or has not been characterized formally. Since many AI problems have no formalization yet, conventional complexity theory does not enable a formal definition of AI-completeness. == Research == Roman Yampolskiy suggests that a problem C {\displaystyle C} is AI-Complete if it has two properties: It is in the set of AI problems (Human Oracle-solvable). Any AI problem can be converted into C {\displaystyle C} by some polynomial time algorithm. On the other hand, a problem H {\displaystyle H} is AI-Hard if and only if there is an AI-Complete problem C {\displaystyle C} that is polynomial time Turing-reducible to H {\displaystyle H} . This also gives as a consequence the existence of AI-Easy problems, that are solvable in polynomial time by a deterministic Turing machine with an oracle for some problem. Yampolskiy has also hypothesized that the Turing Test is a defining feature of AI-completeness. Groppe and Jain classify problems which require artificial general intelligence to reach human-level machine performance as AI-complete, while only restricted versions of AI-complete problems can be solved by the current AI systems. For Šekrst, getting a polynomial solution to AI-complete problems would not necessarily be equal to solving the issue of artificial general intelligence, while emphasizing the lack of computational complexity research being the limiting factor towards achieving artificial general intelligence. For Kwee-Bintoro and Velez, solving AI-complete problems would have strong repercussions on society.

    Read more →
  • Data Science and Predictive Analytics

    Data Science and Predictive Analytics

    The first edition of the textbook Data Science and Predictive Analytics: Biomedical and Health Applications using R, authored by Ivo D. Dinov, was published in August 2018 by Springer. The second edition of the book was printed in 2023. This textbook covers some of the core mathematical foundations, computational techniques, and artificial intelligence approaches used in data science research and applications. By using the statistical computing platform R and a broad range of biomedical case-studies, the 23 chapters of the book first edition provide explicit examples of importing, exporting, processing, modeling, visualizing, and interpreting large, multivariate, incomplete, heterogeneous, longitudinal, and incomplete datasets (big data). == Structure == === First edition table of contents === The first edition of the Data Science and Predictive Analytics (DSPA) textbook is divided into the following 23 chapters, each progressively building on the previous content. === Second edition table of contents === The significantly reorganized revised edition of the book (2023) expands and modernizes the presented mathematical principles, computational methods, data science techniques, model-based machine learning and model-free artificial intelligence algorithms. The 14 chapters of the new edition start with an introduction and progressively build foundational skills to naturally reach biomedical applications of deep learning. Introduction Basic Visualization and Exploratory Data Analytics Linear Algebra, Matrix Computing, and Regression Modeling Linear and Nonlinear Dimensionality Reduction Supervised Classification Black Box Machine Learning Methods Qualitative Learning Methods—Text Mining, Natural Language Processing, and Apriori Association Rules Learning Unsupervised Clustering Model Performance Assessment, Validation, and Improvement Specialized Machine Learning Topics Variable Importance and Feature Selection Big Longitudinal Data Analysis Function Optimization Deep Learning, Neural Networks == Reception == The materials in the Data Science and Predictive Analytics (DSPA) textbook have been peer-reviewed in the Journal of the American Statistical Association, International Statistical Institute’s ISI Review Journal, and the Journal of the American Library Association. Many scholarly publications reference the DSPA textbook. As of January 17, 2021, the electronic version of the book first edition (ISBN 978-3-319-72347-1) is freely available on SpringerLink and has been downloaded over 6 million times. The textbook is globally available in print (hardcover and softcover) and electronic formats (PDF and EPub) in many college and university libraries and has been used for data science, computational statistics, and analytics classes at various institutions.

    Read more →
  • Simulation noise

    Simulation noise

    Simulation noise is a function that creates a divergence-free vector field. This signal can be used in artistic simulations for the purpose of increasing the perception of extra detail. The function can be calculated in three dimensions by dividing the space into a regular lattice grid. With each edge is associated a random value, indicating a rotational component of material revolving around the edge. By following rotating material into and out of faces, one can quickly sum the flux passing through each face of the lattice. Flux values at lattice faces are then interpolated to create a field value for all positions. Perlin noise is the earliest form of lattice noise, which has become very popular in computer graphics. Perlin Noise is not suited for simulation because it is not divergence-free. Noises based on lattices, such as simulation noise and Perlin noise, are often calculated at different frequencies and summed together to form band-limited fractal signals. Other approaches developed later that use vector calculus identities to produce divergence free fields, such as "Curl-Noise" as suggested by Rook Bridson, and "Divergence-Free Noise" due to Ivan DeWolf. These often require calculation of lattice noise gradients, which sometimes are not readily available. A naive implementation would call a lattice noise function several times to calculate its gradient, resulting in more computation than is strictly necessary. Unlike these noises, simulation noise has a geometric rationale in addition to its mathematical properties. It simulates vortices scattered in space, to produce its pleasing aesthetic. == Curl noise == The vector field is created as follows, for every point (x,y,z) in the space a vector field G is created, every component x, y and z of the vector field (Gx, Gy, Gz) is defined by a 3D perlin or simplex noise function with x, y and z as parameters. The partial derivative of Gx, Gy, and Gz respect to x, y and z is obtained with the gradient of the perlin or simplex noise by finite differences of implicit calculation inside the simplex noise. The partial derivatives are used to calculate F as the curl of G given by F = ( ∂ G z ∂ y − ∂ G y ∂ z , ∂ G x ∂ z − ∂ G z ∂ x , ∂ G y ∂ x − ∂ G x ∂ y ) {\displaystyle F=({\frac {\partial Gz}{\partial y}}-{\frac {\partial Gy}{\partial z}},{\frac {\partial Gx}{\partial z}}-{\frac {\partial Gz}{\partial x}},{\frac {\partial Gy}{\partial x}}-{\frac {\partial Gx}{\partial y}})} == Bitangent noise == This method is based in the fact that the curl of the gradient of scalar field is zero and the identity that expand the divergence of a cross product of two vectors A and B as the difference of the dot products of each vector with the curl of the other: ∇ × ( ∇ φ ) = 0 . {\displaystyle \nabla \times (\nabla \varphi )=\mathbf {0} .} ∇ ⋅ ( A × B ) = ( ∇ × A ) ⋅ B − A ⋅ ( ∇ × B ) {\displaystyle \nabla \cdot (\mathbf {A} \times \mathbf {B} )=\ (\nabla {\times }\mathbf {A} )\cdot \mathbf {B} \,-\,\mathbf {A} \cdot (\nabla {\times }\mathbf {B} )} which means that if the curl of both vector fields is zero then the divergence of the product of two vectors that are the gradients of scalar fields is zero too. This result in a divergence free vector field by construction only calling two noise functions to create the scalar fields. The vector field es created as follows, two scalar fields are calculated ϕ {\displaystyle \phi } and ψ {\displaystyle \psi } using 3D perlin or simplex noise functions, then the gradients A and B of each of this fields is calculated, the cross product of A and B gives a divergence free vector field. == Signed distance noise == The vector field is created based on a closed and differentiable implicit surface S = F(x,y,z) = 0. For every point in the space, frequently outside or near the surface, we get a vector g that is normal to the surface, this is the gradient of S or the partial derivatives respect to x, y and z, this vector is not unitary, but we can get a unitary normal n by dividing each component of the point by the magnitude of the gradient g. Outside of the surface all these normals point away from the surface. g = ∇ F ( x , y , z ) = ( ∂ F ∂ x , ∂ F ∂ y , ∂ F ∂ z ) {\displaystyle g=\nabla F(x,y,z)=\left({\frac {\partial F}{\partial x}},{\frac {\partial F}{\partial y}},{\frac {\partial F}{\partial z}}\right)} n = g ( x , y , z ) ‖ ∇ F ( x , y , z ) ‖ {\displaystyle \mathbf {n} ={\frac {g(x,y,z)}{\|\nabla F(x,y,z)\|}}} ‖ ∇ F ( x , y , z ) ‖ = ( ∂ F ∂ x ) 2 + ( ∂ F ∂ y ) 2 + ( ∂ F ∂ z ) 2 {\displaystyle \|\nabla F(x,y,z)\|={\sqrt {\left({\frac {\partial F}{\partial x}}\right)^{2}+\left({\frac {\partial F}{\partial y}}\right)^{2}+\left({\frac {\partial F}{\partial z}}\right)^{2}}}} Afterwards we calculate a scalar value p for that point in the space using a 3D perlin or simplex noise function. Now we create a vector field V = pn pointing outside of the surface. The curl of this vector field gives the direction in every point in the space where the particles should move. S D N = ( ∂ V z ∂ y − ∂ V y ∂ z , ∂ V x ∂ z − ∂ V z ∂ x , ∂ V y ∂ x − ∂ V x ∂ y ) {\displaystyle SDN=({\frac {\partial Vz}{\partial y}}-{\frac {\partial Vy}{\partial z}},{\frac {\partial Vx}{\partial z}}-{\frac {\partial Vz}{\partial x}},{\frac {\partial Vy}{\partial x}}-{\frac {\partial Vx}{\partial y}})} By construction this vector SDN will point in a tangent direction to an isosurface at the level of the signed distance to the original surface and can be used to confine the movements of the particles to stay in that surface.

    Read more →
  • Algorithmic bias

    Algorithmic bias

    Algorithmic bias describes systematic and repeatable harmful tendency in a computerized sociotechnical system to create "unfair" outcomes, such as "privileging" one category over another in ways that may or may not be different from the intended function of the algorithm. Bias can emerge from many factors, including intentionally biased design decisions or the unintended or unanticipated use or decisions relating to the way data is coded, collected, selected or used to train the algorithm. For example, algorithmic bias has been observed in search engine results and social media platforms. This bias can have impacts ranging from privacy violations to reinforcing social biases of race, gender, sexuality, and ethnicity. The study of algorithmic bias is most concerned with algorithms that reflect "systematic and unfair" discrimination. This bias has only recently been addressed in legal frameworks, such as the European Union's General Data Protection Regulation (enforced in 2018) and the Artificial Intelligence Act (proposed in 2021 and adopted in 2024). As algorithms expand their ability to organize society, politics, institutions, and behavior, sociologists have become concerned with the ways in which unanticipated output and manipulation of data can impact the physical world. Because algorithms are often considered to be neutral and unbiased, they can inaccurately project greater authority than human expertise (in part due to the psychological phenomenon of automation bias), and in some cases, reliance on algorithms can displace human responsibility for their outcomes, without last mile thinking. Bias can enter into algorithmic systems as a result of pre-existing cultural, social, or institutional expectations; by how features and labels are chosen; because of technical limitations of their design; or by being used in unanticipated contexts or by audiences who are not considered in the software's initial design. Algorithmic bias has been cited in cases ranging from election outcomes to the spread of online hate speech. It has also arisen in criminal justice, healthcare, and hiring, compounding existing racial, socioeconomic, and gender biases. The relative inability of facial recognition technology to accurately identify darker-skinned faces has been linked to multiple wrongful arrests of black men, an issue stemming from imbalanced datasets. Problems in understanding, researching, and discovering algorithmic bias persist due to the proprietary nature of algorithms, which are typically treated as trade secrets. Even when full transparency is provided, the complexity of certain algorithms poses a barrier to understanding their functioning. Furthermore, algorithms may change, or respond to input or output in ways that cannot be anticipated or easily reproduced for analysis. In many cases, even within a single website or application, there is no single "algorithm" to examine, but a network of many interrelated programs and data inputs, even between users of the same service. A 2021 survey identified multiple forms of algorithmic bias, including historical, representation, and measurement biases, each of which can contribute to unfair outcomes. == Definitions == Algorithms are difficult to define, but may be generally understood as lists of instructions that determine how programs read, collect, process, and analyze data to generate a usable output. For a rigorous technical introduction, see Algorithms. Advances in computer hardware and software have led to an increased capability to process, store and transmit data. This has in turn made the design and adoption of technologies such as machine learning and artificial intelligence technically and commercially feasible. By analyzing and processing data, algorithms are the backbone of search engines, social media websites, recommendation engines, online retail, online advertising, and more. Contemporary social scientists are concerned with algorithmic processes embedded into hardware and software applications because of their political and social impact, and question the underlying assumptions of an algorithm's neutrality. The term algorithmic bias describes systematic and repeatable errors that create unfair outcomes, such as privileging one arbitrary group of users over others. For example, a credit score algorithm may deny a loan without being unfair, if it is consistently weighing relevant financial criteria. If the algorithm recommends loans to one group of users, but denies loans to another set of nearly identical users based on unrelated criteria, and if this behavior can be repeated across multiple occurrences, an algorithm can be described as biased. This bias may be intentional or unintentional (for example, it can come from biased data obtained from a worker that previously did the job the algorithm is going to do from now on). == Methods == Bias can be introduced to an algorithm in several ways. During the assemblage of a dataset, data may be collected, digitized, adapted, and entered into a database according to human-designed cataloging criteria. Next, programmers assign priorities, or hierarchies, for how a program assesses and sorts that data. This requires human decisions about how data is categorized, and which data is included or discarded. Some algorithms collect their own data based on human-selected criteria, which can also reflect the bias of human designers. Other algorithms may reinforce stereotypes and preferences as they process and display "relevant" data for human users, for example, by selecting information based on previous choices of a similar user or group of users. Beyond assembling and processing data, bias can emerge as a result of design. For example, algorithms that determine the allocation of resources or scrutiny (such as determining school placements) may inadvertently discriminate against a category when determining risk based on similar users (as in credit scores). Meanwhile, recommendation engines that work by associating users with similar users, or that make use of inferred marketing traits, might rely on inaccurate associations that reflect broad ethnic, gender, socio-economic, or racial stereotypes. Another example comes from determining criteria for what is included and excluded from results. These criteria could present unanticipated outcomes for search results, such as with flight-recommendation software that omits flights that do not follow the sponsoring airline's flight paths. Algorithms may also display an uncertainty bias, offering more confident assessments when larger data sets are available. This can skew algorithmic processes toward results that more closely correspond with larger samples, which may disregard data from underrepresented populations. == History == === Early critiques === The earliest computer programs were designed to mimic human reasoning and deductions, and were deemed to be functioning when they successfully and consistently reproduced that human logic. In his 1976 book Computer Power and Human Reason, artificial intelligence pioneer Joseph Weizenbaum suggested that bias could arise both from the data used in a program, but also from the way a program is coded. Weizenbaum wrote that programs are a sequence of rules created by humans for a computer to follow. By following those rules consistently, such programs "embody law", that is, enforce a specific way to solve problems. The rules a computer follows are based on the assumptions of a computer programmer for how these problems might be solved. That means the code could incorporate the programmer's imagination of how the world works, including their biases and expectations. While a computer program can incorporate bias in this way, Weizenbaum also noted that any data fed to a machine additionally reflects "human decision making processes" as data is being selected. Finally, he noted that machines might also transfer good information with unintended consequences if users are unclear about how to interpret the results. Weizenbaum warned against trusting decisions made by computer programs that a user doesn't understand, comparing such faith to a tourist who can find his way to a hotel room exclusively by turning left or right on a coin toss. Crucially, the tourist has no basis of understanding how or why he arrived at his destination, and a successful arrival does not mean the process is accurate or reliable. An early example of algorithmic bias resulted in as many as 60 women and ethnic minorities denied entry to St. George's Hospital Medical School per year from 1982 to 1986, based on implementation of a new computer-guidance assessment system that denied entry to women and men with "foreign-sounding names" based on historical trends in admissions. While many schools at the time employed similar biases in their selection process, St. George was most notable for automating said bias through the use of an algorithm, thus gaining the attention of people on a much

    Read more →
  • Rademacher complexity

    Rademacher complexity

    In computational learning theory (machine learning and theory of computation), Rademacher complexity, named after Hans Rademacher, measures richness of a class of sets with respect to a probability distribution. The concept can also be extended to real valued functions. == Definitions == === Rademacher complexity of a set === Given a set A ⊆ R m {\displaystyle A\subseteq \mathbb {R} ^{m}} , the Rademacher complexity of A is defined as follows: Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]} where σ 1 , σ 2 , … , σ m {\displaystyle \sigma _{1},\sigma _{2},\dots ,\sigma _{m}} are independent random variables drawn from the Rademacher distribution i.e. Pr ( σ i = + 1 ) = Pr ( σ i = − 1 ) = 1 / 2 {\displaystyle \Pr(\sigma _{i}=+1)=\Pr(\sigma _{i}=-1)=1/2} for i ∈ { 1 , 2 , … , m } {\displaystyle i\in \{1,2,\dots ,m\}} , and a = ( a 1 , … , a m ) ∈ A {\displaystyle a=(a_{1},\ldots ,a_{m})\in A} . Some authors take the absolute value of the sum before taking the supremum, but if A {\displaystyle A} is symmetric this makes no difference. === Rademacher complexity of a function class === Let S = { z 1 , z 2 , … , z m } ⊆ Z {\displaystyle S=\{z_{1},z_{2},\dots ,z_{m}\}\subseteq Z} be a sample of points and consider a function class F {\displaystyle {\mathcal {F}}} of real-valued functions over Z {\displaystyle Z} . Then, the empirical Rademacher complexity of F {\displaystyle {\mathcal {F}}} given S {\displaystyle S} is defined as: Rad S ⁡ ( F ) = 1 m E σ [ sup f ∈ F | ∑ i = 1 m σ i f ( z i ) | ] {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{f\in {\mathcal {F}}}\left|\sum _{i=1}^{m}\sigma _{i}f(z_{i})\right|\right]} This can also be written using the previous definition: Rad S ⁡ ( F ) = Rad ⁡ ( F ∘ S ) {\displaystyle \operatorname {Rad} _{S}({\mathcal {F}})=\operatorname {Rad} ({\mathcal {F}}\circ S)} where F ∘ S {\displaystyle {\mathcal {F}}\circ S} denotes function composition, i.e.: F ∘ S := { ( f ( z 1 ) , … , f ( z m ) ) ∣ f ∈ F } {\displaystyle {\mathcal {F}}\circ S:=\{(f(z_{1}),\ldots ,f(z_{m}))\mid f\in {\mathcal {F}}\}} The worst case empirical Rademacher complexity is Rad ¯ m ( F ) = sup S = { z 1 , … , z m } Rad S ⁡ ( F ) {\displaystyle {\overline {\operatorname {Rad} }}_{m}({\mathcal {F}})=\sup _{S=\{z_{1},\dots ,z_{m}\}}\operatorname {Rad} _{S}({\mathcal {F}})} Let P {\displaystyle P} be a probability distribution over Z {\displaystyle Z} . The Rademacher complexity of the function class F {\displaystyle {\mathcal {F}}} with respect to P {\displaystyle P} for sample size m {\displaystyle m} is: Rad P , m ⁡ ( F ) := E S ∼ P m [ Rad S ⁡ ( F ) ] {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}}):=\mathbb {E} _{S\sim P^{m}}\left[\operatorname {Rad} _{S}({\mathcal {F}})\right]} where the above expectation is taken over an identically independently distributed (i.i.d.) sample S = ( z 1 , z 2 , … , z m ) {\displaystyle S=(z_{1},z_{2},\dots ,z_{m})} generated according to P {\displaystyle P} . == Intuition == The Rademacher complexity is typically applied on a function class of models that are used for classification, with the goal of measuring their ability to classify points drawn from a probability space under arbitrary labellings. When the function class is rich enough, it contains functions that can appropriately adapt for each arrangement of labels, simulated by the random draw of σ i {\displaystyle \sigma _{i}} under the expectation, so that this quantity in the sum is maximized. The Rademacher complexity of a set A {\displaystyle A} can be rewritten as Rad ⁡ ( A ) := 1 m E σ [ sup a ∈ A ∑ i = 1 m σ i a i ] = 1 m 2 m ∑ σ ∈ { − 1 / m , + 1 / m } m [ sup a ∈ A ⟨ σ , a ⟩ ] . {\displaystyle \operatorname {Rad} (A):={\frac {1}{m}}\mathbb {E} _{\sigma }\left[\sup _{a\in A}\sum _{i=1}^{m}\sigma _{i}a_{i}\right]={\frac {1}{{\sqrt {m}}2^{m}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}}\left[\sup _{a\in A}\langle \sigma ,a\rangle \right].} Each term in the summation is the farthest distance of the set A {\displaystyle A} from the origin, along a unit-length direction σ {\displaystyle \sigma } . The directions are along the vertices of a hypercube. Thus, we can also write it as Rad ⁡ ( A ) = 1 2 m 1 2 m − 1 ∑ σ ∈ { − 1 / m , + 1 / m } m / { − 1 , + 1 } [ sup a ∈ A ⟨ σ , a ⟩ − inf a ∈ A ⟨ σ , a ⟩ ] {\displaystyle \operatorname {Rad} (A)={\frac {1}{2{\sqrt {m}}}}{\frac {1}{2^{m-1}}}\sum _{\sigma \in \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}}\left[\sup _{a\in A}\langle \sigma ,a\rangle -\inf _{a\in A}\langle \sigma ,a\rangle \right]} Here, the set { − 1 / m , + 1 / m } m / { − 1 , + 1 } {\displaystyle \{-1/{\sqrt {m}},+1/{\sqrt {m}}\}^{m}/\{-1,+1\}} denotes half of the vertices of a hypercube, selected so that each diagonal has exactly one vertex selected. In words, this states that 2 m Rad ⁡ ( A ) {\displaystyle 2{\sqrt {m}}\operatorname {Rad} (A)} is precisely the average width of the set A {\displaystyle A} along all diagonal directions of a hypercube. == Examples == A singleton set has 0 width in any direction, so it has Rademacher complexity 0. The set A = { ( 1 , 1 ) , ( 1 , 2 ) } ⊆ R 2 {\displaystyle A=\{(1,1),(1,2)\}\subseteq \mathbb {R} ^{2}} has average width 1 / 2 {\displaystyle 1/{\sqrt {2}}} along the two diagonal directions of the square, so it has Rademacher complexity 1 / 4 {\displaystyle 1/4} . The unit cube [ 0 , 1 ] m {\displaystyle [0,1]^{m}} has constant width m {\displaystyle {\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / 2 {\displaystyle 1/2} . Similarly, the unit cross-polytope { x ∈ R m : ‖ x ‖ 1 ≤ 1 } {\displaystyle \{x\in \mathbb {R} ^{m}:\|x\|_{1}\leq 1\}} has constant width 2 / m {\displaystyle 2/{\sqrt {m}}} along the diagonal directions, so it has Rademacher complexity 1 / m {\displaystyle 1/m} . == Using the Rademacher complexity == The Rademacher complexity can be used to derive data-dependent upper-bounds on the learnability of function classes. Intuitively, a function-class with smaller Rademacher complexity is easier to learn. === Bounding the representativeness === In machine learning, it is desired to have a training set that represents the true distribution of some sample data S {\displaystyle S} . This can be quantified using the notion of representativeness. Denote by P {\displaystyle P} the probability distribution from which the samples are drawn. Denote by H {\displaystyle H} the set of hypotheses (potential classifiers) and denote by F {\displaystyle {\mathcal {F}}} the corresponding set of error functions, i.e., for every hypothesis h ∈ H {\displaystyle h\in H} , there is a function f h ∈ F {\displaystyle f_{h}\in F} , that maps each training sample (features,label) to the error of the classifier h {\displaystyle h} (note in this case hypothesis and classifier are used interchangeably). For example, in the case that h {\displaystyle h} represents a binary classifier, the error function is a 0–1 loss function, i.e. the error function f h {\displaystyle f_{h}} returns 0 if h {\displaystyle h} correctly classifies a sample and 1 else. We omit the index and write f {\displaystyle f} instead of f h {\displaystyle f_{h}} when the underlying hypothesis is irrelevant. Define: L P ( f ) := E z ∼ P [ f ( z ) ] {\displaystyle L_{P}(f):=\mathbb {E} _{z\sim P}[f(z)]} – the expected error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the real distribution P {\displaystyle P} ; L S ( f ) := 1 m ∑ i = 1 m f ( z i ) {\displaystyle L_{S}(f):={1 \over m}\sum _{i=1}^{m}f(z_{i})} – the estimated error of some error function f ∈ F {\displaystyle f\in {\mathcal {F}}} on the sample S {\displaystyle S} . The representativeness of the sample S {\displaystyle S} , with respect to P {\displaystyle P} and F {\displaystyle {\mathcal {F}}} , is defined as: Rep P ⁡ ( F , S ) := sup f ∈ F ( L P ( f ) − L S ( f ) ) {\displaystyle \operatorname {Rep} _{P}({\mathcal {F}},S):=\sup _{f\in F}(L_{P}(f)-L_{S}(f))} Smaller representativeness is better, since it provides a way to avoid overfitting: it means that the true error of a classifier is not much higher than its estimated error, and so selecting a classifier that has low estimated error will ensure that the true error is also low. Note however that the concept of representativeness is relative and hence can not be compared across distinct samples. The expected representativeness of a sample can be bounded above by the Rademacher complexity of the function class: If F {\displaystyle {\mathcal {F}}} is a set of functions with range within [ 0 , 1 ] {\displaystyle [0,1]} , then Rad P , m ⁡ ( F ) − ln ⁡ 2 2 m ≤ E S ∼ P m [ Rep P ⁡ ( F , S ) ] ≤ 2 Rad P , m ⁡ ( F ) {\displaystyle \operatorname {Rad} _{P,m}({\mathcal {F}})-{\sqrt {\frac {\ln 2}{2m}}}\leq \mathbb {E} _{S\sim P^{m}}[\operatorname {Rep} _{P}({\

    Read more →
  • Manifold hypothesis

    Manifold hypothesis

    The manifold hypothesis posits that many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space. As a consequence of the manifold hypothesis, many data sets that appear to initially require many variables to describe, can actually be described by a comparatively small number of variables, linked to the local coordinate system of the underlying manifold. It is suggested that this principle underpins the effectiveness of machine learning algorithms in describing high-dimensional data sets by considering a few common features. The manifold hypothesis is related to the effectiveness of nonlinear dimensionality reduction techniques in machine learning. Many techniques of dimensional reduction make the assumption that data lies along a low-dimensional submanifold, such as manifold sculpting, manifold alignment, and manifold regularization. The major implications of this hypothesis is that Machine learning models only have to fit relatively simple, low-dimensional, highly structured subspaces within their potential input space (latent manifolds). Within one of these manifolds, it's always possible to interpolate between two inputs, that is to say, morph one into another via a continuous path along which all points fall on the manifold. The ability to interpolate between samples is the key to generalization in deep learning. == The information geometry of statistical manifolds == An empirically-motivated approach to the manifold hypothesis focuses on its correspondence with an effective theory for manifold learning under the assumption that robust machine learning requires encoding the dataset of interest using methods for data compression. This perspective gradually emerged using the tools of information geometry thanks to the coordinated effort of scientists working on the efficient coding hypothesis, predictive coding and variational Bayesian methods. The argument for reasoning about the information geometry on the latent space of distributions rests upon the existence and uniqueness of the Fisher information metric. In this general setting, we are trying to find a stochastic embedding of a statistical manifold. From the perspective of dynamical systems, in the big data regime this manifold generally exhibits certain properties such as homeostasis: We can sample large amounts of data from the underlying generative process. Machine Learning experiments are reproducible, so the statistics of the generating process exhibit stationarity. In a sense made precise by theoretical neuroscientists working on the free energy principle, the statistical manifold in question possesses a Markov blanket.

    Read more →
  • Data exploration

    Data exploration

    Data exploration is an approach similar to initial data analysis, whereby a data analyst uses visual exploration to understand what is in a dataset and the characteristics of the data, rather than through traditional data management systems. These characteristics can include size or amount of data, completeness of the data, correctness of the data, possible relationships amongst data elements or files/tables in the data. Data exploration is typically conducted using a combination of automated and manual activities. Automated activities can include data profiling or data visualization or tabular reports to give the analyst an initial view into the data and an understanding of key characteristics. This is often followed by manual drill-down or filtering of the data to identify anomalies or patterns identified through the automated actions. Data exploration can also require manual scripting and queries into the data (e.g. using languages such as SQL or R) or using spreadsheets or similar tools to view the raw data. All of these activities are aimed at creating a mental model and understanding of the data in the mind of the analyst, and defining basic metadata (statistics, structure, relationships) for the data set that can be used in further analysis. Once this initial understanding of the data is had, the data can be pruned or refined by removing unusable parts of the data (data cleansing), correcting poorly formatted elements and defining relevant relationships across datasets. This process is also known as determining data quality. Data exploration can also refer to the ad hoc querying or visualization of data to identify potential relationships or insights that may be hidden in the data and does not require to formulate assumptions beforehand. Traditionally, this had been a key area of focus for statisticians, with John Tukey being a key evangelist in the field. Today, data exploration is more widespread and is the focus of data analysts and data scientists; the latter being a relatively new role within enterprises and larger organizations. == Interactive Data Exploration == This area of data exploration has become an area of interest in the field of machine learning. This is a relatively new field and is still evolving. As its most basic level, a machine-learning algorithm can be fed a data set and can be used to identify whether a hypothesis is true based on the dataset. Common machine learning algorithms can focus on identifying specific patterns in the data. Many common patterns include regression and classification or clustering, but there are many possible patterns and algorithms that can be applied to data via machine learning. By employing machine learning, it is possible to find patterns or relationships in the data that would be difficult or impossible to find via manual inspection, trial and error or traditional exploration techniques. == Software == Trifacta – a data preparation and analysis platform Paxata – self-service data preparation software Alteryx – data blending and advanced data analytics software Microsoft Power BI - interactive visualization and data analysis tool OpenRefine - a standalone open source desktop application for data clean-up and data transformation Tableau software – interactive data visualization software

    Read more →
  • Hallucination (artificial intelligence)

    Hallucination (artificial intelligence)

    In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting, confabulation, or delusion) is a response generated by AI that contains false or misleading information presented as fact. This term draws a loose analogy with human psychology, where a hallucination typically involves false percepts. For example, a chatbot powered by large language models (LLMs), like ChatGPT, may embed plausible-sounding random falsehoods within its generated content. Detecting and mitigating errors and hallucinations pose significant challenges for practical deployment and reliability of LLMs in high-stakes scenarios, such as chip design, supply chain logistics, and medical diagnostics. Some software engineers and statisticians have criticized the specific term "AI hallucination" for unreasonably anthropomorphizing computers. Symbolic artificial intelligence models generally do not produce hallucinations, unlike large language models. == Term == === Origin === Since the 1980s, the term "hallucination" has been used in computer vision with a positive connotation to describe the process of adding detail to an image. For example, the task of generating high-resolution face images from low-resolution inputs is called face hallucination. The first documented use of the term "hallucination" in this sense is in the PhD thesis of Eric Mjolsness in 1986. A notable work is the face hallucination algorithm by Simon Baker and Takeo Kanade published in 1999. In the 2000s, hallucinations were described in statistical machine translation as a failure mode. Since the 2010s, the term has undergone a semantic shift to signify the generation of factually incorrect or misleading outputs by AI systems in tasks like machine translation and object detection. In 2015, hallucinations were identified in visual semantic role labeling tasks by Saurabh Gupta and Jitendra Malik. In 2015, computer scientist Andrej Karpathy used the term "hallucinated" in a blog post to describe his recurrent neural network (RNN) language model generating an incorrect citation link. In 2017, Google researchers used the term to describe the responses generated by neural machine translation (NMT) models when they are not related to the source text, and in 2018, the term was used in computer vision to describe instances where non-existent objects are erroneously detected because of adversarial attacks. In July 2021, Meta warned during its release of BlenderBot 2 that the system is prone to "hallucinations", which Meta defined as "confident statements that are not true". Following OpenAI's ChatGPT release in beta version in November 2022, some users complained that such chatbots often seem to pointlessly embed plausible-sounding random falsehoods within their generated content. Many news outlets, including The New York Times, started to use the term "hallucinations" to describe these models' frequently incorrect or inconsistent responses. In 2023, the Cambridge dictionary updated its definition of hallucination to include this new sense specific to the field of AI. Some researchers have highlighted a lack of consistency in how the term is used, but also identified several alternative terms in the literature, such as confabulations, fabrications, and factual errors. === Definitions and alternatives === Uses, definitions and characterizations of the term "hallucination" in the context of LLMs include: "a tendency to invent facts in moments of uncertainty" (OpenAI, May 2023) "a model's logical mistakes" (OpenAI, May 2023) "fabricating information entirely, but behaving as if spouting facts" (CNBC, May 2023) "making up information" (The Verge, February 2023) "probability distributions" (in scientific contexts) Journalist Benj Edwards, in Ars Technica, writes that the term "hallucination" is controversial, but that some form of metaphor remains necessary; Edwards suggests "confabulation" as an analogy for processes that involve "creative gap-filling". In July 2024, a White House report on fostering public trust in AI research mentioned hallucinations only in the context of reducing them. Notably, when acknowledging David Baker's Nobel Prize-winning work with AI-generated proteins, the Nobel committee avoided the term entirely, instead referring to "imaginative protein creation". Hicks, Humphries, and Slater, in their article in Ethics and Information Technology, argue that the output of LLMs is "bullshit" under Harry Frankfurt's definition of the term, and that the models are "in an important way indifferent to the truth of their outputs", with true statements only accidentally true, and false ones accidentally false. Some researchers also use the derogatory term "botshit", often referring to uncritical use of AI. === Criticism === In the scientific community, some researchers avoid the term "hallucination", seeing it as potentially misleading. It has been criticized by Usama Fayyad, executive director of the Institute for Experimental Artificial Intelligence at Northeastern University, on the grounds that it misleadingly personifies large language models and is vague. Mary Shaw said, "The current fashion for calling generative AI's errors 'hallucinations' is appalling. It anthropomorphizes the software, and it spins actual errors as somehow being idiosyncratic quirks of the system even when they're objectively incorrect." In Salon, statistician Gary Smith argues that LLMs "do not understand what words mean" and consequently that the term "hallucination" unreasonably anthropomorphizes the machine. Murray Shanahan argues that anthropomorphic framing of LLM capabilities, including terms like "hallucination", encourages users and researchers to attribute cognitive processes to systems that operate through statistical pattern completion, and advocates for more careful linguistic practices when discussing LLM behavior. Kristina Šekrst argues that applying psychological vocabulary to LLM outputs obscures the difference between the appearance of mental properties and their genuine presence. Förster & Skop assert that tech companies use the hallucination metaphor to anthropomorphize models and deflect responsibility for non-factual outputs. Some see the AI outputs not as illusory but as prospective—that is, having some chance of being true, similar to early-stage scientific conjectures. The term has also been criticized for its association with psychedelic drug experiences. == In natural language generation == In natural language generation, there are several reasons why natural language models hallucinate: === Hallucination from data === Hallucinations can stem from incomplete, inaccurate or unrepresentative data sets. === Modeling-related causes === The pre-training of generative pretrained transformers (GPT) involves predicting the next word. It incentivizes GPT models to "give a guess" about what the next word is, even when they lack information. Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Amabile and Pratt define human creativity as the production of novel and useful ideas. By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses—that is, falsehoods—whereas a focus on usefulness may result in memorized content lacking originality. By 2022, newspapers such as The New York Times expressed concern that, as the adoption of bots based on large language models continued to grow, unwarranted user confidence in bot output could lead to problems. === Interpretability research === In 2025, interpretability research by Anthropic on the LLM Claude identified internal circuits that cause it to decline to answer questions unless it knows the answer. By default, the circuit is active and the LLM doesn't answer. When the LLM has sufficient information, these circuits are inhibited and the LLM answers the question. Hallucinations were found to occur when this inhibition happens incorrectly, such as when Claude recognizes a name but lacks sufficient information about that person, causing it to generate plausible but untrue responses. === Examples === On 15 November 2022, researchers from Meta AI published Galactica, designed to "store, combine and reason about scientific knowledge". Content generated by Galactica came with the warning: "Outputs may be unreliable! Language Models are prone to hallucinate text." In one case, when asked to draft a paper on creating avatars, Galactica cited a fictitious paper from a real author who works in the relevant area. Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy. OpenAI's ChatGPT, released in beta version to the public on November 30, 2022, was based on the foundation model GPT-3.5 (a revision of GPT-3). Professor Ethan Mollick of Wharton called it an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kuba

    Read more →
  • Artificial intimacy

    Artificial intimacy

    Artificial intimacy is a form of human-AI interaction in which an individual will form social connections, emotional bonds, or intimate relationships with various forms of artificial intelligence, including chatbots, virtual assistants, and other artificial entities. Artificially intimate relationships include not only romances, but parasocial relationships with virtual AI characters and the use of griefbots trained on a dead or otherwise lost individual. Artificial intimacy can arise because humans are prone to anthropomorphism. Responses from these AI models are often designed to simulate human interaction. Individuals experiencing artificial intimacy may exhibit attachment, love and commitment to certain AI models, akin to the bonds typically shared between humans. == Causes == === Perceived responsiveness === Robin Dunbar famously proposed that due to emergence of larger groups of humans, vocal communication and language in humans evolved to replace grooming as a means of bonding, arguing that language was a more efficient way to maintain and strengthen social bonds across wider social settings and networks. Further research in this field leads many psychologists to agree that social cognition, affiliative bonding and language in humans are deeply connected. The interpersonal model of intimacy considers communication to be key in affiliative bonding, suggesting that intimacy develops and deepens through open communication between partners in relationship. Specifically, when individuals communicate emotions and perceive their partner as responsive and caring, feelings of closeness and connection are enhanced, building intimacy. Social penetration theory also aligns with the idea of communication being central to intimacy, by explaining how interpersonal relationships develop through gradual increases in self-disclosure. When the benefits of emotional bonding outweigh the costs of vulnerability, individuals will partake in self-disclosure, opening up to one another. Thereby, the literature can be used to provide a proximate explanation for the emergence of artificial intimacy to understand how the phenomenon occurs. Artificial entities are able to mimic interpersonal communication between humans, which in turn can simulate sensations of intimacy within human users though a perceived sense of responsiveness. The relationship between human and AI does not come with the cost of vulnerability or social rejection, which may make self-disclosure easier than with other humans. Altogether, these factors may lead to the experience of anthropomorphism and formation of affiliative relationships. Skjuve et al's interview study on Replika chatbot users further aligns with this explanation, finding that users' perception of chatbots as "accepting, understanding and non-judgmental" facilitated relationship development between the AI and users, and the act of self-disclosure possibly strengthened relationships. Another study on Replika users' reviews and survey results found users perceived chatbots as emotional supportive companions. This evidence further suggests that the perception of artificial entities as capable of empathy and responsiveness in communication facilitate the development of intimate relationships between users and AI. === Loneliness and coping with negative emotions === Research has suggested that humans evolved social bonds as a result of evolutionary pressures that favored cooperation, information exchange and transmission, and group living. Many studies stress the presence of social bonds to be important for human living: research by Baumeister and Leary suggests that humans have a basic psychological need to form and maintain "strong, stable interpersonal relationships", and that a lack of social bonds or sense of belonging leads to negative psychological and physical outcomes. Eisenberger et al's study on the neuroimaging of brain activity suggests that human brains process social rejection and exclusion similarly to physical pain. Furthermore, Song et al's study found that lonely individuals tend to seek more connections in mediated environments, such as online platforms like Facebook. This was suggested to be as a means to reduce their offline loneliness from a lack of in-person interaction, while also fulfilling a need to communicate. Leading on from this, an ultimate explanation for why humans seek the perceived sense of connection from artificial intimacy is to fulfil an evolutionary need for bonding and belonging. Xie et al's study found loneliness to be a driving factor in chatbot interaction. Herbener and Damholdt's study on Danish high school students found that students who sought emotional support or engaged in reciprocal conversations with chatbots were significantly more lonely than their peers, perceived themselves as having less social support, and used the chatbots to cope with negative emotions. The aforementioned notion that chatbots were perceived to have a positive effect on users' negative emotions is also further supported by other studies. Skjuve et al's study found that chatbot relationships may have a positive effect on users' wellbeing. De Freitas et al ran several studies on the effect of chatbots on loneliness, consistently finding evidence suggesting that interaction with chatbots reduces loneliness in users: It was found that existing chatbot users used AI to alleviate loneliness, having an AI companion consistently reduced loneliness over the course of a week, and reductions in loneliness could be explained by chatbot performance—and specifically whether it was able to make users feel heard. Overall the evidence suggests an innate need for bonding evokes feelings of loneliness in users, who turn to artificial intimacy as a low-cost method alleviate these emotions. While many users report positive experiences, some researchers caution that pursuing artificial intimacy may lead to reduced social motivation, social substitution effects, withdrawal from real-life relationships and difficulty discerning reality from fantasy, which may increase longer-term loneliness and isolation. The long-term psychological and societal impacts remain under active investigation.

    Read more →
  • Confusion matrix

    Confusion matrix

    In machine learning, a confusion matrix, also known as error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. In unsupervised learning it is usually called a matching matrix. The term is used specifically in the problem of statistical classification. Each row of the matrix represents the instances in an actual class while each column represents the instances in a predicted class, or vice versa – both variants are found in the literature. The diagonal of the matrix therefore represents all instances that are correctly predicted. The name stems from the fact that it makes it easy to identify whether the system is confusing two classes (i.e., commonly mislabeling one class as another). The confusion matrix has its origins in human perceptual studies of auditory stimuli. It was adapted for machine learning studies and used by Frank Rosenblatt, among other early researchers, to compare human and machine classifications of visual (and later auditory) stimuli. It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions (each combination of dimension and class is a variable in the contingency table). == Example == Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows: Assume that we have a classifier that distinguishes between individuals with and without cancer in some way, we can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (sample 1 and 2), and 1 person without cancer that is wrongly predicted to have cancer (sample 9). Notice, that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column: The actual classification is positive and the predicted classification is positive (1,1). This is called a true positive result because the positive sample was correctly identified by the classifier. The actual classification is positive and the predicted classification is negative (1,0). This is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. The actual classification is negative and the predicted classification is positive (0,1). This is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. The actual classification is negative and the predicted classification is negative (0,0). This is called a true negative result because the negative sample gets correctly identified by the classifier. We can then perform the comparison between actual and predicted classifications and add this information to the table, making correct results appear in green so they are more easily identifiable. The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows: The color convention of the three data tables above were picked to match this confusion matrix, in order to easily differentiate the data. Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier: In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located in the diagonal of the table (highlighted in green), so it is easy to visually inspect the table for prediction errors, as values outside the diagonal will represent them. By summing up the 2 rows of the confusion matrix, one can also deduce the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = T P + F N {\displaystyle P=TP+FN} and N = F P + T N {\displaystyle N=FP+TN} . == Table of confusion == In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly. For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer). According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC). Other metrics can be included in a confusion matrix, each of them having their significance and use. Some researchers have argued that the confusion matrix, and the metrics derived from it, do not truly reflect a model's knowledge. In particular, the confusion matrix cannot show whether correct predictions were reached through sound reasoning or merely by chance (a problem known in philosophy as epistemic luck). It also does not capture situations where the facts used to make a prediction later change or turn out to be wrong (defeasibility). This means that while the confusion matrix is a useful tool for measuring classification performance, it may give an incomplete picture of a model’s true reliability. == Confusion matrices with more than two categories == Confusion matrix is not limited to binary classification and can be used in multi-class classifiers as well. The confusion matrices discussed above have only two conditions: positive and negative. For example, the table below summarizes communication of a whistled language between two speakers, with zero values omitted for clarity. == Confusion matrices in multi-label and soft-label classification == Confusion matrices are not limited to single-label classification (where only one class is present) or hard-label settings (where classes are either fully present, 1, or absent, 0). They can also be extended to Multi-label classification (where multiple classes can be predicted at once) and soft-label classification (where classes can be partially present). One such extension is the Transport-based Confusion Matrix (TCM), which builds on the theory of optimal transport and the principle of maximum entropy. TCM applies to single-label, multi-label, and soft-label settings. It retains the familiar structure of the standard confusion matrix: a square matrix sized by the number of classes, with diagonal entries indicating correct predictions and off-diagonal entries indicating confusion. In the single-label case, TCM is identical to the standard confusion matrix. TCM follows the same reasoning as the standard confusion matrix: if class A is overestimated (its predicted value is greater than its label value) and class B is underestimated (its predicted value is less than its label value), A is considered confused with B, and the entry (B, A) is increased. If a class is both predicted and present, it is correctly identified, and the diagonal entry (A, A) increases. Optimal transport and maximum entropy are used to determine the extent to which these entries are updated. TCM enables clearer comparison between predictions and labels in complex classification tasks, while maintaining a consistent matrix format across settings.

    Read more →
  • Reference Software International

    Reference Software International

    Reference Software International, Inc. (RSI), was an American software developer active from 1985 to 1993 and based in Albuquerque, New Mexico, and San Francisco, California. The company released several productivity and reference software packages, including the Grammatik grammar checker, for MS-DOS. The company was acquired by WordPerfect Corporation in 1993. == History == === Background (1980–1985) === Reference Software International, Inc., was founded by Donald "Don" Emery and Bruce Wampler in 1985 in San Francisco, California. Both Wampler and Emery were college professors when they founded RSI: Wampler at the University of New Mexico as a professor of computer science and Emery a professor of marketing at San Francisco State University. After graduating from the University of Utah in around 1978, Wampler founded his first software company, Aspen Software, in Tijeras, New Mexico, in 1979. Wampler founded Aspen to develop an early spell checker software package, called Proofreader, for the TRS-80, licensing Random House's Webster's Unabridged Dictionary for the package's lexicon. In 1980, he began development on a grammar checker inspired by Writer's Workbench, a pioneering grammar checker for Unix systems. Wampler used Writer's Workbench heavily during the writer of his doctoral dissertation but disliked having to jump between the Apple II on which he composed the dissertation and the mainframe on which Writer's Workbench ran, and so wanted to develop a version of the latter for microcomputers. Wampler's work came to fruition as Grammatik in 1981, eventually ported to several other microcomputer platforms in the early 1980s. In 1983, by which point the company had 12 employees and sold a combined 80,000 units of Grammatik and Proofreader, Wampler sold Aspen to Dictronics, a software company best known for developing the Electronic Thesaurus, an early thesaurus program for microcomputers. Dictronics was in turn purchased by Wang Laboratories; according to Wampler, "Wang bought [Aspen] and sat on it. They did nothing with it". Wampler moved on to teach for the University of New Mexico, but, frustrated by Wang's inaction, got the urge to resurrect his work. In 1985, he was able to license back Grammatik and Proofreader from a small California-based software firm that had grandfathered rights to a forked version of both. In the same year, he met Emery, who, impressed by Wampler's, founded Reference Software International to market his software. RSI's research and development headquarters were based in Albuquerque, while the company's sales and marketing department was based in Walnut Creek, California. === Success (1985–1992) === In August 1985, RSI released their first product: the Random House Reference Set, a new version of Proofreader for the IBM Personal Computer and compatibles, revised to be a terminate-and-stay-resident program that ran atop other word processors such as WordStar or WordPerfect. At the time, Reference Set was the only such program on the market that functioned like this. RSI netted $114,000 from sales of Reference Set by the end of 1985. In June 1986, they released version 2.0 of Grammatik as Grammatik II for the PC. The latter was a breakout hit for RSI, receiving praise in the press (including technology journals such as PC Magazine) and RSI selling 1,000 units a month. In spring 1987, they released Reference Set II, which allowed users to import their own words into the built-in dictionary and added a thesaurus of 300,000 words. In November 1987, they released version 3.0 of Reference Set, which comprised two new field-specific dictionaries for the medical and legal professions. As well as the general Random House dictionary and thesaurus, it included Stedman's Medical Dictionary and Black's Law Dictionary. Emery consulted Paul Brest and Bob Jackson—professors of law at Stanford Law School and San Francisco State respectively—for the curation of the law dictionary; and Burton Grebin—at the time the executive director of Mount Saint Mary's Hospital—for the curation of the medical dictionary. In fall 1988, the company released Grammatik III, a total rewrite that made use of artificial intelligence to more accurately judge the grammar of sentences by breaking them down into a syntactic hierarchy. Grammatik III received universal acclaim, with Gloria Morris of InfoWorld calling it the apparent leader in the grammar checking field and Sandra Anderson of Mac Home Journal calling it "hands down ... the best of the industry" six years after its release. By 1989, the product had competitors in Correct Grammar by Lifetree Software and RightWriter by Rightsoft, Inc. By 1990, RSI achieved annual sales of $9.7 million. In the same year they released Grammatik IV, which was the first to offer direct integration with WordPerfect on both MS-DOS and Windows. In March 1992—by which point RSI had sold 1.5 million copies of Grammatik across all versions—the company released version 5 of the program, another rewrite that updated the lexicon further and added new functions such as word redundancy detection. Around the same time, the company introduced Easy Proof, a pared-down version of Grammatik intended for novice writers, students, and family computers. In 1991, the company was engaged in a trademark dispute with Systems Compatibility Corporation (SCC) of Chicago, Illinois, over the rights to the Software Toolkit title. Both companies had published software bundles bearing the name in the turn of the 1990s; SCC had published theirs first in 1988 and registered the trademark with the USPTO. SCC was granted a restraining order against RSI in January 1991. The following month, RSI agreed to rename their product, preventing a protracted legal battle. === Decline and acquisition (1992–1993) === By early 1992, RSI achieved annual sales of more than $13 million, employed 120 people, and had opened international offices in London, Belgium, and Antwerp to sell foreign versions of Reference Set and Grammatik. The company reached peak employment in the middle of 1992, with 140 employees. However, RSI's launch of six disparate titles in the year proved problematic for the company when they failed to sell as well as they had projected, and the company laid off employees by the dozens. By December 1992, only 71 employees were left, 32 from their San Francisco office. On the last day of 1992, RSI received an acquisition offer from WordPerfect Corporation, makers of the namesake word processor based in Orem, Utah. The deal was inked in January 1993, RSI's stakeholders receiving $19 million. The company's remaining employees were absorbed into WordPerfect in Orem. WordPerfect continued selling Grammatik as a standalone product for several years.

    Read more →
  • H2O (software)

    H2O (software)

    H2O is an open-source, in-memory, distributed machine learning and predictive analytics platform developed by the company H2O.ai (previously 0xdata). The software uses a distributed architecture for parallel processing on standard hardware. It supports algorithms for large-scale data analysis and model deployment. H2O is primarily used by data scientists and developers for statistical modeling and data-driven decision-making. The platform is designed to handle in-memory computations across a distributed computing environment. It offers implementations for numerous statistical and machine learning algorithms, which are accessible through various programming interfaces. The software is released under the Apache License 2.0. == Functionality and features == H2O provides a suite of supervised and unsupervised machine learning algorithms. Its core functions include: Supervised learning: algorithms in the field of statistics, data mining and machine learning such as generalized linear models, random forests, gradient boosting and deep learning are implemented for classification and regression tasks. Unsupervised learning: including K-Means clustering and principal component analysis. Automated machine learning: a features designed to automate the processes of model selection, tuning, and ensemble creation. The software can ingest data from various sources, including the Hadoop Distributed File System, Amazon S3, SQL databases, as well as local file systems. It operates natively on Apache Spark clusters through Sparkling Water. Proponents claim that improved performance is achieved compared to other analysis tools. The software is distributed free of charge, under a business model based on the development of individual applications and support. == Architecture == H2O is primarily written in Java. It uses a distributed architecture that allows the platform to cluster nodes for parallel processing and in-memory storage of data and models. Users interact with the H2O platform through several primary interfaces: Programming language interfaces: APIs are provided for the R and Python programming languages, and various Apache offerings (Apache Hadoop and Spark, as well as Maven). H2O Flow: a graphical web-based interactive computational environment that functions as a notebook interface for data exploration, model building, and scripting. REST-API: allows for integration with other applications and frameworks such as Microsoft Excel or RStudio. With the H2O Machine Learning Integration Nodes, KNIME offers algorithmic workflows. While the algorithm executes, approximate results are displayed, so that users can track the progress and intervene if needed. == History, influences, and extensions == The software project was initiated by the company 0xdata, which later changed its name to H2O.ai. The three Stanford professors Stephen P. Boyd, Robert Tibshirani and Trevor Hastie form a panel that advises H2O on scientific issues. Since its inception, H2O provides open-source machine learning libraries for enterprise use. The core H2O platform is often complemented by offerings from H2O.ai, such as H2O Driverless AI. == Reception == H2O is referenced in peer-reviewed literature regarding automated machine learning (AutoML). The platform has been categorized as a "Leader" and a "Strong Performer" in industry reports by Forrester Research. H2O (the open-source platform) and the associated commercial platform Driverless AI have been recurring winners of InfoWorld's most prestigious awards, including both the Best of Open Source Software ("Bossies") and the Technology of the Year awards.

    Read more →
  • Evolvability (computer science)

    Evolvability (computer science)

    The term evolvability is a framework of computational learning introduced by Leslie Valiant in his paper of the same name. The aim of this theory is to model biological evolution and categorize which types of mechanisms are evolvable. Evolution is an extension of PAC learning and learning from statistical queries. == General framework == Let F n {\displaystyle F_{n}\,} and R n {\displaystyle R_{n}\,} be collections of functions on n {\displaystyle n\,} variables. Given an ideal function f ∈ F n {\displaystyle f\in F_{n}} , the goal is to find by local search a representation r ∈ R n {\displaystyle r\in R_{n}} that closely approximates f {\displaystyle f\,} . This closeness is measured by the performance Perf ⁡ ( f , r ) {\displaystyle \operatorname {Perf} (f,r)} of r {\displaystyle r\,} with respect to f {\displaystyle f\,} . As is the case in the biological world, there is a difference between genotype and phenotype. In general, there can be multiple representations (genotypes) that correspond to the same function (phenotype). That is, for some r , r ′ ∈ R n {\displaystyle r,r'\in R_{n}} , with r ≠ r ′ {\displaystyle r\neq r'\,} , still r ( x ) = r ′ ( x ) {\displaystyle r(x)=r'(x)\,} for all x ∈ X n {\displaystyle x\in X_{n}} . However, this need not be the case. The goal then, is to find a representation that closely matches the phenotype of the ideal function, and the spirit of the local search is to allow only small changes in the genotype. Let the neighborhood N ( r ) {\displaystyle N(r)\,} of a representation r {\displaystyle r\,} be the set of possible mutations of r {\displaystyle r\,} . For simplicity, consider Boolean functions on X n = { − 1 , 1 } n {\displaystyle X_{n}=\{-1,1\}^{n}\,} , and let D n {\displaystyle D_{n}\,} be a probability distribution on X n {\displaystyle X_{n}\,} . Define the performance in terms of this. Specifically, Perf ⁡ ( f , r ) = ∑ x ∈ X n f ( x ) r ( x ) D n ( x ) . {\displaystyle \operatorname {Perf} (f,r)=\sum _{x\in X_{n}}f(x)r(x)D_{n}(x).} Note that Perf ⁡ ( f , r ) = Prob ⁡ ( f ( x ) = r ( x ) ) − Prob ⁡ ( f ( x ) ≠ r ( x ) ) . {\displaystyle \operatorname {Perf} (f,r)=\operatorname {Prob} (f(x)=r(x))-\operatorname {Prob} (f(x)\neq r(x)).} In general, for non-Boolean functions, the performance will not correspond directly to the probability that the functions agree, although it will have some relationship. Throughout an organism's life, it will only experience a limited number of environments, so its performance cannot be determined exactly. The empirical performance is defined by Perf s ⁡ ( f , r ) = 1 s ∑ x ∈ S f ( x ) r ( x ) , {\displaystyle \operatorname {Perf} _{s}(f,r)={\frac {1}{s}}\sum _{x\in S}f(x)r(x),} where S {\displaystyle S\,} is a multiset of s {\displaystyle s\,} independent selections from X n {\displaystyle X_{n}\,} according to D n {\displaystyle D_{n}\,} . If s {\displaystyle s\,} is large enough, evidently Perf s ⁡ ( f , r ) {\displaystyle \operatorname {Perf} _{s}(f,r)} will be close to the actual performance Perf ⁡ ( f , r ) {\displaystyle \operatorname {Perf} (f,r)} . Given an ideal function f ∈ F n {\displaystyle f\in F_{n}} , initial representation r ∈ R n {\displaystyle r\in R_{n}} , sample size s {\displaystyle s\,} , and tolerance t {\displaystyle t\,} , the mutator Mut ⁡ ( f , r , s , t ) {\displaystyle \operatorname {Mut} (f,r,s,t)} is a random variable defined as follows. Each r ′ ∈ N ( r ) {\displaystyle r'\in N(r)} is classified as beneficial, neutral, or deleterious, depending on its empirical performance. Specifically, r ′ {\displaystyle r'\,} is a beneficial mutation if Perf s ⁡ ( f , r ′ ) − Perf s ⁡ ( f , r ) ≥ t {\displaystyle \operatorname {Perf} _{s}(f,r')-\operatorname {Perf} _{s}(f,r)\geq t} ; r ′ {\displaystyle r'\,} is a neutral mutation if − t < Perf s ⁡ ( f , r ′ ) − Perf s ⁡ ( f , r ) < t {\displaystyle -t<\operatorname {Perf} _{s}(f,r')-\operatorname {Perf} _{s}(f,r) 0 {\displaystyle \epsilon >0\,} , for all ideal functions f ∈ F n {\displaystyle f\in F_{n}} and representations r 0 ∈ R n {\displaystyle r_{0}\in R_{n}} , with probability at least 1 − ϵ {\displaystyle 1-\epsilon \,} , Perf ⁡ ( f , r g ( n , 1 / ϵ ) ) ≥ 1 − ϵ , {\displaystyle \operatorname {Perf} (f,r_{g(n,1/\epsilon )})\geq 1-\epsilon ,} where the sizes of neighborhoods N ( r ) {\displaystyle N(r)\,} for r ∈ R n {\displaystyle r\in R_{n}\,} are at most p ( n , 1 / ϵ ) {\displaystyle p(n,1/\epsilon )\,} , the sample size is s ( n , 1 / ϵ ) {\displaystyle s(n,1/\epsilon )\,} , the tolerance is t ( 1 / n , ϵ ) {\displaystyle t(1/n,\epsilon )\,} , and the generation size is g ( n , 1 / ϵ ) {\displaystyle g(n,1/\epsilon )\,} . F {\displaystyle F\,} is evolvable over D {\displaystyle D\,} if it is evolvable by some R {\displaystyle R\,} over D {\displaystyle D\,} . F {\displaystyle F\,} is evolvable if it is evolvable over all distributions D {\displaystyle D\,} . == Results == The class of conjunctions and the class of disjunctions are evolvable over the uniform distribution for short conjunctions and disjunctions, respectively. The class of parity functions (which evaluate to the parity of the number of true literals in a given subset of literals) are not evolvable, even for the uniform distribution. Evolvability implies PAC learnability.

    Read more →