AI For Business Rules

AI For Business Rules — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Eager learning

    Eager learning

    In artificial intelligence, eager learning is a learning method in which the system tries to construct a general, input-independent target function during training of the system, as opposed to lazy learning, where generalization beyond the training data is delayed until a query is made to the system. The main advantage gained in employing an eager learning method, such as an artificial neural network, is that the target function will be approximated globally during training, thus requiring much less space than using a lazy learning system. Eager learning systems also deal much better with noise in the training data. Eager learning is an example of offline learning, in which post-training queries to the system have no effect on the system itself, and thus the same query to the system will always produce the same result. The main disadvantage with eager learning is that it is generally unable to provide good local approximations in the target function.

    Read more →
  • Semantic analysis (machine learning)

    Semantic analysis (machine learning)

    In machine learning, semantic analysis of a text corpus is the task of building structures that approximate concepts from a large set of documents. It generally does not involve prior semantic understanding of the documents. Semantic analysis strategies include: Metalanguages based on first-order logic, which can analyze the speech of humans. Understanding the semantics of a text is symbol grounding: if language is grounded, it is equal to recognizing a machine-readable meaning. For the restricted domain of spatial analysis, a computer-based language understanding system was demonstrated. Latent semantic analysis (LSA), a class of techniques where documents are represented as vectors in a term space. A prominent example is probabilistic latent semantic analysis (PLSA). Latent Dirichlet allocation, which involves attributing document terms to topics. n-grams and hidden Markov models, which work by representing the term stream as a Markov chain, in which each term is derived from preceding terms. == Stochastic semantic analysis ==

    Read more →
  • Embodied agent

    Embodied agent

    In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents, although they have only virtual, not physical, embodiment. A branch of artificial intelligence focuses on empowering such agents to interact autonomously with human beings and the environment. Mobile robots are one example of physically embodied agents; Ananova and Microsoft Agent are examples of graphically embodied agents. Embodied conversational agents are embodied agents (usually with a graphical front-end as opposed to a robotic body) that are capable of engaging in conversation with one another and with humans employing the same verbal and nonverbal means that humans do (such as gesture, facial expression, and so forth). == Embodied conversational agents == Embodied conversational agents are a form of intelligent user interface. Graphically embodied agents aim to unite gesture, facial expression and speech to enable face-to-face communication with users, providing a powerful means of human-computer interaction. == Advantages == Face-to-face communication allows communication protocols that give a much richer communication channel than other means of communicating. It enables pragmatic communication acts such as conversational turn-taking, facial expression of emotions, information structure and emphasis, visualization and iconic gestures, and orientation in a three-dimensional environment. This communication takes place through both verbal and non-verbal channels such as gaze, gesture, spoken intonation and body posture. Research has found that users prefer a non-verbal visual indication of an embodied system's internal state to a verbal indication, demonstrating the value of additional non-verbal communication channels. As well as this, the face-to-face communication involved in interacting with an embodied agent can be conducted alongside another task without distracting the human participants, instead improving the enjoyment of such an interaction. Furthermore, the use of an embodied presentation agent results in improved recall of the presented information. Embodied agents also provide a social dimension to the interaction. Humans willingly ascribe social awareness to computers, and thus interaction with embodied agents follows social conventions, similar to human to human interactions. This social interaction both raises the believably and perceived trustworthiness of agents, and increases the user's engagement with the system. Rickenberg and Reeves found that the presence of an embodied agent on a website increased the level of user trust in that website, but also increased users' anxiety and affected their performance, as if they were being watched by a real human. Another effect of the social aspect of agents is that presentations given by an embodied agent are perceived as being more entertaining and less difficult than similar presentations given without an agent. Research shows that perceived enjoyment, followed by perceived usefulness and ease of use, is the major factor influencing user adoption of embodied agents. A study in January 2004 by Byron Reeves at Stanford demonstrated how digital characters could "enhance online experiences" through explaining how virtual characters essentially add a sense of familiarity to the user experience and make it more approachable. This increase in likability in turn helps make the products better, which benefits both the end users and those creating the product. === Applications === The rich style of communication that characterizes human conversation makes conversational interaction with embodied conversational agents ideal for many non-traditional interaction tasks. A familiar application of graphically embodied agents is computer games; embodied agents are ideal for this setting because the richer communication style makes interacting with the agent enjoyable. Embodied conversational agents have also been used in virtual training environments, portable personal navigation guides, interactive fiction and storytelling systems, interactive online characters and automated presenters and commentators. Major virtual assistants like Siri, Amazon Alexa and Google Assistant do not come with any visual embodied representation, which is believed to limit the sense of human presence by users. The U.S. Department of Defense utilizes a software agent called SGT STAR on U.S. Army-run Web sites and Web applications for site navigation, recruitment and propaganda purposes. Sgt. Star is run by the Army Marketing and Research Group, a division operated directly from The Pentagon. Sgt. Star is based upon the ActiveSentry technology developed by Next IT, a Washington-based information technology services company. Other such bots in the Sgt. Star "family" are utilized by the Federal Bureau of Investigation and the Central Intelligence Agency for intelligence gathering purposes.

    Read more →
  • Agent verification

    Agent verification

    Agent verification is activity to gain assurances that purposeful artificial constructs act in accordance with their specifications. While primitive forms of inorganic agents have been used in manufacturing for centuries, the study of artificial agents did not begin until the mid 20th century. Foundational work on such agents was closely bound with the emergence of artificial intelligence as an academic discipline. Early agents deployed for industrial control systems and in computing were often controlled by quite simple logic however, not involving artificial intelligence as such. When deployed as part of a multi-agent system, even such simple agents could require special agent orientated testing methods, as their collective behaviour was challenging to verify with traditional testing techniques. Difficulties in providing assurances that agents will not behave in dangerous ways became more prevalent after the introduction of LLM agents, especially after the rapid acceleration of their deployment in 2025. The verification of agent behaviour can be conducted by formal or informal methods. Informal verification requires less mathematical skill. But when agents are part of systems where errors have significant risks — such as danger to human life, environmental damage or major financial loss — formal verification is preferred. Both regulators and system designers themselves like formal verification as it provides a high degree of mathematical certainty. It is not however always possible to formally test all aspects of an agent based system's behaviour, especially where newer LLM based agents are concerned, due in part to their high degree of autonomy. Accordingly, agent verification for low impact deployments might be carried out only with informal methods, while for high impact deployments, it may be performed with a mix of formal and informal techniques. == Terminology == In academia, the term agent verification is often defined to mean activity concerned with gaining assurance that the agent behaves in accordance with its specification - whether by processes such as testing or simulation. 'Verification' is typically contrasted with 'validation', the latter meaning activity concerned with checking that the specification itself meets user or real world needs. Such definitions are not universally adhered to however - for example, in some workplaces and documents, the words 'verification' and 'validation' can be used synonymously. Efforts to gain confidence in Agents have intensified sharply since 2025 due to the rapid roll out of LLM agents; different terms are sometimes used in the commercial sector. Here the term 'agent verification' can be used in the same sense as it is in academia, but sometimes the same activity can be covered by more ambiguous and wider ranging terms such as 'Agent governance' , 'Agent observability' or 'AI agent policing'. == History == === Classical agents === The theoretical underpinnings for artificial (inorganic) agents emerged in the mid 20th century, with establishment of cybernetics and artificial intelligence. Oliver Selfridge's 1958 Pandemonium - A Paradigm for Learning paper was an important early theoretical contribution in establishing agent oriented architecture. Practical implementations of agents for real world applications began to become widespread in the 1990s, after the introduction of the belief–desire–intention software model (BDI), and agent-oriented programming. Pure digital agents were deployed in computer infrastructure for purposes such as monitoring, while agents connected to real-world sensors and actuators were increasingly used in industrial control systems. While the concept of artificial agents was interwoven with early artificial intelligence studies right from the start, early agents lacked general purpose reasoning capabilities, often only having simple if then logic. Even a device as simple as a thermostat, which has a sensor and a means of acting, can be considered a proto agent in this sense. Verifying the behaviours of a simple single agent system is not generally especially difficult, but it can be a different matter when several simple agents coexist in the same system. Craig Reynolds's work on boids showed that relatively complex, "intelligent" behaviour can emerge from a number of such simple agents working together in a Multi-agent system (MAS). By the 1990s, even the behaviour of a single agent system could sometimes be quite complex; in accordance with the Belief–desire–intention software model, agents could have believes that might evolve over time. Agents were increasingly introduced that were controlled by quite large decision tree models, which had new vulnerabilities to adversarial attack. It was becoming increasingly apparent that traditional software verification methods had limitations for testing such agents, or even for the more primitive type of agents when they were deployed as part of a MAS. It was the use of agents for industrial control systems, sometimes associated with robotics, that lent urgency to the practice of agent verification. Informal testing might be acceptable for digital agents used say to monitor whether each of an organisation's computers are properly licensed. But with an increasing potential for faulty agents to result in a failure that might cause a large fire to break out at a chemical manufacturing plant, a botched medical operation, or even a crashed aircraft, the need to develop reliable means of verifying behaviour of such agents was considered urgent. The Foundation for Intelligent Physical Agents was established in 1996. From the late 90s, a growing number of industry and university based scientists began working on the problem, with researchers publishing papers on the verification of both single and multi agent systems. Much of this work showed how formal verification techniques like model checking could be used to gain a high level of assurance that agent based systems would conform with their specification. A 2018 systematic review covering 231 studies found that model checking was the most common technique for agent verification, with theorem proving the second most commonly used formal verification method. In the first two decades of the 20th century, agents run by AI became more common, with Siri and Alexa being well known examples. But such agents still lacked general reasoning capabilities and did not pose new pressing problems for agent verification. === General purpose reasoning agents === The advent of LLMs created huge potential for further use of artificial agents, as agents based on them could have general purpose cognitive abilities. Agents run by LLMs (and occasionally non-LLM foundation models) have similar vulnerability to adversarial attack as those run by decision tree models. The wider scope of actions for LLM agents has created new challenges for their verification, over and above those present for classical agents. For example, the LLM's neural network endows it with infinite domains, an especial challenge for traditional formal verification techniques. Academics began to study the problems involved in verifying LLM agents from 2018. Deployment of such agents began to accelerate in late 2023 after OpenAI's "function-calling" API was made available, and especially after Anthropic's late 2024 introduction of Model Context Protocol (MCP), a standardised way for LLM agents to gain contextual awareness, and to act on the world by calling various external tools. The rapid rollout of LLM agents following MCP's release has seen the task of agent verification receive increased attention within academia, and also from the private sector. In 2024 and 2025 several startups focusing on LLM agent verification have been founded in both Europe and the US to meet growing demand. == Approaches == === Formal verification === Formal verification involves proving the correctness of some or all aspects of a system using mathematical methods. Such methods can range from manual formal proof, to verification assisted with automated theorem provers like Isabelle. For agent verification, model checking is by far the most frequently used formal verification method; for pre-LLM models it was often complemented with techniques using computation tree logic. Another common method is theorem proving. Formal verification provides a higher degree of confidence than informal methods, but it is not always used, even when it is possible. Sometimes a person or organisation developing software agents won't have the necessary skills, or may not see it as worth the effort if the agent(s) will not have the ability to cause much harm even if they malfunction. When agents are deployed in systems where errors could have serious consequences, the ability of formal verification methods to provide mathematical certainty tends to be strongly preferred by both regulators and designers themselves. But even for high impact systems, formal verificatio

    Read more →
  • Paprika (app)

    Paprika (app)

    Paprika is an app and website that helps users organize recipes, produce meal plans, and create grocery lists. The app is available for Android, iOS, macOS, and Windows devices. == Overview == The app allows users to import recipes from various sources, including websites and other apps. The app also allows users to automatically generate meal plans, which are also customizable, in order to achieve specific objectives such as weight loss, muscle gain, adherence to various dietary preferences, or personal taste. The app is also capable of generating grocery lists based on the daily or weekly meal plans chosen by the user. All the recipes, menus, and grocery lists of each user are accessible from smartphones, tablets, and computers. The app is part of a broader category of mobile apps focused on meal planning, recipe management, and shopping list automation, which have grown in popularity with the expansion of smartphone usage and digital cooking tools. == History == Paprika Recipe Manager for iPad version 1.0 was initially released in September 2010 by Hindsight LLC. Paprika 2.0 was released for iPhone and iPad in November 2013, and Paprika 3.0 was released for iOS and macOS in November 2017. == Reception == Paprika has been featured in technology and lifestyle publications as a recipe management and meal planning application. Coverage has noted features such as importing recipes from websites, ingredient scaling, and cross-platform synchronization. The app has also appeared in lists of cooking and meal planning tools published by outlets including The Verge and The Kitchn.

    Read more →
  • Intelligent control

    Intelligent control

    Intelligent control is a class of control techniques that use various artificial intelligence computing approaches like neural networks, Bayesian probability, fuzzy logic, machine learning, reinforcement learning, evolutionary computation and genetic algorithms. == Overview == Intelligent control can be divided into the following major sub-domains: Neural network control Machine learning control Reinforcement learning Bayesian control Fuzzy control Neuro-fuzzy control Expert Systems Genetic control New control techniques are created continuously as new models of intelligent behavior are created and computational methods developed to support them. === Neural network controller === Neural networks have been used to solve problems in almost all spheres of science and technology. Neural network control basically involves two steps: System identification Control It has been shown that a feedforward network with nonlinear, continuous and differentiable activation functions have universal approximation capability. Recurrent networks have also been used for system identification. Given, a set of input-output data pairs, system identification aims to form a mapping among these data pairs. Such a network is supposed to capture the dynamics of a system. For the control part, deep reinforcement learning has shown its ability to control complex systems. === Bayesian controllers === Bayesian probability has produced a number of algorithms that are in common use in many advanced control systems, serving as state space estimators of some variables that are used in the controller. The Kalman filter and the Particle filter are two examples of popular Bayesian control components. The Bayesian approach to controller design often requires an important effort in deriving the so-called system model and measurement model, which are the mathematical relationships linking the state variables to the sensor measurements available in the controlled system. In this respect, it is very closely linked to the system-theoretic approach to control design.

    Read more →
  • Random feature

    Random feature

    Random features (RF) are a technique used in machine learning to approximate kernel methods, introduced by Ali Rahimi and Ben Recht in their 2007 paper "Random Features for Large-Scale Kernel Machines", and extended by. RF uses a Monte Carlo approximation to kernel functions by randomly sampled feature maps. It is used for datasets that are too large for traditional kernel methods like support vector machine, kernel ridge regression, and gaussian process. == Mathematics == === Kernel method === Given a feature map ϕ : R d → V {\textstyle \phi :\mathbb {R} ^{d}\to V} , where V {\textstyle V} is a Hilbert space (more specifically, a reproducing kernel Hilbert space), the kernel trick replaces inner products in feature space ⟨ ϕ ( x i ) , ϕ ( x j ) ⟩ V {\displaystyle \langle \phi (x_{i}),\phi (x_{j})\rangle _{V}} by a kernel function k ( x i , x j ) : R d × R d → R {\displaystyle k(x_{i},x_{j}):\mathbb {R} ^{d}\times \mathbb {R} ^{d}\to \mathbb {R} } Kernel methods replaces linear operations in high-dimensional space by operations on the kernel matrix: K X := [ k ( x i , x j ) ] i , j ∈ 1 : N {\displaystyle K_{X}:=[k(x_{i},x_{j})]_{i,j\in 1:N}} where N {\textstyle N} is the number of data points. === Random kernel method === The problem with kernel methods is that the kernel matrix K X {\textstyle K_{X}} has size N × N {\textstyle N\times N} . This becomes computationally infeasible when N {\textstyle N} reaches the order of a million. The random kernel method replaces the kernel function k {\textstyle k} by an inner product in low-dimensional feature space R D {\textstyle \mathbb {R} ^{D}} : k ( x , y ) ≈ ⟨ z ( x ) , z ( y ) ⟩ {\displaystyle k(x,y)\approx \langle z(x),z(y)\rangle } where z {\textstyle z} is a randomly sampled feature map z : R d → R D {\textstyle z:\mathbb {R} ^{d}\to \mathbb {R} ^{D}} . This converts kernel linear regression into linear regression in feature space, kernel SVM into SVM in feature space, etc. Since we have K X ≈ Z X T Z X {\displaystyle K_{X}\approx Z_{X}^{T}Z_{X}} where Z X = [ z ( x 1 ) , … , z ( x N ) ] {\displaystyle Z_{X}=[z(x_{1}),\dots ,z(x_{N})]} , these methods no longer involve matrices of size O ( N 2 ) {\textstyle O(N^{2})} , but only random feature matrices of size O ( D N ) {\textstyle O(DN)} . == Random Fourier feature == === Radial basis function kernel === The radial basis function (RBF) kernel on two samples x i , x j ∈ R d {\displaystyle x_{i},x_{j}\in \mathbb {R} ^{d}} is defined as k ( x i , x j ) = exp ⁡ ( − ‖ x i − x j ‖ 2 2 σ 2 ) {\displaystyle k(x_{i},x_{j})=\exp \left(-{\frac {\|x_{i}-x_{j}\|^{2}}{2\sigma ^{2}}}\right)} where ‖ x i − x j ‖ 2 {\displaystyle \|x_{i}-x_{j}\|^{2}} is the squared Euclidean distance and σ {\displaystyle \sigma } is a free parameter defining the shape of the kernel. It can be approximated by a random Fourier feature map z : R d → R 2 D {\displaystyle z:\mathbb {R} ^{d}\to \mathbb {R} ^{2D}} : z ( x ) := 1 D [ cos ⁡ ⟨ ω 1 , x ⟩ , sin ⁡ ⟨ ω 1 , x ⟩ , … , cos ⁡ ⟨ ω D , x ⟩ , sin ⁡ ⟨ ω D , x ⟩ ] T {\displaystyle z(x):={\frac {1}{\sqrt {D}}}[\cos \langle \omega _{1},x\rangle ,\sin \langle \omega _{1},x\rangle ,\ldots ,\cos \langle \omega _{D},x\rangle ,\sin \langle \omega _{D},x\rangle ]^{T}} where ω 1 , . . . , ω D {\displaystyle \omega _{1},...,\omega _{D}} are IID samples from the multidimensional normal distribution N ( 0 , σ − 2 I ) {\displaystyle N(0,\sigma ^{-2}I)} . Since cos , sin {\displaystyle \cos ,\sin } are bounded, there is a stronger convergence guarantee by Hoeffding's inequality. === Random Fourier features === By Bochner's theorem, the above construction can be generalized to arbitrary positive definite shift-invariant kernel k ( x , y ) = k ( x − y ) {\displaystyle k(x,y)=k(x-y)} . Define its Fourier transform p ( ω ) = 1 2 π ∫ R d e − j ⟨ ω , Δ ⟩ k ( Δ ) d Δ {\displaystyle p(\omega )={\frac {1}{2\pi }}\int _{\mathbb {R} ^{d}}e^{-j\langle \omega ,\Delta \rangle }k(\Delta )d\Delta } then ω 1 , . . . , ω D {\displaystyle \omega _{1},...,\omega _{D}} are sampled IID from the probability distribution with probability density p {\displaystyle p} . This applies for other kernels like the Laplace kernel and the Cauchy kernel. === Neural network interpretation === Given a random Fourier feature map z {\displaystyle z} , training the feature on a dataset by featurized linear regression is equivalent to fitting complex parameters θ 1 , … , θ D ∈ C {\displaystyle \theta _{1},\dots ,\theta _{D}\in \mathbb {C} } such that f θ ( x ) = R e ( ∑ k θ k e i ⟨ ω k , x ⟩ ) {\displaystyle f_{\theta }(x)=\mathrm {Re} \left(\sum _{k}\theta _{k}e^{i\langle \omega _{k},x\rangle }\right)} which is a neural network with a single hidden layer, with activation function t ↦ e i t {\displaystyle t\mapsto e^{it}} , zero bias, and the parameters in the first layer frozen. In the overparameterized case, when 2 D ≥ N {\displaystyle 2D\geq N} , the network linearly interpolates the dataset { ( x i , y i ) } i ∈ 1 : N {\displaystyle \{(x_{i},y_{i})\}_{i\in 1:N}} , and the network parameters is the least-norm solution: θ ^ = arg ⁡ min θ ∈ C D , f θ ( x k ) = y k ∀ k ∈ 1 : N ‖ θ ‖ {\displaystyle {\hat {\theta }}=\arg \min _{\theta \in \mathbb {C} ^{D},f_{\theta }(x_{k})=y_{k}\forall k\in 1:N}\|\theta \|} At the limit of D → ∞ {\displaystyle D\to \infty } , the L2 norm ‖ θ ^ ‖ → ‖ f K ‖ H {\displaystyle \|{\hat {\theta }}\|\to \|f_{K}\|_{H}} where f K {\displaystyle f_{K}} is the interpolating function obtained by the kernel regression with the original kernel, and ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{H}} is the norm in the reproducing kernel Hilbert space for the kernel. == Other examples == === Random binning features === A random binning features map partitions the input space using randomly shifted grids at randomly chosen resolutions and assigns to an input point a binary bit string that corresponds to the bins in which it falls. The grids are constructed so that the probability that two points x i , x j ∈ R d {\displaystyle x_{i},x_{j}\in \mathbb {R} ^{d}} are assigned to the same bin is proportional to K ( x i , x j ) {\displaystyle K(x_{i},x_{j})} . The inner product between a pair of transformed points is proportional to the number of times the two points are binned together, and is therefore an unbiased estimate of K ( x i , x j ) {\displaystyle K(x_{i},x_{j})} . Since this mapping is not smooth and uses the proximity between input points, Random Binning Features works well for approximating kernels that depend only on the L 1 {\displaystyle L_{1}} distance between datapoints. === Orthogonal random features === Orthogonal random features uses a random orthogonal matrix instead of a random Fourier matrix. == Historical context == In NIPS 2006, deep learning had just become competitive with linear models like PCA and linear SVMs for large datasets, and people speculated about whether it could compete with kernel SVMs. However, there was no way to train kernel SVM on large datasets. The two authors developed the random feature method to train those. It was then found that the O ( 1 / D ) {\displaystyle O(1/D)} variance bound did not match practice: the variance bound predicts that approximation to within 0.01 {\displaystyle 0.01} requires D ∼ 10 4 {\displaystyle D\sim 10^{4}} , but in practice required only ∼ 10 2 {\displaystyle \sim 10^{2}} . Attempting to discover what caused this led to the subsequent two papers.

    Read more →
  • Statistical learning theory

    Statistical learning theory

    Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be R {\displaystyle R} , such that V = I R {\displaystyle V=IR} Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture. After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take X {\displaystyle X} to be the vector space of all possible inputs, and Y {\displaystyle Y} to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space Z = X × Y {\displaystyle Z=X\times Y} , i.e. there exists some unknown p ( z ) = p ( x , y ) {\displaystyle p(z)=p(\mathbf {x} ,y)} . The training set is made up of n {\displaystyle n} samples from this probability distribution, and is notated S = { ( x 1 , y 1 ) , … , ( x n , y n ) } = { z 1 , … , z n } {\displaystyle S=\{(\mathbf {x} _{1},y_{1}),\dots ,(\mathbf {x} _{n},y_{n})\}=\{\mathbf {z} _{1},\dots ,\mathbf {z} _{n}\}} Every x i {\displaystyle \mathbf {x} _{i}} is an input vector from the training data, and y i {\displaystyle y_{i}} is the output that corresponds to it. In this formalism, the inference problem consists of finding a function f : X → Y {\displaystyle f:X\to Y} such that f ( x ) ∼ y {\displaystyle f(\mathbf {x} )\sim y} . Let H {\displaystyle {\mathcal {H}}} be a space of functions f : X → Y {\displaystyle f:X\to Y} called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let V ( f ( x ) , y ) {\displaystyle V(f(\mathbf {x} ),y)} be the loss function, a metric for the difference between the predicted value f ( x ) {\displaystyle f(\mathbf {x} )} and the actual value y {\displaystyle y} . The expected risk is defined to be I [ f ] = ∫ X × Y V ( f ( x ) , y ) p ( x , y ) d x d y {\displaystyle I[f]=\int _{X\times Y}V(f(\mathbf {x} ),y)\,p(\mathbf {x} ,y)\,d\mathbf {x} \,dy} The target function, the best possible function f {\displaystyle f} that can be chosen, is given by the f {\displaystyle f} that satisfies f = argmin h ∈ H ⁡ I [ h ] {\displaystyle f=\mathop {\operatorname {argmin} } _{h\in {\mathcal {H}}}I[h]} Because the probability distribution p ( x , y ) {\displaystyle p(\mathbf {x} ,y)} is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk I S [ f ] = 1 n ∑ i = 1 n V ( f ( x i ) , y i ) {\displaystyle I_{S}[f]={\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})} A learning algorithm that chooses the function f S {\displaystyle f_{S}} that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function f S {\displaystyle f_{S}} that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: V ( f ( x ) , y ) = ( y − f ( x ) ) 2 {\displaystyle V(f(\mathbf {x} ),y)=(y-f(\mathbf {x} ))^{2}} The absolute value loss (also known as the L1-norm) is also sometimes used: V ( f ( x ) , y ) = | y − f ( x ) | {\displaystyle V(f(\mathbf {x} ),y)=|y-f(\mathbf {x} )|} === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification. It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with Y = { − 1 , 1 } {\displaystyle Y=\{-1,1\}} , this is: V ( f ( x ) , y ) = θ ( − y f ( x ) ) {\displaystyle V(f(\mathbf {x} ),y)=\theta (-yf(\mathbf {x} ))} where θ {\displaystyle \theta } is the Heaviside step function. == Regularization == In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability for the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space H {\displaystyle {\mathcal {H}}} . A common example would be restricting H {\displaystyle {\mathcal {H}}} to linear functions: this can be seen as a reduction to the standard problem of linear regression. H {\displaystyle {\mathcal {H}}} could also be restricted to polynomial of degree p {\displaystyle p} , exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions are limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization. This consists of minimizing 1 n ∑ i = 1 n V ( f ( x i ) , y i ) + γ ‖ f ‖ H 2 {\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})+\gamma \left\|f\right\|_{\mathcal {H}}^{2}} where γ {\displaystyle \gamma } is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier f : X → { 0 , 1 } {\displaystyle f:{\mathcal {X}}\to \{0,1\}} . We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk to be a Sub-Gaussian distribution. P ( | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 e − 2 n ϵ 2 {\displaystyle \mathbb {P} (|{\hat {R}}(f)-R(f)|\geq \epsilon )\leq 2e^{-2n\epsilon ^{2}}} But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class. P ( sup f ∈ F | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 S ( F , n ) e − n ϵ 2 / 8 ≈ n d e − n ϵ 2 / 8 {\displaystyle \mathbb {P} {\bigg (}\sup _{f\in {\mathcal {F}}}|{\hat {R}}(f)-R(f)|\geq \epsilon {\bigg )}\leq 2S({\mathcal {F}},n)e^{-n\epsilon ^{2}/8}\approx n^{d}e^{-n\epsilon ^{2}/8}} where S ( F , n ) {\displaystyle S({\mathcal {F}},n)} is the shattering number and n {\displaystyle n} is the number of samples in your dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla

    Read more →
  • Kindara

    Kindara

    Kindara is a femtech company headquartered in Colorado that develops apps that help women identify their fertile window. The products are used for women trying to get pregnant, or women who want to track their menstrual cycle for overall health. Their latest product, Priya Fertility and Ovulation Monitor, maximizes a woman's chance of getting pregnancy by identifying her most fertile days. == Overview == Kindara was founded in 2011 by husband-and-wife team Will Sacks and Kati Bicknell. The company launched its free mobile application in 2012. Kindara's mobile application allows women to track signs of fertility, such as basal body temperature, cervical fluid, and the position of the cervix to determine when ovulation is occurring. Kindara also sells a thermometer, Wink, which records basal body temperature and syncs automatically to the Kindara fertility application. In 2018, Kindara was acquired by the company Prima-Temp.

    Read more →
  • Automated negotiation

    Automated negotiation

    Automated negotiation is a form of interaction in systems that are composed of multiple autonomous agents, in which the aim is to reach agreements through an iterative process of making offers. Automated negotiation can be employed for many tasks human negotiators regularly engage in, such as bargaining and joint decision making. The main topics in automated negotiation revolve around the design of protocols and negotiating strategies. == History == Through digitization, the beginning of the 21st century has seen a growing interest in the automation of negotiation and e-negotiation systems, for example in the setting of e-commerce. This interest is fueled by the promise of automated agents being able to negotiate on behalf of human negotiators, and to find better outcomes than human negotiators. == Examples == Examples of automated negotiation include: Online dispute resolution, in which disagreements between parties are settled. Sponsored search auction, where bids are placed on advertisement keywords. Content negotiation, in which user agents negotiate over HTTP about how to best represent a web resource. Negotiation support systems, in which negotiation decision-making activities are supported by an information system.

    Read more →
  • Sparkles emoji

    Sparkles emoji

    The Sparkles emoji (U+2728 ✨ SPARKLES) is an emoji that has one large star surrounded by smaller stars. Originating from Japan to represent sparkles used in anime and manga, the sparkles are often used as emphasis in text by surrounding words or phrases with it. It is the third most-used emoji in the world on Twitter as of 2021. Since the early 2020s it has been used by major software companies to represent artificial intelligence, marketing the technology as "like magic". == Development == According to Emojipedia, the Sparkles emoji was first used by Japanese mobile operators SoftBank, Docomo and au in the late 1990s. The emoji was added to Unicode 6.0 in 2010 and Emoji 1.0 in 2015. On some platforms the Sparkles emoji has been multicoloured whilst on other platforms it has been one colour. Twitter and Microsoft's Sparkles have changed from being multicoloured to being a single colour. Samsung's version of the emoji previously had a night sky in the background. == Usage == === Interpersonal communication === The Sparkles emoji was originally meant to represent the usage of sparkles in Japanese anime and manga, where the sparkles are used to represent beauty, happiness or awe. The emoji has several meanings and depends upon context. Starting in the late 2010s, the emoji started being used to surround words or phrases to be used as emphasis, an example from the book Because Internet being "I would simply ✨pass away✨". It can also be used as sarcasm, irony or as a way to mock people. Without emoji this could be represented with tildes or asterisks, for example, "~tildes~" or "~asterisk plus tilde~" or "~~true sparkle exuberance~~". The sparkles emoji can be used to represent stars in text, be used to represent cleanliness or can be used to mean "orgasm" whilst sexting. In September 2021 the Sparkles emoji overtook the Pleading Face (🥺) emoji to become the third most-used emoji in the world according to Emojipedia, with approximately 1 per cent of all tweets containing the Sparkles emoji. === Artificial intelligence === In the early 2020s, the Sparkles emoji started being used as an icon to represent artificial intelligence (AI). Companies who use the emoji this way include Google, OpenAI, Samsung, Microsoft, Adobe, Spotify and Zoom. As of August 2024, seven of the top 10 software companies by market capitalisation use the Sparkles emojis with AI. OpenAI has different versions of the Sparkles for different versions of the models that ChatGPT uses. One explanation is that Sparkles is being used by these companies as a way to market AI as "magic". Marketing technology as "magic" has been used before AI, particularly by Apple. Another explanation given by designers and marketers choosing to use Sparkles to signify AI is simply that other platforms are doing it, making it familiar to users. Around 2024, some of these companies started removing two of the smaller stars from the emoji in their AI services and have kept the one large star, an example being Google's Gemini chatbot. In early 2024, the Nielsen Norman Group provided test subjects with the star in isolation and found that people did not associate the symbol with AI, but instead mostly with "optimisation" or "favourite or save an item".

    Read more →
  • Lazy learning

    Lazy learning

    (Not to be confused with the lazy learning regime, see Neural tangent kernel). In machine learning, lazy learning is a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize the training data before receiving queries. The primary motivation for employing lazy learning, as in the K-nearest neighbors algorithm, used by online recommendation systems ("people who viewed/purchased/listened to this movie/item/tune also ...") is that the data set is continuously updated with new entries (e.g., new items for sale at Amazon, new movies to view at Netflix, new clips at YouTube, new music at Spotify or Pandora). Because of the continuous update, the "training data" would be rendered obsolete in a relatively short time especially in areas like books and movies, where new best-sellers or hit movies/music are published/released continuously. Therefore, one cannot really talk of a "training phase". Lazy classifiers are most useful for large, continuously changing datasets with few attributes that are commonly queried. Specifically, even if a large set of attributes exist - for example, books have a year of publication, author/s, publisher, title, edition, ISBN, selling price, etc. - recommendation queries rely on far fewer attributes - e.g., purchase or viewing co-occurrence data, and user ratings of items purchased/viewed. == Advantages == The main advantage gained in employing a lazy learning method is that the target function will be approximated locally, such as in the k-nearest neighbor algorithm. Because the target function is approximated locally for each query to the system, lazy learning systems can simultaneously solve multiple problems and deal successfully with changes in the problem domain. At the same time they can reuse a lot of theoretical and applied results from linear regression modelling (notably PRESS statistic) and control. It is said that the advantage of this system is achieved if the predictions using a single training set are only developed for few objects. This can be demonstrated in the case of the k-NN technique, which is instance-based and function is only estimated locally. == Disadvantages == Theoretical disadvantages with lazy learning include: The large space requirement to store the entire training dataset. In practice, this is not an issue because of advances in hardware and the relatively small number of attributes (e.g., as co-occurrence frequency) that need to be stored. Particularly noisy training data increases the case base unnecessarily, because no abstraction is made during the training phase. In practice, as stated earlier, lazy learning is applied to situations where any learning performed in advance soon becomes obsolete because of changes in the data. Also, for the problems for which lazy learning is optimal, "noisy" data does not really occur - the purchaser of a book has either bought another book or hasn't. Lazy learning methods are usually slower to evaluate. In practice, for very large databases with high concurrency loads, the queries are not postponed until actual query time, but recomputed in advance on a periodic basis - e.g., nightly, in anticipation of future queries, and the answers stored. This way, the next time new queries are asked about existing entries in the database, the answers are merely looked up rapidly instead of having to be computed on the fly, which would almost certainly bring a high-concurrency multi-user system to its knees. Larger training data also entail increased cost. Particularly, there is the fixed amount of computational cost, where a processor can only process a limited amount of training data points. There are standard techniques to improve re-computation efficiency so that a particular answer is not recomputed unless the data that impact this answer has changed (e.g., new items, new purchases, new views). In other words, the stored answers are updated incrementally. This approach, used by large e-commerce or media sites, has long been used in the Entrez portal of the National Center for Biotechnology Information (NCBI) to precompute similarities between the different items in its large datasets: biological sequences, 3-D protein structures, published-article abstracts, etc. Because "find similar" queries are asked so frequently, the NCBI uses highly parallel hardware to perform nightly recomputation. The recomputation is performed only for new entries in the datasets against each other and against existing entries: the similarity between two existing entries need not be recomputed. == Examples of Lazy Learning Methods == K-nearest neighbors, which is a special case of instance-based learning. Local regression. Lazy naive Bayes rules, which are extensively used in commercial spam detection software. Here, the spammers keep getting smarter and revising their spamming strategies, and therefore the learning rules must also be continually updated.

    Read more →
  • Site-specific browser

    Site-specific browser

    A site-specific browser (SSB) is a software application dedicated to accessing pages from a single source (site) on a computer network such as the Internet or a private intranet. SSBs typically simplify the more complex functions of a web browser by excluding the menus, toolbars and browser graphical user interface associated with functions that are external to the workings of a single site. Modern site-specific browsers range from simple browser windows without navigation controls to sophisticated desktop applications built with frameworks like Electron that bundle entire browser engines. This evolution has enabled many popular desktop applications to be built using web technologies, effectively making them advanced site-specific browsers. == History == === Early development === One of the earliest examples of an SSB was MacDICT, a Mac OS 9 application that accessed various websites to define, translate, or find synonyms for words typed into a text box. However, the first general-purpose SSB is considered to be Bubbles, which launched in late 2005 on the Windows platform. Bubbles introduced the term "Site Specific Extensions" for SSB userscripts and created the first SSB JavaScript API. In 2007, Mozilla announced Prism (originally called WebRunner), a project to integrate web applications with the desktop. That same year, Todd Ditchendorf, a former Apple Dashboard engineer, released Fluid for macOS. On 2 September 2008, Google Chrome was released with a built-in "Create application shortcut" feature, bringing SSB functionality to mainstream users. This feature allowed any website to be launched in a separate window without the browser interface. === Modern era === The landscape of site-specific browsers changed dramatically with the introduction of Electron in 2013 (originally called Atom Shell). Electron combined Chromium and Node.js into a single runtime, enabling developers to build desktop applications using web technologies. This framework has since powered applications used by hundreds of millions of users, including Visual Studio Code, Slack, Discord, and Microsoft Teams. In 2015, the concept of Progressive Web Apps (PWAs) was introduced by Google engineers Alex Russell and Frances Berriman, representing a parallel evolution in web-to-desktop technology. While PWAs share similar goals with SSBs, they follow web standards and can be installed directly from browsers. More recently, alternative frameworks like Tauri have emerged, offering significantly smaller application sizes by using the system's native web renderer instead of bundling Chromium. == Technical implementation == Site-specific browsers can be implemented through various approaches: === Browser-based SSBs === The simplest form of SSB is created through browser features that allow websites to run in separate windows without the standard browser interface. Modern Chromium-based browsers offer "Install as app" or "Create shortcut" functionality that creates a dedicated window for a specific website. These SSBs share the browser's underlying engine and resources but operate in isolated windows. === Framework-based SSBs === More sophisticated SSBs are built using application frameworks: Electron: Bundles a complete Chromium browser with Node.js, resulting in applications of 85MB or larger. Each Electron application runs its own browser instance, providing full access to system APIs but consuming significant resources. Tauri: Uses the operating system's native web rendering engine (WebView2 on Windows, WebKit on macOS, and WebKitGTK on Linux), resulting in applications typically 2.5-10MB in size. Other frameworks: Include Neutralino.js (ultra-lightweight using system browser), Wails (Go-based), and the Chromium Embedded Framework (CEF). == Comparison with Progressive Web Apps == While site-specific browsers and Progressive Web Apps (PWAs) share the goal of bringing web content to the desktop, they differ in several key aspects: == Applications == Site-specific browsers have become the foundation for many popular desktop applications: Communication and collaboration: Many modern communication tools are built as SSBs, including Slack, Discord, Microsoft Teams, and WhatsApp Desktop. These applications benefit from web-based development while providing desktop integration. Development tools: Visual Studio Code, used by 73.6% of developers according to Stack Overflow's 2024 survey, is built with Electron, as are Atom and GitHub Desktop. Productivity software: Applications like Notion, Obsidian, and various project management tools use SSB technology to provide consistent experiences across platforms. Security and Privacy: Web browsers can be modified to only have access to a single site, in order to protect the security and privacy of the user via compartmentalization == Security and performance == === Memory usage === Framework-based SSBs, particularly those using Electron, are known for high memory consumption. Studies show Electron applications typically use 120-300MB at baseline, with complex applications consuming significantly more. This is approximately 5-10 times more memory than equivalent native applications. === Security considerations === SSBs can provide security benefits through process isolation, where each application runs in its own sandboxed environment. However, bundling an entire browser engine also means each application must be updated independently to patch security vulnerabilities. Research presented at the Network and Distributed System Security (NDSS) Symposium has identified various security challenges specific to Electron applications. === Bundle sizes === The choice of framework significantly impacts application size: Electron applications: 85MB+ (includes full Chromium) Tauri applications: 2.5-10MB (uses system WebView) Browser-based SSBs: No additional download (uses existing browser) == Software == === Browser support === Most modern browsers provide some form of SSB functionality: Chromium-based browsers (Google Chrome, Microsoft Edge, Brave, Opera, Vivaldi): "Install as app" or "Create shortcut" feature Safari: "Add to Dock" feature in macOS Sonoma (2023) Firefox: Removed SSB support in December 2020 (version 85) GNOME Web: "Install Site as Web Application" feature === Standalone tools === ==== Active ==== WebCatalog (Windows, macOS, Linux) – Manages multiple SSBs with isolated storage Fluid (macOS) – Pioneering SSB creator for Mac Unite (macOS) – Creates SSBs with customization options Coherence X (macOS) – Advanced SSB creation tool Pake (cross-platform) – Open-source SSB creator Wavebox (cross-platform) – Workspace browser with SSB features ==== Discontinued ==== Mozilla Prism – Cross-platform SSB creator (discontinued 2011) Nativefier – Command-line SSB creator (discontinued 2023) Epichrome – macOS SSB creator (discontinued 2021) === Development frameworks === Electron – Most popular framework, bundles Chromium and Node.js Tauri – Rust-based framework using system WebView Chromium Embedded Framework (CEF) – C++ library for embedding Chromium Neutralino.js – Lightweight framework using system browser Wails – Go-based framework for web frontends

    Read more →
  • Pattern theory

    Pattern theory

    Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language. Broad in its mathematical coverage, Pattern Theory spans algebra and statistics, as well as local topological and global entropic properties. In addition to the new algebraic vocabulary, its statistical approach is novel in its aim to: Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was previously commonplace. Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph. Study the randomness and variability of these graphs. Create the basic classes of stochastic models applied by listing the deformations of the patterns. Synthesize (sample) from the models, not just analyze signals with them. The Brown University Pattern Theory Group was formed in 1972 by Ulf Grenander. Many mathematicians are currently working in this group, noteworthy among them being the Fields Medalist David Mumford. Mumford regards Grenander as his "guru" in Pattern Theory.

    Read more →
  • Deadbot

    Deadbot

    A deadbot, deathbot, or griefbot is a digital avatar, created with artificial intelligence, which resembles a person who is dead. Griefbots employ natural language processing and machine-learning techniques to approximate the style and personality of a deceased person. They may appear as chatbots, voice assistants, or animated avatars, and are often trained on an individual's digital remains. == History == Among the earliest researchers, Muhammad Aurangzeb Ahmad of the University of Washington, developed the Grandpa Bot project, a conversational simulation of his late father designed for his children to interact with. Other efforts include journalist James Vlahos's Dadbot, which evolved into the commercial platform HereAfter AI. Hossein Rahnama's Augmented Eternity research at MIT Media Lab and Toronto Metropolitan University, and game designer Jason Rohrer's "Project December", have enabled users to converse with language-model representations of loved ones. Early commercial projects such as Eternime, founded by Marius Ursache, also popularized the notion of interactive digital immortality. == Cultural and societal impact == Scholars have proposed frameworks and critiques addressing the ethics of these technologies. Tomasz Hollanek and Katarzyna Nowaczyk-Basińska developed a design-ethics taxonomy distinguishing the data donor, data recipient, and interactant. Edina Harbinja and Lilian Edwards formalized the concept of post-mortem privacy, and Carl J. Öhman at the Oxford Internet Institute studied the management of large-scale digital remains. Cultural acceptance varies: while some view them as expressions of remembrance, others regard them as unsettling or ethically problematic. Concerns have been raised about deadbots' potential for creating psychological harm. Griefbots are considered part of the phenomenon of artificial intimacy.

    Read more →