AI For Business Brian Hanson

AI For Business Brian Hanson — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

National Cyber Security Policy 2013

National Cyber Security Policy is a policy framework by Department of Electronics and Information Technology (DeitY) It aims at protecting the public and private infrastructure from cyber attacks. The policy also intends to safeguard "information, such as personal information (of web users), financial and banking information and sovereign data". This was particularly relevant in the wake of US National Security Agency (NSA) leaks that suggested the US government agencies are spying on Indian users, who have no legal or technical safeguards against it. Ministry of Communications and Information Technology (India) defines Cyberspace as a complex environment consisting of interactions between people, software services supported by worldwide distribution of information and communication technology. == Reason for Cyber Security policies == India had no Cyber security policy before 2013. In 2013, The Hindu newspaper, citing documents leaked by NSA whistle-blower Edward Snowden, has alleged that much of the NSA surveillance was focused on India's domestic politics and its strategic and commercial interests. This sparked a furore among people. Under pressure, the government unveiled a National Cyber Security Policy 2013 on 2 July 2013. == Vision == To build a secure and resilient cyberspace for citizens, business, and government and also to protect anyone from intervening in user's privacy.It mentioned a five year target of training five lakh cyber security personnel by 2018. == Mission == To protect information and information infrastructure in cyberspace, build capabilities to prevent and respond to cyber threat, reduce vulnerabilities and minimize damage from cyber incidents through a combination of institutional structures, people, processes, technology, and cooperation. == Objective == Ministry of Communications and Information Technology (India) define objectives as follows: To create a secure cyber ecosystem in the country, generate adequate trust and confidence in IT system and transactions in cyberspace and thereby enhance adoption of IT in all sectors of the economy. To create an assurance framework for the design of security policies and promotion and enabling actions for compliance to global security standards and best practices by way of conformity assessment (Product, process, technology & people). To strengthen the Regulatory Framework for ensuring a SECURE CYBERSPACE ECOSYSTEM. To enhance and create National and Sectoral level 24x7 mechanism for obtaining strategic information regarding threats to ICT infrastructure, creating scenarios for response, resolution and crisis management through effective predictive, preventive, protective response and recovery actions. -To improve visibility of integrity of ICT products and services by establishing infrastructure for testing & validation of security of such product. To create workforce for 500,000 professionals skilled in next 5 years through capacity building skill development and training. To provide fiscal benefit to businesses for adoption of standard security practices and processes. To enable Protection of information while in process, handling, storage & transit so as to safeguard privacy of citizen's data and reducing economic losses due to cyber crime or data theft. To enable effective prevention, investigation and prosecution of cybercrime and enhancement of law enforcement capabilities through appropriate legislative intervention. == Strategies == Creating a secured Ecosystem. Creating an assurance framework. Encouraging Open Standards. Strengthening The regulatory Framework. Creating a mechanism for Security Threats Early Warning, Vulnerability management, and response to security threats. Securing E-Governance services. Protection and resilience of Critical Information Infrastructure. Promotion of Research and Development in cyber security. Reducing supply chain risks Human Resource Development (fostering education and training programs both in formal and informal sectors to Support the Nation's cyber security needs and build capacity. Creating cyber security awareness. Developing effective Public-Private partnerships. To develop bilateral and multilateral relationships in the area of cyber security with another country. (Information sharing and cooperation) a Prioritized approach for implementation.
Read more →
OpenVINO

OpenVINO is an open-source software toolkit developed by Intel for optimizing and deploying deep learning models. It supports several popular model formats and categories, such as large language models, computer vision, and generative AI. OpenVINO is optimized for Intel hardware, but offers support for ARM/ARM64 processors. It sees great use in AI Sound Processing drivers when tied with Intel's Gaussian & Neural Accelerator (GNA). Based in C++, it extends API support for C and Python, as well as Node.js (in early preview). OpenVINO is cross-platform and free for use under Apache License 2.0. == Workflow == The simplest OpenVINO usage involves obtaining a model and running it as is. Yet for the best results, a more complete workflow is suggested: obtain a model in one of supported frameworks, convert the model to OpenVINO IR using the OpenVINO Converter tool, optimize the model, using training-time or post-training options provided by OpenVINO's NNCF. execute inference, using OpenVINO Runtime by specifying one of several inference modes. == OpenVINO model format == OpenVINO IR is the default format used to run inference. It is saved as a set of two files, .bin and .xml, containing weights and topology, respectively. It is obtained by converting a model from one of the supported frameworks, using the application's API or a dedicated converter. Models of the supported formats may also be used for inference directly, without prior conversion to OpenVINO IR. Such an approach is more convenient but offers fewer optimization options and lower performance, since the conversion is performed automatically before inference. Some pre-converted models can be found in the Hugging Face repository. The supported model formats are: PyTorch TensorFlow TensorFlow Lite ONNX (including formats that may be serialized to ONNX) PaddlePaddle JAX/Flax == OS support == OpenVINO runs on Windows, Linux and MacOS.
Read more →
Philosophy of information

The philosophy of information (PI) is a branch of philosophy that studies topics relevant to information processing, representational system and consciousness, cognitive science, computer science, information science and information technology. It includes: the critical investigation of the conceptual nature and basic principles of information, including its dynamics, utilisation and sciences the elaboration and application of information-theoretic and computational methodologies to philosophical problems. == History == The philosophy of information (PI) has evolved from the philosophy of artificial intelligence, logic of information, cybernetics, social theory, ethics and the study of language and information. === Logic of information === The logic of information, also known as the logical theory of information, considers the information content of logical signs and expressions along the lines initially developed by Charles Sanders Peirce. === Study of language and information === Later contributions to the field were made by Fred Dretske, Jon Barwise, Brian Cantwell Smith, and others. The Center for the Study of Language and Information (CSLI) was founded at Stanford University in 1983 by philosophers, computer scientists, linguists, and psychologists, under the direction of John Perry and Jon Barwise. === P.I. === More recently this field has become known as the philosophy of information. The expression was coined in the 1990s by Luciano Floridi, who has published prolifically in this area with the intention of elaborating a unified and coherent, conceptual frame for the whole subject. == Definitions of "information" == The concept information has been defined by several theorists. Charles S. Peirce's theory of information was embedded in his wider theory of symbolic communication he called the semiotic, now a major part of semiotics. For Peirce, information integrates the aspects of signs and expressions separately covered by the concepts of denotation and extension, on the one hand, and by connotation and comprehension on the other. Donald M. MacKay says that information is a distinction that makes a difference. According to Luciano Floridi, four kinds of mutually compatible phenomena are commonly referred to as "information": Information about something (e.g. a train timetable) Information as something (e.g. DNA, or fingerprints) Information for something (e.g. algorithms or instructions) Information in something (e.g. a pattern or a constraint). == Philosophical directions == === Computing and philosophy === Recent creative advances and efforts in computing, such as semantic web, ontology engineering, knowledge engineering, and modern artificial intelligence provide philosophy with fertile ideas, new and evolving subject matters, methodologies, and models for philosophical inquiry. While computer science brings new opportunities and challenges to traditional philosophical studies, and changes the ways philosophers understand foundational concepts in philosophy, further major progress in computer science would only be feasible when philosophy provides sound foundations for areas such as bioinformatics, software engineering, knowledge engineering, and ontologies. Classical topics in philosophy, namely, mind, consciousness, experience, reasoning, knowledge, truth, morality and creativity are rapidly becoming common concerns and foci of investigation in computer science, e.g., in areas such as agent computing, software agents, and intelligent mobile agent technologies. According to Luciano Floridi " one can think of several ways for applying computational methods towards philosophical matters: Conceptual experiments in silico: As an innovative extension of an ancient tradition of thought experiment, a trend has begun in philosophy to apply computational modeling schemes to questions in logic, epistemology, philosophy of science, philosophy of biology, philosophy of mind, and so on. Pancomputationalism: On this view, computational and informational concepts are considered to be so powerful that given the right level of abstraction, anything in the world could be modeled and represented as a computational system, and any process could be simulated computationally. Then, however, pancomputationalists have the hard task of providing credible answers to the following two questions: how can one avoid blurring all differences among systems? what would it mean for the system under investigation not to be an informational system (or a computational system, if computation is the same as information processing)?
Read more →
Responsible AI Safety and Education Act

The Responsible AI Safety and Education Act (RAISE Act) is a New York State law that imposes transparency, safety, and reporting requirements on developers of large frontier artificial intelligence models. The law was signed by Governor Kathy Hochul on December 19, 2025. It was sponsored by State Senator Andrew Gounardes and Assemblymember Alex Bores. The RAISE Act is the second U.S. state law to regulate frontier AI model developers, following California's Transparency in Frontier Artificial Intelligence Act (TFAIA), which was signed in September 2025. Hochul signed the bill on the condition that the legislature would pass chapter amendments to bring the law closer to the California model. The amending bills (A9449/S8828) were introduced in January 2026; as of February 2026 they remain in committee, though the Governor's office and legal commentators treat the agreed-upon amendments as representing the final form of the law. == Provisions == The following describes the RAISE Act as it is expected to operate after the agreed-upon chapter amendments take effect. The law is expected to take effect on January 1, 2027. === Scope === The law applies to "large frontier developers," defined as companies with annual revenues exceeding $500 million that develop "frontier models," which are foundation models trained using more than 1026 floating-point operations (FLOPs). The version passed by the legislature in June 2025 had instead defined large developers based on having spent over $100 million in aggregate compute costs, and also included a provision prohibiting deployment of frontier models posing "unreasonable risk of critical harm"; both were removed as part of the negotiations between Hochul and the legislature. Accredited colleges and universities engaged in academic research are exempt, as is the state's Empire AI consortium. === Safety and transparency framework === Large frontier developers must write, implement, and publicly publish a "frontier AI framework" describing how they assess and mitigate catastrophic risks, secure unreleased model weights against unauthorized access, use third-party evaluators, govern internal use of frontier models, and respond to safety incidents. The framework must describe these measures "in detail," a requirement that goes beyond the California TFAIA's requirement to describe a developer's "approach." The framework must be reviewed at least annually, and material modifications must be published with justification within 30 days. Before or concurrently with deploying a new or substantially modified frontier model, developers must publish a transparency report including the model's release date, supported languages and output modalities, intended uses, and any restrictions on use. Large frontier developers must additionally include summaries of catastrophic risk assessments and the extent of third-party involvement. === Catastrophic risk and incident reporting === The law defines "catastrophic risk" as a foreseeable and material risk that a frontier model will contribute to the death of or serious injury to more than 50 people, or more than $1 billion in property damage, arising from a frontier model providing expert-level assistance in creating chemical, biological, radiological, or nuclear weapons; engaging in cyberattacks or conduct equivalent to crimes such as murder, assault, or theft without meaningful human oversight; or evading the control of its developer or user. Loss of equity value is explicitly excluded from the definition of property damage. "Critical safety incidents" include unauthorized access to model weights resulting in death or injury, materialization of a catastrophic risk, loss of control of a frontier model causing death or injury, and a model using deceptive techniques to subvert developer controls outside of an evaluation context in a manner that increases catastrophic risk. Frontier developers must report critical safety incidents within 72 hours, or within 24 hours if the incident poses an imminent risk of death or serious physical injury. === Enforcement === The chapter amendments establish a new office within the New York State Department of Financial Services to oversee compliance, receive incident reports, and publish annual reports on AI safety beginning in 2028. Large frontier developers must file disclosure statements with this office and pay pro rata assessments to fund its operations. The New York Attorney General may bring civil actions, with penalties of up to $1 million for a first violation and $3 million for subsequent violations. The version passed by the legislature in June 2025 had set penalties at up to $10 million and $30 million respectively. The law does not create a private right of action. == Legislative history == The bill was introduced in the Assembly on March 5, 2025, by Assemblymember Alex Bores, and in the Senate on March 27, 2025, by Senator Andrew Gounardes. After a series of amendments, the legislature passed the bill in June 2025. Governor Hochul did not immediately sign the bill, using nearly all the time available under New York law before acting; had she not signed by the end of 2025, the bill would have been pocket vetoed. The tech industry lobbied against the bill during this period, and Hochul initially proposed a near-complete rewrite modeled on California's TFAIA. Legislators resisted the extent of the changes, and the two sides ultimately agreed on a version that used the California law as a base but preserved several provisions that went beyond it, including the 72-hour incident reporting timeline and the creation of a dedicated enforcement office. Hochul signed the original bill (S6953-B/A6453-B) on December 19, 2025, with the legislature committing to pass chapter amendments formalizing the agreed changes in the January 2026 session. The amending bills (A9449 in the Assembly, S8828 in the Senate) were introduced on January 6 and January 8, 2026. OpenAI and Anthropic expressed support for the law. Anthropic's head of external affairs Sarah Heck said the two state laws "should inspire Congress to build on them." The super PAC network Leading the Future, backed by Andreessen Horowitz and OpenAI president Greg Brockman, subsequently announced plans to challenge Bores in a future election. == Federal preemption debate == Hochul signed the RAISE Act eight days after President Donald Trump issued an executive order on December 11, 2025, directing the Department of Justice to challenge state AI laws deemed to conflict with a "minimally burdensome" national AI policy. On January 9, 2026, the Department of Justice announced the establishment of an AI Litigation Task Force as called for by the executive order. The executive order also threatened states with loss of certain federal broadband funding if their AI laws were found to be onerous. Legal commentators have noted several potential avenues for federal challenge, including arguments that the law constitutes compelled speech, violates the dormant Commerce Clause by creating a patchwork of state regulations, or is preempted by federal AI policy. == Comparison with California's TFAIA == The RAISE Act was designed to align with California's Transparency in Frontier Artificial Intelligence Act, signed on September 29, 2025. Both laws use the same 1026 FLOP threshold to define frontier models and the same $500 million revenue threshold to define large developers. Both require public safety frameworks, transparency reports, and incident reporting. The RAISE Act's 72-hour incident reporting window is stricter than the TFAIA's 15-day window, though both require faster reporting for incidents posing imminent physical risk (24 hours under the RAISE Act, immediate under the TFAIA). The RAISE Act establishes a dedicated enforcement office within the Department of Financial Services, whereas California routes reports through the Office of Emergency Services. The RAISE Act requires developers to describe their safety measures "in detail" and how they "handle" various risks, whereas the TFAIA requires developers to describe their "approach."
Read more →
Sample complexity

The sample complexity of a machine learning algorithm represents the number of training-samples that it needs in order to successfully learn a target function. More precisely, the sample complexity is the number of training-samples that we need to supply to the algorithm, so that the function returned by the algorithm is within an arbitrarily small error of the best possible function, with probability arbitrarily close to 1. There are two variants of sample complexity: The weak variant fixes a particular input-output distribution; The strong variant takes the worst-case sample complexity over all input-output distributions. The No free lunch theorem, discussed below, proves that, in general, the strong sample complexity is infinite, i.e. that there is no algorithm that can learn the globally-optimal target function using a finite number of training samples. However, if we are only interested in a particular class of target functions (e.g., only linear functions) then the sample complexity is finite, and it depends linearly on the VC dimension on the class of target functions. == Definition == Let X {\displaystyle X} be a space which we call the input space, and Y {\displaystyle Y} be a space which we call the output space, and let Z {\displaystyle Z} denote the product X × Y {\displaystyle X\times Y} . For example, in the setting of binary classification, X {\displaystyle X} is typically a finite-dimensional vector space and Y {\displaystyle Y} is the set { − 1 , 1 } {\displaystyle \{-1,1\}} . Fix a hypothesis space H {\displaystyle {\mathcal {H}}} of functions h : X → Y {\displaystyle h\colon X\to Y} . A learning algorithm over H {\displaystyle {\mathcal {H}}} is a computable map from Z {\displaystyle Z} to H {\displaystyle {\mathcal {H}}} . In other words, it is an algorithm that takes as input a finite sequence of training samples and outputs a function from X {\displaystyle X} to Y {\displaystyle Y} . Typical learning algorithms include empirical risk minimization, without or with Tikhonov regularization. Fix a loss function L : Y × Y → R ≥ 0 {\displaystyle {\mathcal {L}}\colon Y\times Y\to \mathbb {R} _{\geq 0}} , for example, the square loss L ( y , y ′ ) = ( y − y ′ ) 2 {\displaystyle {\mathcal {L}}(y,y')=(y-y')^{2}} , where h ( x ) = y ′ {\displaystyle h(x)=y'} . For a given distribution ρ {\displaystyle \rho } on X × Y {\displaystyle X\times Y} , the expected risk of a hypothesis (a function) h ∈ H {\displaystyle h\in {\mathcal {H}}} is E ( h ) := E ρ [ L ( h ( x ) , y ) ] = ∫ X × Y L ( h ( x ) , y ) d ρ ( x , y ) {\displaystyle {\mathcal {E}}(h):=\mathbb {E} _{\rho }[{\mathcal {L}}(h(x),y)]=\int _{X\times Y}{\mathcal {L}}(h(x),y)\,d\rho (x,y)} In our setting, we have h = A ( S n ) {\displaystyle h={\mathcal {A}}(S_{n})} , where A {\displaystyle {\mathcal {A}}} is a learning algorithm and S n = ( ( x 1 , y 1 ) , … , ( x n , y n ) ) ∼ ρ n {\displaystyle S_{n}=((x_{1},y_{1}),\ldots ,(x_{n},y_{n}))\sim \rho ^{n}} is a sequence of vectors which are all drawn independently from ρ {\displaystyle \rho } . Define the optimal risk E H ∗ = inf h ∈ H E ( h ) . {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}={\underset {h\in {\mathcal {H}}}{\inf }}{\mathcal {E}}(h).} Set h n = A ( S n ) {\displaystyle h_{n}={\mathcal {A}}(S_{n})} , for each sample size n {\displaystyle n} . h n {\displaystyle h_{n}} is a random variable and depends on the random variable S n {\displaystyle S_{n}} , which is drawn from the distribution ρ n {\displaystyle \rho ^{n}} . The algorithm A {\displaystyle {\mathcal {A}}} is called consistent if E ( h n ) {\displaystyle {\mathcal {E}}(h_{n})} probabilistically converges to E H ∗ {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}} . In other words, for all ϵ , δ > 0 {\displaystyle \epsilon ,\delta >0} , there exists a positive integer N {\displaystyle N} , such that, for all sample sizes n ≥ N {\displaystyle n\geq N} , we have Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] < δ . {\displaystyle \Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]<\delta .} The sample complexity of A {\displaystyle {\mathcal {A}}} is then the minimum N {\displaystyle N} for which this holds, as a function of ρ , ϵ {\displaystyle \rho ,\epsilon } , and δ {\displaystyle \delta } . We write the sample complexity as N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} to emphasize that this value of N {\displaystyle N} depends on ρ , ϵ {\displaystyle \rho ,\epsilon } , and δ {\displaystyle \delta } . If A {\displaystyle {\mathcal {A}}} is not consistent, then we set N ( ρ , ϵ , δ ) = ∞ {\displaystyle N(\rho ,\epsilon ,\delta )=\infty } . If there exists an algorithm for which N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is finite, then we say that the hypothesis space H {\displaystyle {\mathcal {H}}} is learnable. In others words, the sample complexity N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} defines the rate of consistency of the algorithm: given a desired accuracy ϵ {\displaystyle \epsilon } and confidence δ {\displaystyle \delta } , one needs to sample N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} data points to guarantee that the risk of the output function is within ϵ {\displaystyle \epsilon } of the best possible, with probability at least 1 − δ {\displaystyle 1-\delta } . In probably approximately correct (PAC) learning, one is concerned with whether the sample complexity is polynomial, that is, whether N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is bounded by a polynomial in 1 / ϵ {\displaystyle 1/\epsilon } and 1 / δ {\displaystyle 1/\delta } . If N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is polynomial for some learning algorithm, then one says that the hypothesis space H {\displaystyle {\mathcal {H}}} is PAC-learnable. This is a stronger notion than being learnable. == Unrestricted hypothesis space: infinite sample complexity == One can ask whether there exists a learning algorithm so that the sample complexity is finite in the strong sense, that is, there is a bound on the number of samples needed so that the algorithm can learn any distribution over the input-output space with a specified target error. More formally, one asks whether there exists a learning algorithm A {\displaystyle {\mathcal {A}}} , such that, for all ϵ , δ > 0 {\displaystyle \epsilon ,\delta >0} , there exists a positive integer N {\displaystyle N} such that for all n ≥ N {\displaystyle n\geq N} , we have sup ρ ( Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] ) < δ , {\displaystyle \sup _{\rho }\left(\Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]\right)<\delta ,} where h n = A ( S n ) {\displaystyle h_{n}={\mathcal {A}}(S_{n})} , with S n = ( ( x 1 , y 1 ) , … , ( x n , y n ) ) ∼ ρ n {\displaystyle S_{n}=((x_{1},y_{1}),\ldots ,(x_{n},y_{n}))\sim \rho ^{n}} as above. The No Free Lunch Theorem says that without restrictions on the hypothesis space H {\displaystyle {\mathcal {H}}} , this is not the case, i.e., there always exist "bad" distributions for which the sample complexity is arbitrarily large. Thus, in order to make statements about the rate of convergence of the quantity sup ρ ( Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] ) , {\displaystyle \sup _{\rho }\left(\Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]\right),} one must either constrain the space of probability distributions ρ {\displaystyle \rho } , e.g. via a parametric approach, or constrain the space of hypotheses H {\displaystyle {\mathcal {H}}} , as in distribution-free approaches. == Restricted hypothesis space: finite sample-complexity == The latter approach leads to concepts such as VC dimension and Rademacher complexity which control the complexity of the space H {\displaystyle {\mathcal {H}}} . A smaller hypothesis space introduces more bias into the inference process, meaning that E H ∗ {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}} may be greater than the best possible risk in a larger space. However, by restricting the complexity of the hypothesis space it becomes possible for an algorithm to produce more uniformly consistent functions. This trade-off leads to the concept of regularization. It is a theorem from VC theory that the following three statements are equivalent for a hypothesis space H {\displaystyle {\mathcal {H}}} : H {\displaystyle {\mathcal {H}}} is PAC-learnable. The VC dimension of H {\displaystyle {\mathcal {H}}} is finite. H {\displaystyle {\mathcal {H}}} is a uniform Glivenko-Cantelli class. This gives a way to prove that certain hypothesis spaces are PAC learnable, and by extension, learnable. === An example of a PAC-learnable hypothesis space === X = R d , Y = { − 1 , 1 } {\displaystyle X=\mathbb {R} ^{d},Y=\{-1,1\}} , and let H {\displaystyle {\mathcal {H}}} be the space of affine functions on X {\displaystyle X} , that is, functions of the form x ↦ ⟨ w , x ⟩ + b {\displaystyle x\mapsto \langl
Read more →
Learning rule

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weight and bias levels of a network when it is simulated in a specific data environment. A learning rule may accept existing conditions (weights and biases) of the network, and will compare the expected result and actual result of the network to give new and improved values for the weights and biases. Depending on the complexity of the model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations. The learning rule is one of the factors which decides how fast or how accurately the neural network can be developed. Depending on the process to develop the network, there are three main paradigms of machine learning: supervised learning, unsupervised learning, and reinforcement learning. == Background == A lot of the learning methods in machine learning work similar to each other, and are based on each other, which makes it difficult to classify them in clear categories. But they can be broadly understood in 4 categories of learning methods, though these categories don't have clear boundaries and they tend to belong to multiple categories of learning methods - Hebbian - Neocognitron, Brain-state-in-a-box Gradient Descent - ADALINE, Hopfield Network, Recurrent Neural Network Competitive - Learning Vector Quantisation, Self-Organising Feature Map, Adaptive Resonance Theory Stochastic - Boltzmann Machine, Cauchy Machine Though these learning rules might appear to be based on similar ideas, they do have subtle differences, as they are a generalisation or application over the previous rule, and hence it makes sense to study them separately based on their origins and intents. === Hebbian Learning === Developed by Donald Hebb in 1949 to describe biological neuron firing. In the mid-1950s it was also applied to computer simulations of neural networks. Δ w i = η x i y {\displaystyle \Delta w_{i}=\eta x_{i}y} Where η {\displaystyle \eta } represents the learning rate, x i {\displaystyle x_{i}} represents the input of neuron i, and y is the output of the neuron. It has been shown that Hebb's rule in its basic form is unstable. Oja's Rule, BCM Theory are other learning rules built on top of or alongside Hebb's Rule in the study of biological neurons. ==== Perceptron Learning Rule (PLR) ==== The perceptron learning rule originates from the Hebbian assumption, and was used by Frank Rosenblatt in his perceptron in 1958. The net is passed to the activation (transfer) function and the function's output is used for adjusting the weights. The learning signal is the difference between the desired response and the actual response of a neuron. The step function is often used as an activation function, and the outputs are generally restricted to -1, 0, or 1. The weights are updated with w new = w old + η ( t − o ) x i {\displaystyle w_{\text{new}}=w_{\text{old}}+\eta (t-o)x_{i}} where "t" is the target value and "o" is the output of the perceptron, and η {\displaystyle \eta } is called the learning rate. The algorithm converges to the correct classification if: the training data is linearly separable η {\displaystyle \eta } is sufficiently small (though smaller η {\displaystyle \eta } generally means a longer learning time and more epochs) It should also be noted that a single layer perceptron with this learning rule is incapable of working on linearly non-separable inputs, and hence the XOR problem cannot be solved using this rule alone === Backpropagation === Seppo Linnainmaa in 1970 is said to have developed the Backpropagation Algorithm but the origins of the algorithm go back to the 1960s with many contributors. It is a generalisation of the least mean squares algorithm in the linear perceptron and the Delta Learning Rule. It implements gradient descent search through the space possible network weights, iteratively reducing the error, between the target values and the network outputs. ==== Widrow-Hoff Learning (Delta Learning Rule) ==== Similar to the perceptron learning rule but with different origin. It was developed for use in the ADALINE network, which differs from the Perceptron mainly in terms of the training. The weights are adjusted according to the weighted sum of the inputs (the net), whereas in perceptron the sign of the weighted sum was useful for determining the output as the threshold was set to 0, -1, or +1. This makes ADALINE different from the normal perceptron. Delta rule (DR) is similar to the Perceptron Learning Rule (PLR), with some differences: Error (δ) in DR is not restricted to having values of 0, 1, or -1 (as in PLR), but may have any value DR can be derived for any differentiable output/activation function f, whereas in PLR only works for threshold output function Sometimes only when the Widrow-Hoff is applied to binary targets specifically, it is referred to as Delta Rule, but the terms seem to be used often interchangeably. The delta rule is considered to a special case of the back-propagation algorithm. Delta rule also closely resembles the Rescorla-Wagner model under which Pavlovian conditioning occurs. === Competitive Learning === Competitive learning is considered a variant of Hebbian learning, but it is special enough to be discussed separately. Competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Models and algorithms based on the principle of competitive learning include vector quantization and self-organizing maps (Kohonen maps).
Read more →
Infer.NET

Infer.NET is a free and open source .NET software library for machine learning. It supports running Bayesian inference in graphical models and can also be used for probabilistic programming. == Overview == Infer.NET follows a model-based approach and is used to solve different kinds of machine learning problems including standard problems like classification, recommendation or clustering, customized solutions and domain-specific problems. The framework is used in various different domains such as bioinformatics, epidemiology, computer vision, and information retrieval. Development of the framework was started by a team at Microsoft's research centre in Cambridge, UK in 2004. It was first released for academic use in 2008 and later open sourced in 2018. In 2013, Microsoft was awarded the USPTO's Patents for Humanity Award in Information Technology category for Infer.NET and the work in advanced machine learning techniques. Infer.NET is used internally at Microsoft as the machine learning engine in some of their products such as Office, Azure, and Xbox. The source code is licensed under MIT License and available on GitHub. It is also available as NuGet package.
Read more →
AlphaGeometry

AlphaGeometry is an artificial intelligence (AI) program that can solve hard problems in Euclidean geometry. The system comprises a data-driven large language model (LLM) and a rule-based symbolic engine (Deductive Database Arithmetic Reasoning). It was developed by DeepMind, a subsidiary of Google. The program solved 25 geometry problems out of 30 from the International Mathematical Olympiad (IMO) under competition time limits—a performance almost as good as the average human gold medallist. For comparison, the previous AI program, called Wu's method, managed to solve only 10 problems. DeepMind published a paper about AlphaGeometry in the peer-reviewed journal Nature on 17 January 2024. AlphaGeometry was featured in MIT Technology Review on the same day. Traditional geometry programs are symbolic engines that rely exclusively on human-coded rules to generate rigorous proofs, which makes them lack flexibility in unusual situations. AlphaGeometry combines such a symbolic engine with a specialized large language model trained on synthetic data of geometrical proofs. When the symbolic engine doesn't manage to find a formal and rigorous proof on its own, it solicits the large language model, which suggests a geometrical construct to move forward. However, it is unclear how applicable this method is to other domains of mathematics or reasoning, because symbolic engines rely on domain-specific rules and because of the need for synthetic data. == AlphaGeometry 2 == AlphaGeometry 2 is an improved version of AlphaGeometry, published on February 5, 2025. They added more features to the representation language to describe more geometry problems that involve movements of objects, and problems containing linear equations of angles, ratios, and distances. They targeted IMO geometry questions from 2000 to 2024. The expanded representation language allowed them to cover 88% of the questions. It uses Gemini finetuned on a synthetically generated dataset of problems and solutions in the representation language. The model is used for making auxiliary constructions like lines and points, to help the tree search. It is also used for autoformalization, i.e. converting a problem in English to a problem in the representation language.
Read more →
Likewise, Inc.

Likewise, Inc., is an American technology startup company which provides a social networking service for finding and saving content recommendations for movies, TV shows, books, and podcasts. A team of ex-Microsoft employees founded Likewise in October 2017 with financial investment from Microsoft co-founder Bill Gates. The company is led by CEO Ian Morris and as of 2020 had a team of about 35 employees. Its headquarters operates in Bellevue, Washington. As of July 2020, 1 million users had joined the platform. == History == === Ideation (October 2017) === In 2017, former Microsoft Communications Chief Larry Cohen came up with the idea for Likewise in Bill Gates’ private office, Gates Ventures. Cohen currently serves as Gates Ventures’ CEO and managing partner. Cohen collaborated with colleagues Michael Dix and Ian Morris to co-found what would become Likewise, with Morris as its CEO. Gates funded the company's early development. The company developed its platform in stealth mode before launching publicly in October 2018. === Release (October 2018) === Likewise officially released its platform in the US and Canada on October 3, 2018. === Growth (2020 COVID-19 pandemic) === Likewise experienced accelerated growth alongside the COVID-19 pandemic. From March 2020 to July 2020, the platform's monthly active users tripled in numbers. The company reached one million users in July 2020. == Applications == === Mobile === Likewise is available as a mobile app for the Android and iOS mobile operating systems. Users receive recommendations from the Likewise algorithm, people they follow, and the Likewise editorial team. === Likewise TV === In October 2019, the company launched its Apple TV app called Likewise TV. The television app organizes shows across streaming services under one watchlist. On July 20, 2020, Likewise TV expanded to Android TV and Amazon Fire TV users.
Read more →
Open Neural Network Exchange

The Open Neural Network Exchange (ONNX) [ˈɒnɪks] is an open-source artificial intelligence ecosystem of technology companies and research organizations that establish open standards for representing machine learning algorithms and software tools to enable a standard format for representing machine learning models. ONNX is available on GitHub. == History == ONNX was originally named Toffee and was developed by the PyTorch team at Facebook. In September 2017 it was renamed to ONNX and announced by Facebook and Microsoft. Later, IBM, Huawei, Intel, AMD, Arm and Qualcomm announced support for the initiative. In October 2017, Microsoft announced that it would add its Cognitive Toolkit and Project Brainwave platform to the initiative. In November 2019 ONNX was accepted as graduate project in Linux Foundation AI. In October 2020 Zetane Systems became a member of the ONNX ecosystem. == Intent == The initiative targets: === Framework interoperability === Enable developers to move machine learning models between different frameworks, which may be used at different stages of the development process, such as training, architecture design, or deployment on mobile devices. === Shared optimization === Provide a common representation that can be used by hardware vendors and other developers to apply optimizations to artificial neural network models across multiple machine learning frameworks. == Contents == ONNX provides definitions of an extensible computation graph model, built-in operators and standard data types, focused on inferencing (evaluation).. The container format is Protocol Buffers. Each computation dataflow graph is a list of nodes that form an acyclic graph. Nodes have inputs and outputs. Each node is a call to an operator. Metadata documents the graph. Built-in operators are to be available on each ONNX-supporting framework. ONNX models can be trained in a single framework, such as PyTorch or TensorFlow, and then exported to ONNX. This format allows models to be transferred from the training framework to other environments for testing or deployment. Once a model is in ONNX format, it can be executed in different runtime systems or on various hardware platforms, such as GPUs or specialized AI accelerators. Using a common format enables the same model representation to be used across multiple systems and frameworks.
Read more →
UMBEL

UMBEL (Upper Mapping and Binding Exchange Layer) is a logically organized knowledge graph of 34,000 concepts and entity types that can be used in information science for relating information from disparate sources to one another. It was retired at the end of 2019. UMBEL was first released in July 2008. Version 1.00 was released in February 2011. Its current release is version 1.50. The grounding of this information occurs by common reference to the permanent URIs for the UMBEL concepts; the connections within the UMBEL upper ontology enable concepts from sources at different levels of abstraction or specificity to be logically related. Since UMBEL is an open-source extract of the OpenCyc knowledge base, it can also take advantage of the reasoning capabilities within Cyc. UMBEL has two means to promote the semantic interoperability of information:. It is: An ontology of about 35,000 reference concepts, designed to provide common mapping points for relating different ontologies or schema to one another, and A vocabulary for aiding that ontology mapping, including expressions of likelihood relationships distinct from exact identity or equivalence. This vocabulary is also designed for interoperable domain ontologies. UMBEL is written in the Semantic Web languages of SKOS and OWL 2. It is a class structure used in Linked Data, along with OpenCyc, YAGO, and the DBpedia ontology. Besides data integration, UMBEL has been used to aid concept search, concept definitions, query ranking, ontology integration, and ontology consistency checking. It has also been used to build large ontologies and for online question answering systems. Including OpenCyc, UMBEL has about 65,000 formal mappings to DBpedia, PROTON, GeoNames, and schema.org, and provides linkages to more than 2 million Wikipedia pages (English version). All of its reference concepts and mappings are organized under a hierarchy of 31 different "super types", which are mostly disjoint from one another. Each of these "super types" has its own typology of entity classes to provide flexible tie-ins for external content. 90% of UMBEL is contained in these entity classes.
Read more →
Ari Holtzman

Ari Holtzman is a professor of Computer Science at the University of Chicago and an expert in the area of natural language processing and computational linguistics. Previously, Holtzman was a PhD student at the University of Washington where he was advised by Luke Zettlemoyer. In 2017, he was a member of the winning team for the inaugural Alexa Prize for developing a conversational AI system for the Amazon Alexa device. Holtzman has made multiple contributions in the area of text generation and language models such as the introduction of nucleus sampling in 2019, his work on AI safety and neural fake news detection, and the fine-tuning of quantized large language models.
Read more →
Cross-validation (statistics)

Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance. == Motivation == Assume a model with one or more unknown parameters, and a data set to which the model can be fit (the training data set). The fitting process optimizes the model parameters to make the model fit the training data as well as possible. If an independent sample of validation data is taken from the same population as the training data, it will generally turn out that the model does not fit the validation data as well as it fits the training data. The size of this difference is likely to be large especially when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to estimate the size of this effect. === Example: linear regression === In linear regression, there exist real response values y 1 , … , y n {\textstyle y_{1},\ldots ,y_{n}} , and n p-dimensional vector covariates x1, ..., xn. The components of the vector xi are denoted xi1, ..., xip. If least squares is used to fit a function in the form of a hyperplane ŷ = a + βTx to the data (xi, yi) 1 ≤ i ≤ n, then the fit can be assessed using the mean squared error (MSE). The MSE for given estimated parameter values a and β on the training set (xi, yi) 1 ≤ i ≤ n is defined as: MSE = 1 n ∑ i = 1 n ( y i − y ^ i ) 2 = 1 n ∑ i = 1 n ( y i − a − β T x i ) 2 = 1 n ∑ i = 1 n ( y i − a − β 1 x i 1 − ⋯ − β p x i p ) 2 {\displaystyle {\begin{aligned}{\text{MSE}}&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-{\boldsymbol {\beta }}^{T}\mathbf {x} _{i})^{2}\\&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})^{2}\end{aligned}}} If the model is correctly specified, it can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets). Thus, a fitted model and computed MSE on the training set will result in an optimistically biased assessment of how well the model will fit an independent data set. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate. Since in linear regression it is possible to directly compute the factor (n − p − 1)/(n + p + 1) by which the training MSE underestimates the validation MSE under the assumption that the model specification is valid, cross-validation can be used for checking whether the model has been overfitted, in which case the MSE in the validation set will substantially exceed its anticipated value. (Cross-validation in the context of linear regression is also useful in that it can be used to select an optimally regularized cost function.) === General case === In most other regression procedures (e.g. logistic regression), there is no simple formula to compute the expected out-of-sample fit. Cross-validation is, thus, a generally applicable way to predict the performance of a model on unavailable data using numerical computation in place of theoretical analysis. == Types == Two types of cross-validation can be distinguished: exhaustive and non-exhaustive cross-validation. === Exhaustive cross-validation === Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set. ==== Leave-p-out cross-validation ==== Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set. LpO cross-validation require training and validating the model C p n {\displaystyle C_{p}^{n}} times, where n is the number of observations in the original sample, and where C p n {\displaystyle C_{p}^{n}} is the binomial coefficient. For p > 1 and for even moderately large n, LpO CV can become computationally infeasible. For example, with n = 100 and p = 30, C 30 100 ≈ 3 × 10 25 . {\displaystyle C_{30}^{100}\approx 3\times 10^{25}.} A variant of LpO cross-validation with p=2 known as leave-pair-out cross-validation has been recommended as a nearly unbiased method for estimating the area under ROC curve of binary classifiers. ==== Leave-one-out cross-validation ==== Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only. LOO cross-validation requires less computation time than LpO cross-validation because there are only C 1 n = n {\displaystyle C_{1}^{n}=n} passes rather than C p n {\displaystyle C_{p}^{n}} . However, n {\displaystyle n} passes may still require quite a large computation time, in which case other approaches such as k-fold cross validation may be more appropriate. Pseudo-code algorithm: Input: x, {vector of length N with x-values of incoming points} y, {vector of length N with y-values of the expected result} interpolate( x_in, y_in, x_out ), { returns the estimation for point x_out after the model is trained with x_in-y_in pairs} Output: err, {estimate for the prediction error} Steps: err ← 0 for i ← 1, ..., N do // define the cross-validation subsets x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N]) y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N]) x_out ← x[i] y_out ← interpolate(x_in, y_in, x_out) err ← err + (y[i] − y_out)^2 end for err ← err/N === Non-exhaustive cross-validation === Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. These methods are approximations of leave-p-out cross-validation. ==== k-fold cross-validation ==== In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, often referred to as "folds". Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation. The advantage of this method over repeated random sub-sampling (see below) is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. For example, setting k = 2 results in 2-fold cross-validation. In 2-fold cross-validation, the dataset is randomly shuffled into two sets d0 and d1, so that both sets are equal size (this is usually implemented by shuffling the data array and then splitting it in two). We then train on d0 and validate on d1, followed by training on d1 and validating on d0. When k = n (the number of observations), k-fold cross-validation is equivalent to leave-one-out cr
Read more →
DAYDREAMER

DAYDREAMER is a goal-based agent and cognitive architecture developed at the University of California, Los Angeles by Erik T. Mueller and Michael G. Dyer beginning in 1983. The system models the human stream of thought and how it is triggered and directed by emotions, simulating human daydreaming. Taking situational descriptions as input, DAYDREAMER produces English-language daydreams as output and encodes new daydreams, plans, and planning strategies for later reuse. The program comprises five components: a scenario generator based on relaxed planning, a dynamic episodic memory, a collection of personal goals and control goals, an emotion component, and domain knowledge of interpersonal relations and everyday occurrences. The source code was released under a free software license in 2015. == History == Erik Mueller began DAYDREAMER in 1983 while he was a doctoral student in the Artificial Intelligence Laboratory of the Computer Science Department at the University of California, Los Angeles, studying under Michael G. Dyer. Initial development of the project was supported by a grant from the W. M. Keck Foundation with matching funds from the UCLA School of Engineering and Applied Sciences. Additionally, Mueller was supported by an Atlantic Richfield Doctoral Fellowship and Dyer by an IBM Faculty Development Award. The first published descriptions of the program appeared in 1985 at the Ninth International Joint Conference on Artificial Intelligence in Los Angeles and at the Seventh Annual Conference of the Cognitive Science Society in Irvine. Work on the program continued, and a book, Daydreaming in Humans and Machines, was published by Ablex Publishing in 1990. The program was implemented on top of GATE, a knowledge-representation and inference substrate developed by Mueller and Uri Zernik at UCLA, and was originally written in T, a dialect of Scheme. In 2015, Mueller released the DAYDREAMER source code, version 3.5, a Common Lisp rewrite of the original T implementation, on GitHub under the GNU General Public License version 2. The release comprised approximately 12,000 lines of Common Lisp code, along with the GATE knowledge-representation substrate on which DAYDREAMER had originally been built. == Architecture == The program operates in two modes. In daydreaming mode it daydreams continuously until interrupted, while performance mode allows it to demonstrate behavior it has learned through daydreaming. === Emotion and control goals === Emotions and daydreaming form a feedback loop for DAYDREAMER. Emotions activate goals that produce daydreams, and the resulting daydreams modify existing emotions and trigger new ones, which prompt subsequent daydreaming. Recall of a goal success produces a positive emotion whereas recall of a goal failure produces a negative emotion. Emotions activate a set of goals, called control goals, which direct the course of a daydream. The program has four control goals. "Rationalization" generates reasons why an unsatisfactory outcome is in fact acceptable, in order to reduce a negative emotion and maintain self-esteem. "Revenge" is activated by anger when a failure is caused by another and reduces negative emotion through imagined retaliation. "Failure/success reversal" imagines alternative scenarios in which a failure was prevented or a success did not occur as a means of learning planning strategies for future situations. "Preparation" generates hypothetical future scenarios in order to rehearse plans and actions for events that have not yet occurred. === Scenario generator and relaxed planning === The scenario generator produces the sequence of events that make up a daydream. It operates under multiple, often conflicting personal goals rather than pursuing a single goal, applies relaxation rules that permit the generation of non-realistic scenarios, and it draws on episodic memory of past experiences both as subject matter and as a source of planning knowledge. The personal goals that guide the scenario generator include health, food, sex, friendship, love, possessions, self-esteem, social esteem, enjoyment, and achievement. These goals are organized into a goal tree that specifies their relative importance at any given time. Relaxation rules allow the program to set aside its ordinary constraints when generating a scenario. The four constraints that may be relaxed are the behavior of others, the daydreamer's own attributes, physical constraints, and social constraints. The degree of relaxation varies with the active control goal. For example a failure-reversal goal aimed at alternatives uses a low level of relaxation, whereas a revenge goal aimed at a retaliation uses a high level. === Episodic memory and analogy === DAYDREAMER's episodic memory stores its personal and vicarious experiences along with the daydreams it generates. The memory is described as dynamic because it is continually modified during daydreaming such that previously daydreamed episodes become available alongside real ones. As it daydreams, the program indexes daydreams, future plans or actions, and planning strategies into memory. Episodes are organized and retrieved using surface-level similarities, emotions, abstract themes, and Plot Units which are abstract configurations of positive and negative outcomes developed by Wendy Lehnert. A recalled episode is adapted to the current situation through analogy, which requires less effort than generating an equivalent scenario from scratch. == Sample output == In the sample experience from the source code, called LOVERS1, DAYDREAMER begins from an initial situation in which it has a job, is not romantically involved, and is at home. Starting in daydreaming mode, it activates a top-level goal to be in a romantic relationship because it is not currently in one, and a positive motivating emotion of interest becomes associated with that goal. The program then activates a goal to be entertained and pursues seeing a film as a way to achieve it. Facts asserted into memory are converted to English and produced as output, such as "I want to be going out with someone" and "I have to go see a movie". == Reception and influence == DAYDREAMER has been cited in research on computational models of creativity, emotion, and narrative. Linda Wills and Janet Kolodner cite the program as an example of work on opportunism in their study of serendipitous recognition in design. Joseph Bates, A. Bryan Loyall, and W. Scott Reilly of the Carnegie Mellon Oz Project cite DAYDREAMER among prior work in their description of an architecture combining action, emotion, and social behavior. Rafael Pérez y Pérez, Ricardo Sosa, and Christian Lemaitre cite Mueller's DAYDREAMER as one of the few computer models at the time to model daydreaming during the creative process. Jichen Zhu and D. Fox Harrell likewise cite the program in their work on imagining and agency in generative interactive narrative.
Read more →
Hyper basis function network

In machine learning, a Hyper basis function network, or HyperBF network, is a generalization of radial basis function (RBF) networks concept, where the Mahalanobis-like distance is used instead of the Euclidean distance measure. Hyper basis function networks were first introduced by Poggio and Girosi in the 1990 paper “Networks for Approximation and Learning”. == Network Architecture == The typical HyperBF network structure consists of a real input vector x ∈ R n {\displaystyle x\in \mathbb {R} ^{n}} , a hidden layer of activation functions and a linear output layer. The output of the network is a scalar function of the input vector, ϕ : R n → R {\displaystyle \phi :\mathbb {R} ^{n}\to \mathbb {R} } , is given by where N {\displaystyle N} is a number of neurons in the hidden layer, μ j {\displaystyle \mu _{j}} and a j {\displaystyle a_{j}} are the center and weight of neuron j {\displaystyle j} . The activation function ρ j ( | | x − μ j | | ) {\displaystyle \rho _{j}(||x-\mu _{j}||)} at the HyperBF network takes the following form where R j {\displaystyle R_{j}} is a positive definite d × d {\displaystyle d\times d} matrix. Depending on the application, the following types of matrices R j {\displaystyle R_{j}} are usually considered R j = 1 2 σ 2 I d × d {\displaystyle R_{j}={\frac {1}{2\sigma ^{2}}}\mathbb {I} _{d\times d}} , where σ > 0 {\displaystyle \sigma >0} . This case corresponds to the regular RBF network. R j = 1 2 σ j 2 I d × d {\displaystyle R_{j}={\frac {1}{2\sigma _{j}^{2}}}\mathbb {I} _{d\times d}} , where σ j > 0 {\displaystyle \sigma _{j}>0} . In this case, the basis functions are radially symmetric, but are scaled with different width. R j = d i a g ( 1 2 σ j 1 2 , . . . , 1 2 σ j z 2 ) I d × d {\displaystyle R_{j}=diag\left({\frac {1}{2\sigma _{j1}^{2}}},...,{\frac {1}{2\sigma _{jz}^{2}}}\right)\mathbb {I} _{d\times d}} , where σ j i > 0 {\displaystyle \sigma _{ji}>0} . Every neuron has an elliptic shape with a varying size. Positive definite matrix, but not diagonal. == Training == Training HyperBF networks involves estimation of weights a j {\displaystyle a_{j}} , shape and centers of neurons R j {\displaystyle R_{j}} and μ j {\displaystyle \mu _{j}} . Poggio and Girosi (1990) describe the training method with moving centers and adaptable neuron shapes. The outline of the method is provided below. Consider the quadratic loss of the network H [ ϕ ∗ ] = ∑ i = 1 N ( y i − ϕ ∗ ( x i ) ) 2 {\displaystyle H[\phi ^{}]=\sum _{i=1}^{N}(y_{i}-\phi ^{}(x_{i}))^{2}} . The following conditions must be satisfied at the optimum: where R j = W T W {\displaystyle R_{j}=W^{T}W} . Then in the gradient descent method the values of a j , μ j , W {\displaystyle a_{j},\mu _{j},W} that minimize H [ ϕ ∗ ] {\displaystyle H[\phi ^{}]} can be found as a stable fixed point of the following dynamic system: where ω {\displaystyle \omega } determines the rate of convergence. Overall, training HyperBF networks can be computationally challenging. Moreover, the high degree of freedom of HyperBF leads to overfitting and poor generalization. However, HyperBF networks have an important advantage that a small number of neurons is enough for learning complex functions.
Read more →