AI Generator Detector

AI Generator Detector — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Synonym (database)

    Synonym (database)

    In databases, a synonym is an alias or alternate name for a table, view, sequence, or other schema object. They are used mainly to make it intuitive for users to access database objects owned by other users. They also hide the underlying object's identity and make it harder for a malicious program or user to target the underlying object (security through obscurity). Because a synonym is just an alternate name for an object, it requires no storage other than its definition. When an application uses a synonym, the DBMS forwards the request to the synonym's underlying base object. By coding your programs to use synonyms instead of database object names, you insulate yourself from any changes in the name, ownership, or object locations, at the cost of adding another layer that also needs to be maintained. Users can also have different needs, for example some may wish to use a shorter name to refer to database objects they often query, which can be done with aliases without having to rename the underlying object and alter the code referring to it. Synonyms are very powerful from the point of view of allowing users access to objects that do not lie within their schema. All synonyms have to be created explicitly with the CREATE SYNONYM command and the underlying objects can be located in the same database or in other databases that are connected by database links There are two major uses of synonyms: Object invisibility: Synonyms can be created to keep the original object hidden from the user. Location invisibility: Synonyms can be created as aliases for tables and other objects that are not part of the local database. When a table or a procedure is created, it is created in a particular schema, and other users can access it only by using that schema's name as a prefix to the object's name. The way around for this is for the schema owner creates a synonym with the same name as the table name. == Public synonyms == Public synonyms are owned by special schema in the Oracle Database called PUBLIC. As mentioned earlier, public synonyms can be referenced by all users in the database. Public synonyms are usually created by the application owner for the tables and other objects such as procedures and packages so the users of the application can see the objects The following code shows how to create a public synonym for the employee table: Now any user can see the table by just typing the original table name. If you wish, you could provide a different table name for that table in the CREATE SYNONYM statement. Remember that the DBA must create public synonyms. Just because you can see a table through public (or private) synonym doesn’t mean that you can also perform SELECT, INSERT, UPDATE or DELETE operations on the table. To be able to perform those operations, a user needs specific privileges for the underlying object, either directly or through roles from the application owner. == Private synonyms == A private synonym is a synonym within a database schema that a developer typically uses to mask the true name of a table, view stored procedure, or other database object in an application schema. Private synonyms, unlike public synonyms, can be referenced only by the schema that owns the table or object. You may want to create private synonyms when you want to refer to the same table by different contexts. Private synonym overrides public synonym definitions. You create private synonyms the same way you create public synonyms, but you omit the PUBLIC keyword in the CREATE statement. The following example shows how to create a private synonym called addresses for the locations table. Note that once you create the private synonym, you can refer to the synonym exactly as you would the original table name. == Drop a synonym == Synonyms, both private and public, are dropped in the same manner by using the DROP SYNONYM command, but there is one important difference. If you are dropping a public synonym; you need to add the keyword PUBLIC after the keyword DROP. The ALL_SYNONYMS (or DBA_SYNONYMS) view provides information on all synonyms in your database.

    Read more →
  • Software agent

    Software agent

    In computer science, a software agent is a computer program that acts for a user or another program in a relationship of agency. The term agent is derived from the Latin agere (to do): an agreement to act on one's behalf. Such "action on behalf of" implies the authority to decide which, if any, action is appropriate. Some agents are colloquially known as bots, from robot. They may be embodied, as when execution is paired with a robot body, or as software such as a chatbot executing on a computer, such as a mobile device, e.g. Siri. Software agents may be autonomous or work together with other agents or people. Software agents interacting with people (e.g. chatbots, human-robot interaction environments) may possess human-like qualities such as natural language understanding and speech, personality or embody humanoid form (see Asimo). Related and derived concepts include intelligent agents (in particular exhibiting some aspects of artificial intelligence, such as reasoning), autonomous agents (capable of modifying the methods of achieving their objectives), distributed agents (being executed on physically distinct computers), multi-agent systems (distributed agents that work together to achieve an objective that could not be accomplished by a single agent acting alone), and mobile agents (agents that can relocate their execution onto different processors). == Concepts == The basic attributes of an autonomous software agent are that agents: are not strictly invoked for a task, but activate themselves, may reside in wait status on a host, perceiving context, may get to run status on a host upon starting conditions, do not require interaction of user, may invoke other tasks including communication. The concept of an agent provides a method of describing a complex software entity that is capable of acting with a certain degree of autonomy in order to accomplish tasks on behalf of its host. But unlike objects, which are defined in terms of methods and attributes, an agent is defined in terms of its behavior. Various authors have proposed different definitions of agents, these commonly include concepts such as: persistence: code is not executed on demand but runs continuously and decides for itself when it should perform some activity; autonomy: agents have capabilities of task selection, prioritization, goal-directed behavior, decision-making without human intervention; social ability: agents are able to engage other components through some sort of communication and coordination, they may collaborate on a task; reactivity: agents perceive the context in which they operate and react to it appropriately. === Distinguishing agents from programs === All agents are programs, but not all programs are agents. Contrasting the term with related concepts may help clarify its meaning. Franklin & Graesser (1997) discuss four key notions that distinguish agents from arbitrary programs: reaction to the environment, autonomy, goal-orientation and persistence. === Intuitive distinguishing agents from objects === Agents are more autonomous than objects. Agents have flexible behavior: reactive, proactive, social. Agents have at least one thread of control but may have more. === Distinguishing agents from expert systems === Expert systems are not coupled to their environment. Expert systems are not designed for reactive, proactive behavior. Expert systems do not consider social ability. === Distinguishing intelligent software agents from intelligent agents in AI === Intelligent agents (also known as rational agents) are not just computer programs: they may also be machines, human beings, communities of human beings (such as firms) or anything that is capable of goal-directed behavior. == Impact of software agents == Software agents may offer various benefits to their end users by automating complex or repetitive tasks. However, there are organizational and cultural impacts of this technology that need to be considered prior to implementing software agents. === Organizational impact === === Work contentment and job satisfaction impact === People like to perform easy tasks providing the sensation of success unless the repetition of the simple tasking is affecting the overall output. In general implementing software agents to perform administrative requirements provides a substantial increase in work contentment, as administering their own work does never please the worker. The effort freed up serves for a higher degree of engagement in the substantial tasks of individual work. Hence, software agents may provide the basics to implement self-controlled work, relieved from hierarchical controls and interference. Such conditions may be secured by application of software agents for required formal support. === Cultural impact === The cultural effects of the implementation of software agents include trust affliction, skills erosion, privacy attrition and social detachment. Some users may not feel entirely comfortable fully delegating important tasks to software applications. Those who start relying solely on intelligent agents may lose important skills, for example, relating to information literacy. In order to act on a user's behalf, a software agent needs to have a complete understanding of a user's profile, including his/her personal preferences. This, in turn, may lead to unpredictable privacy issues. When users start relying on their software agents more, especially for communication activities, they may lose contact with other human users and look at the world with the eyes of their agents. These consequences are what agent researchers and users must consider when dealing with intelligent agent technologies. === History === The concept of an agent can be traced back to Hewitt's Actor Model (Hewitt, 1977) - "A self-contained, interactive and concurrently-executing object, possessing internal state and communication capability." To be more academic, software agent systems are a direct evolution of Multi-Agent Systems (MAS). MAS evolved from Distributed Artificial Intelligence (DAI), Distributed Problem Solving (DPS) and Parallel AI (PAI), thus inheriting all characteristics (good and bad) from DAI and AI. John Sculley's 1987 "Knowledge Navigator" video portrayed an image of a relationship between end-users and agents. Being an ideal first, this field experienced a series of unsuccessful top-down implementations, instead of a piece-by-piece, bottom-up approach. The range of agent types is now (from 1990) broad: WWW, search engines, etc. == Examples of intelligent software agents == === Buyer agents (shopping bots) === Buyer agents travel around a network (e.g. the internet) retrieving information about goods and services. These agents, also known as 'shopping bots', work very efficiently for commodity products such as CDs, books, electronic components, and other one-size-fits-all products. Buyer agents are typically optimized to allow for digital payment services used in e-commerce and traditional businesses. === User agents (personal agents) === User agents, or personal agents, are intelligent agents that take action on your behalf. In this category belong those intelligent agents that already perform, or will shortly perform, the following tasks: Check your e-mail, sort it according to the user's order of preference, and alert you when important emails arrive. Play computer games as your opponent or patrol game areas for you. Assemble customized news reports for you. There are several versions of these, including CNN. Find information for you on the subject of your choice. Fill out forms on the Web automatically for you, storing your information for future reference Scan Web pages looking for and highlighting text that constitutes the "important" part of the information there Discuss topics with you ranging from your deepest fears to sports Facilitate with online job search duties by scanning known job boards and sending the resume to opportunities who meet the desired criteria Profile synchronization across heterogeneous social networks === Monitoring-and-surveillance (predictive) agents === Monitoring and surveillance agents are used to observe and report on equipment, usually computer systems. The agents may keep track of company inventory levels, observe competitors' prices and relay them back to the company, watch stock manipulation by insider trading and rumors, etc. For example, NASA's Jet Propulsion Laboratory has an agent that monitors inventory, planning, schedules equipment orders to keep costs down, and manages food storage facilities. These agents usually monitor complex computer networks that can keep track of the configuration of each computer connected to the network. A special case of monitoring-and-surveillance agents are organizations of agents used to automate decision-making process during tactical operations. The agents monitor the status of assets (ammunition, weapons available, platforms for transport, etc.) and receive goals from hi

    Read more →
  • Apprenticeship learning

    Apprenticeship learning

    In artificial intelligence, apprenticeship learning (or learning from demonstration or imitation learning) is the process of learning by observing an expert. It can be viewed as a form of supervised learning, where the training dataset consists of task executions by a demonstration teacher. == Mapping function approach == Mapping methods try to mimic the expert by forming a direct mapping either from states to actions, or from states to reward values. For example, in 2002 researchers used such an approach to teach an AIBO robot basic soccer skills. === Inverse reinforcement learning approach === Inverse reinforcement learning (IRL) is the process of deriving a reward function from observed behavior. While ordinary "reinforcement learning" involves using rewards and punishments to learn behavior, in IRL the direction is reversed, and a robot observes a person's behavior to figure out what goal that behavior seems to be trying to achieve. The IRL problem can be defined as: Given 1) measurements of an agent's behaviour over time, in a variety of circumstances; 2) measurements of the sensory inputs to that agent; 3) a model of the physical environment (including the agent's body): Determine the reward function that the agent is optimizing. IRL researcher Stuart J. Russell proposes that IRL might be used to observe humans and attempt to codify their complex "ethical values", in an effort to create "ethical robots" that might someday know "not to cook your cat" without needing to be explicitly told. The scenario can be modeled as a "cooperative inverse reinforcement learning game", where a "person" player and a "robot" player cooperate to secure the person's implicit goals, despite these goals not being explicitly known by either the person nor the robot. In 2017, OpenAI and DeepMind applied deep learning to the cooperative inverse reinforcement learning in simple domains such as Atari games and straightforward robot tasks such as backflips. The human role was limited to answering queries from the robot as to which of two different actions were preferred. The researchers found evidence that the techniques may be economically scalable to modern systems. Apprenticeship via inverse reinforcement learning (AIRP) was developed by in 2004 Pieter Abbeel, Professor in Berkeley's EECS department, and Andrew Ng, Associate Professor in Stanford University's Computer Science Department. AIRP deals with "Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform". AIRP has been used to model reward functions of highly dynamic scenarios where there is no obvious reward function intuitively. Take the task of driving for example, there are many different objectives working simultaneously - such as maintaining safe following distance, a good speed, not changing lanes too often, etc. This task, may seem easy at first glance, but a trivial reward function may not converge to the policy wanted. One domain where AIRP has been used extensively is helicopter control. While simple trajectories can be intuitively derived, complicated tasks like aerobatics for shows has been successful. These include aerobatic maneuvers like - in-place flips, in-place rolls, loops, hurricanes and even auto-rotation landings. This work was developed by Pieter Abbeel, Adam Coates, and Andrew Ng - "Autonomous Helicopter Aerobatics through Apprenticeship Learning" === System model approach === System models try to mimic the expert by modeling world dynamics. == Plan approach == The system learns rules to associate preconditions and postconditions with each action. In one 1994 demonstration, a humanoid learns a generalized plan from only two demonstrations of a repetitive ball collection task. == Example == Learning from demonstration is often explained from a perspective that the working Robot-control-system is available and the human-demonstrator is using it. And indeed, if the software works, the Human operator takes the robot-arm, makes a move with it, and the robot will reproduce the action later. For example, he teaches the robot-arm how to put a cup under a coffeemaker and press the start-button. In the replay phase, the robot is imitating this behavior 1:1. But that is not how the system works internally; it is only what the audience can observe. In reality, Learning from demonstration is much more complex. One of the first works on learning by robot apprentices (anthropomorphic robots learning by imitation) was Adrian Stoica's PhD thesis in 1995. In 1997, robotics expert Stefan Schaal was working on the Sarcos robot-arm. The goal was simple: solve the pendulum swingup task. The robot itself can execute a movement, and as a result, the pendulum is moving. The problem is, that it is unclear what actions will result into which movement. It is an Optimal control-problem which can be described with mathematical formulas but is hard to solve. The idea from Schaal was, not to use a Brute-force solver but record the movements of a human-demonstration. The angle of the pendulum is logged over three seconds at the y-axis. This results into a diagram which produces a pattern. In computer animation, the principle is called spline animation. That means, on the x-axis the time is given, for example 0.5 seconds, 1.0 seconds, 1.5 seconds, while on the y-axis is the variable given. In most cases it's the position of an object. In the inverted pendulum it is the angle. The overall task consists of two parts: recording the angle over time and reproducing the recorded motion. The reproducing step is surprisingly simple. As an input we know, in which time step which angle the pendulum must have. Bringing the system to a state is called “Tracking control” or PID control. That means, we have a trajectory over time, and must find control actions to map the system to this trajectory. Other authors call the principle “steering behavior”, because the aim is to bring a robot to a given line.

    Read more →
  • Something Big Is Happening

    Something Big Is Happening

    "Something Big Is Happening" is an essay by Matt Shumer, an AI entrepreneur, about the impact of artificial intelligence, published in February 2026, that has since been reportedly viewed more than 80 million times and widely discussed. Shumer noted that the technology has crossed an important threshold, where AI has become capable of creating self-improving systems. Referring to one the most recent AI models, he wrote: "It was making intelligent decisions. It had something that felt, for the first time, like judgment. Like taste." Speaking to CNBC's Power Lunch, Shumer said that his "core message" is "people in the workforce should start to use and experiment with AI tools so they can understand what’s coming". Even as the essay was widely shared and discussed, the essay also elicited criticism. Paulo Carvao, in an essay published by the Forbes Magazine stated that some of his advice is sound, but added: "It reads at times like a sales pitch. He urges readers to subscribe to the most advanced AI tools. He implies that those with access to premium models will outpace those without. He frames paid AI subscriptions as a form of insurance against obsolescence." Writing in The Guardian, Dan Milmo and Aisha Down mentioned Shumer as having a history of AI hype and stated, "He previously excited the internet by announcing the release of the world's "top open-source model", which it was not". Many workers in the technology sector criticized the article in blog posts shared on Hacker News; Edward Zitron commented that "while coding LLMs can test products, or scan/fix some bugs, this suggests they A) do this autonomously without human input, B) they do this correctly every time (or ever!)." In an article alluding to Shumer's original post, Ari Colaprete wrote "the LLM is fundamentally a writing machine, it does everything via text, and if you make it produce writing that exists purely to serve some sort of mechanical function, and you train it to succeed in that task, then it will tend to do so, even with vast intricacy."

    Read more →
  • Meta-learning (computer science)

    Meta-learning (computer science)

    Meta-learning is a subfield of machine learning where automatic learning algorithms are applied to metadata about machine learning experiments. As of 2017, the term had not found a standard interpretation, however the main goal is to use such metadata to understand how automatic learning can become flexible in solving learning problems, hence to improve the performance of existing learning algorithms or to learn (induce) the learning algorithm itself, hence the alternative term learning to learn. Flexibility is important because each learning algorithm is based on a set of assumptions about the data, its inductive bias. This means that it will only learn well if the bias matches the learning problem. A learning algorithm may perform very well in one domain, but not on the next. This poses strong restrictions on the use of machine learning or data mining techniques, since the relationship between the learning problem (often some kind of database) and the effectiveness of different learning algorithms is not yet understood. By using different kinds of metadata, like properties of the learning problem, algorithm properties (like performance measures), or patterns previously derived from the data, it is possible to learn, select, alter or combine different learning algorithms to effectively solve a given learning problem. Critiques of meta-learning approaches bear a strong resemblance to the critique of metaheuristic, a possibly related problem. A good analogy to meta-learning, and the inspiration for Jürgen Schmidhuber's early work (1987) and Yoshua Bengio et al.'s work (1991), considers that genetic evolution learns the learning procedure encoded in genes and executed in each individual's brain. In an open-ended hierarchical meta-learning system using genetic programming, better evolutionary methods can be learned by meta evolution, which itself can be improved by meta meta evolution, etc. == Definition == A proposed definition for a meta-learning system combines three requirements: The system must include a learning subsystem. Experience is gained by exploiting meta knowledge extracted in a previous learning episode on a single dataset, or from different domains. Learning bias must be chosen dynamically. Bias refers to the assumptions that influence the choice of explanatory hypotheses and not the notion of bias represented in the bias-variance dilemma. Meta-learning is concerned with two aspects of learning bias. Declarative bias specifies the representation of the space of hypotheses, and affects the size of the search space (e.g., represent hypotheses using linear functions only). Procedural bias imposes constraints on the ordering of the inductive hypotheses (e.g., preferring smaller hypotheses). == Common approaches == There are three common approaches: using (cyclic) networks with external or internal memory (model-based) learning effective distance metrics (metrics-based) explicitly optimizing model parameters for fast learning (optimization-based). === Model-Based === Model-based meta-learning models updates its parameters rapidly with a few training steps, which can be achieved by its internal architecture or controlled by another meta-learner model. ==== Memory-Augmented Neural Networks ==== A Memory-Augmented Neural Network, or MANN for short, is claimed to be able to encode new information quickly and thus to adapt to new tasks after only a few examples. ==== Meta Networks ==== Meta Networks (MetaNet) learns a meta-level knowledge across tasks and shifts its inductive biases via fast parameterization for rapid generalization. === Metric-Based === The core idea in metric-based meta-learning is similar to nearest neighbors algorithms, which weight is generated by a kernel function. It aims to learn a metric or distance function over objects. The notion of a good metric is problem-dependent. It should represent the relationship between inputs in the task space and facilitate problem solving. ==== Convolutional Siamese Neural Network ==== Siamese neural network is composed of two twin networks whose output is jointly trained. There is a function above to learn the relationship between input data sample pairs. The two networks are the same, sharing the same weight and network parameters. ==== Matching Networks ==== Matching Networks learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. ==== Relation Network ==== The Relation Network (RN), is trained end-to-end from scratch. During meta-learning, it learns to learn a deep distance metric to compare a small number of images within episodes, each of which is designed to simulate the few-shot setting. ==== Prototypical Networks ==== Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve satisfied results. === Optimization-Based === What optimization-based meta-learning algorithms intend for is to adjust the optimization algorithm so that the model can be good at learning with a few examples. ==== LSTM Meta-Learner ==== LSTM-based meta-learner is to learn the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime. The parametrization allows it to learn appropriate parameter updates specifically for the scenario where a set amount of updates will be made, while also learning a general initialization of the learner (classifier) network that allows for quick convergence of training. ==== Temporal Discreteness ==== Model-Agnostic Meta-Learning (MAML) is a fairly general optimization algorithm, compatible with any model that learns through gradient descent. ==== Reptile ==== Reptile is a remarkably simple meta-learning optimization algorithm, given that both of its components rely on meta-optimization through gradient descent and both are model-agnostic. == Examples == Some approaches which have been viewed as instances of meta-learning: Recurrent neural networks (RNNs) are universal computers. In 1993, Jürgen Schmidhuber showed how "self-referential" RNNs can in principle learn by backpropagation to run their own weight change algorithm, which may be quite different from backpropagation. In 2001, Sepp Hochreiter & A.S. Younger & P.R. Conwell built a successful supervised meta-learner based on Long short-term memory RNNs. It learned through backpropagation a learning algorithm for quadratic functions that is much faster than backpropagation. Researchers at Deepmind (Marcin Andrychowicz et al.) extended this approach to optimization in 2017. In the 1990s, Meta Reinforcement Learning or Meta RL was achieved in Schmidhuber's research group through self-modifying policies written in a universal programming language that contains special instructions for changing the policy itself. There is a single lifelong trial. The goal of the RL agent is to maximize reward. It learns to accelerate reward intake by continually improving its own learning algorithm which is part of the "self-referential" policy. An extreme type of Meta Reinforcement Learning is embodied by the Gödel machine, a theoretical construct which can inspect and modify any part of its own software which also contains a general theorem prover. It can achieve recursive self-improvement in a provably optimal way. Model-Agnostic Meta-Learning (MAML) was introduced in 2017 by Chelsea Finn et al. Given a sequence of tasks, the parameters of a given model are trained such that few iterations of gradient descent with few training data from a new task will lead to good generalization performance on that task. MAML "trains the model to be easy to fine-tune." MAML was successfully applied to few-shot image classification benchmarks and to policy-gradient-based reinforcement learning. Variational Bayes-Adaptive Deep RL (VariBAD) was introduced in 2019. While MAML is optimization-based, VariBAD is a model-based method for meta reinforcement learning, and leverages a variational autoencoder to capture the task information in an internal memory, thus conditioning its decision making on the task. When addressing a set of tasks, most meta learning approaches optimize the average score across all tasks. Hence, certain tasks may be sacrificed in favor of the average score, which is often unacceptable in real-world applications. By contrast, Robust Meta Reinforcement Learning (RoML) focuses on improving low-score tasks, increasing robustness to the selection of task. RoML works as a meta-algorithm, as it can be applied on top of other meta learning algorithms (such as MAML and VariBAD) to increase their robustness. It is applicable to both supervised meta learning and meta reinforcement learning. Discovering meta-knowledge works by inducing knowledge

    Read more →
  • Artificial brain

    Artificial brain

    An artificial brain (or artificial mind) is software and hardware with cognitive abilities similar to those of the animal or human brain. Research investigating "artificial brains" and brain emulation plays three important roles in science: An ongoing attempt by neuroscientists to understand how the human brain works, known as cognitive neuroscience. A thought experiment in the philosophy of artificial intelligence, demonstrating that it is possible, at least in theory, to create a machine that has all the capabilities of a human being. A long-term project to create machines exhibiting behavior comparable to those of animals with complex central nervous system such as mammals and most particularly humans. The ultimate goal of creating a machine exhibiting human-like behavior or intelligence is sometimes called strong AI. An example of the first objective is the project reported by Aston University in Birmingham, England where researchers are using biological cells to create "neurospheres" (small clusters of neurons) in order to develop new treatments for diseases including Alzheimer's, motor neurone and Parkinson's disease. The second objective is a reply to arguments such as John Searle's Chinese room argument, Hubert Dreyfus's critique of AI or Roger Penrose's argument in The Emperor's New Mind. These critics argued that there are aspects of human consciousness or expertise that can not be simulated by machines. One reply to their arguments is that the biological processes inside the brain can be simulated to any degree of accuracy. This reply was made as early as 1950, by Alan Turing in his classic paper "Computing Machinery and Intelligence". The third objective is generally called artificial general intelligence by researchers. However, Ray Kurzweil prefers the term "strong AI". In his book The Singularity is Near, he focuses on whole brain emulation using conventional computing machines as an approach to implementing artificial brains, and claims (on grounds of computer power continuing an exponential growth trend) that this could be done by 2025. Henry Markram, director of the Blue Brain project (which is attempting brain emulation), made a similar claim (2020) at the Oxford TED conference in 2009. == Approaches to brain simulation == W. Ross Ashby's pioneering work in cybernetics provided an early mathematical framework for understanding adaptive brain-like systems. In his 1952 book Design for a Brain, Ashby proposed that the brain could be modeled as an ultrastable system that maintains equilibrium through continuous adaptation to environmental perturbations. His approach used differential equations and state-space models to describe how neural systems could exhibit purposeful behavior through feedback mechanisms. Ashby's homeostat, a physical machine built in 1948, demonstrated these principles through an electromechanical device with four interconnected units that automatically adjusted their parameters to maintain stability when disturbed. The homeostat represented one of the first attempts to build an artificial system exhibiting brain-like adaptive behavior, influencing subsequent work in adaptive systems, neural networks, and artificial intelligence. Although direct human brain emulation using artificial neural networks on a high-performance computing engine is a commonly discussed approach, there are other approaches. An alternative artificial brain implementation could be based on Holographic Neural Technology (HNeT) non linear phase coherence/decoherence principles. The analogy has been made to quantum processes through the core synaptic algorithm which has strong similarities to the quantum mechanical wave equation. EvBrain is a form of evolutionary software that can evolve "brainlike" neural networks, such as the network immediately behind the retina. In November 2008, IBM received a US$4.9 million grant from the Pentagon for research into creating intelligent computers. The Blue Brain project is being conducted with the assistance of IBM in Lausanne. The project is based on the premise that it is possible to artificially link the neurons "in the computer" by placing thirty million synapses in their proper three-dimensional position. Some proponents of strong AI speculated in 2009 that computers in connection with Blue Brain and Soul Catcher may exceed human intellectual capacity by around 2015, and that it is likely that we will be able to download the human brain at some time around 2050. While Blue Brain is able to represent complex neural connections on the large scale, the project does not achieve the link between brain activity and behaviors executed by the brain. In 2012, project Spaun (Semantic Pointer Architecture Unified Network) attempted to model multiple parts of the human brain through large-scale representations of neural connections that generate complex behaviors in addition to mapping. Spaun's design recreates elements of human brain anatomy. The model, consisting of approximately 2.5 million neurons, includes features of the visual and motor cortices, GABAergic and dopaminergic connections, the ventral tegmental area (VTA), substantia nigra, and others. The design allows for several functions in response to eight tasks, using visual inputs of typed or handwritten characters and outputs carried out by a mechanical arm. Spaun's functions include copying a drawing, recognizing images, and counting. There are good reasons to believe that, regardless of implementation strategy, the predictions of realising artificial brains in the near future are optimistic. In particular brains (including the human brain) and cognition are not currently well understood, and the scale of computation required is unknown. Another near term limitation is that all current approaches for brain simulation require orders of magnitude larger power consumption compared with a human brain. The human brain consumes about 20 W of power, whereas current supercomputers may use as much as 1 MW—i.e., an order of 100,000 more. == Artificial brain thought experiment == Some critics of brain simulation believe that it is simpler to create general intelligent action directly without imitating nature. Some commentators have used the analogy that early attempts to construct flying machines modeled them after birds, but that modern aircraft do not look like birds.

    Read more →
  • Inductive programming

    Inductive programming

    Inductive programming (IP) is a special area of automatic programming, covering research from artificial intelligence and programming, which addresses learning of typically declarative (logic or functional) and often recursive programs from incomplete specifications, such as input/output examples or constraints. Depending on the programming language used, there are several kinds of inductive programming. Inductive functional programming, which uses functional programming languages such as Lisp or Haskell, and most especially inductive logic programming, which uses logic programming languages such as Prolog and other logical representations such as description logics, have been more prominent, but other (programming) language paradigms have also been used, such as constraint programming or probabilistic programming. == Definition == Inductive programming incorporates all approaches which are concerned with learning programs or algorithms from incomplete (formal) specifications. Possible inputs in an IP system are a set of training inputs and corresponding outputs or an output evaluation function, describing the desired behavior of the intended program, traces or action sequences which describe the process of calculating specific outputs, constraints for the program to be induced concerning its time efficiency or its complexity, various kinds of background knowledge such as standard data types, predefined functions to be used, program schemes or templates describing the data flow of the intended program, heuristics for guiding the search for a solution or other biases. Output of an IP system is a program in some arbitrary programming language containing conditionals and loop or recursive control structures, or any other kind of Turing-complete representation language. In many applications the output program must be correct with respect to the examples and partial specification, and this leads to the consideration of inductive programming as a special area inside automatic programming or program synthesis, usually opposed to 'deductive' program synthesis, where the specification is usually complete. In other cases, inductive programming is seen as a more general area where any declarative programming or representation language can be used and we may even have some degree of error in the examples, as in general machine learning, the more specific area of structure mining or the area of symbolic artificial intelligence. A distinctive feature is the number of examples or partial specification needed. Typically, inductive programming techniques can learn from just a few examples. The diversity of inductive programming usually comes from the applications and the languages that are used: apart from logic programming and functional programming, other programming paradigms and representation languages have been used or suggested in inductive programming, such as functional logic programming, constraint programming, probabilistic programming, abductive logic programming, modal logic, action languages, agent languages and many types of imperative languages. == History == The early works of Plotkin, and his "relative least general generalization (rlgg)", had an enormous impact in inductive logic programming. There were some encouraging results on learning recursive Prolog programs such as quicksort from examples together with suitable background knowledge, for example with GOLEM. However, after initial success, the community got disappointed by limited progress about the induction of recursive programs with ILP less and less focusing on recursive programs and leaning more and more towards a machine learning setting with applications in relational data mining and knowledge discovery. In parallel to work in ILP, Koza proposed genetic programming in the early 1990s as a generate-and-test based approach to learning programs. The idea of genetic programming was further developed into the inductive programming system ADATE and the systematic-search-based system MagicHaskeller. Here again, functional programs are learned from sets of positive examples together with an output evaluation (fitness) function which specifies the desired input/output behavior of the program to be learned. The early work in grammar induction (also known as grammatical inference) is related to inductive programming, as rewriting systems or logic programs can be used to represent production rules. In fact, early works in inductive inference considered grammar induction and Lisp program inference as basically the same problem. The results in terms of learnability were related to classical concepts, such as identification-in-the-limit, as introduced in the seminal work of Gold. More recently, the language learning problem was addressed by the inductive programming community. In the recent years, the classical approaches have been resumed and advanced with great success. Therefore, the synthesis problem has been reformulated on the background of constructor-based term rewriting systems taking into account modern techniques of functional programming, as well as moderate use of search-based strategies and usage of background knowledge as well as automatic invention of subprograms. Many new and successful applications have recently appeared beyond program synthesis, most especially in the area of data manipulation, programming by example and cognitive modelling (see below). Other ideas have also been explored with the common characteristic of using declarative languages for the representation of hypotheses. For instance, the use of higher-order features, schemes or structured distances have been advocated for a better handling of recursive data types and structures; abstraction has also been explored as a more powerful approach to cumulative learning and function invention. One powerful paradigm that has been recently used for the representation of hypotheses in inductive programming (generally in the form of generative models) is probabilistic programming (and related paradigms, such as stochastic logic programs and Bayesian logic programming). == Application areas == The first workshop on Approaches and Applications of Inductive Programming (AAIP) Archived 2016-03-03 at the Wayback Machine held in conjunction with ICML 2005 identified all applications where "learning of programs or recursive rules are called for, [...] first in the domain of software engineering where structural learning, software assistants and software agents can help to relieve programmers from routine tasks, give programming support for end users, or support of novice programmers and programming tutor systems. Further areas of application are language learning, learning recursive control rules for AI-planning, learning recursive concepts in web-mining or for data-format transformations". Since then, these and many other areas have shown to be successful application niches for inductive programming, such as end-user programming, the related areas of programming by example and programming by demonstration, and intelligent tutoring systems. Other areas where inductive inference has been recently applied are knowledge acquisition, artificial general intelligence, reinforcement learning and theory evaluation, and cognitive science in general. There may also be prospective applications in intelligent agents, games, robotics, personalisation, ambient intelligence and human interfaces.

    Read more →
  • TensorFlow Hub

    TensorFlow Hub

    TensorFlow Hub (also styled TF Hub) is an open-source machine learning library and online repository that provides TensorFlow model components, called modules. It is maintained by Google as part of the TensorFlow ecosystem and allows developers to discover, publish, and reuse pretrained models for tasks such as computer vision, natural language processing, and transfer learning. == Overview == TensorFlow Hub provides a central platform where developers and researchers can access pre-trained models and integrate them directly into TensorFlow workflows. Each module encapsulates a computation graph and its trained weights, with standardized input and output signatures. Modules can be loaded using the hub.load() function or through Keras integration via hub.KerasLayer, enabling users to perform transfer learning or feature extraction. == History == TensorFlow Hub was announced by Google in March 2018, with the first public version released shortly after. Its introduction coincided with the growing adoption of transfer learning techniques and the need for standardized model packaging. Over time, the hub expanded to include models such as the BERT family, MobileNet, EfficientNet, and the Universal Sentence Encoder. In 2020, research on “Regret selection in TensorFlow Hub” explored the problem of identifying optimal models for downstream tasks given a large repository of alternatives. == Applications == TensorFlow Hub hosts a variety of models across machine learning domains: Natural language processing: BERT, ALBERT language model, and Universal Sentence Encoder. Computer vision: ResNet, Inception (deep learning), MobileNet, EfficientNet. Speech and audio: spectrogram feature extractors and automatic speech recognition models. Multilingual embeddings: cross-lingual and sentence-level representations for machine translation and semantic similarity. Modules are widely used in education, academic research, and industry for prototyping and production deployment.

    Read more →
  • Machine-learned interatomic potential

    Machine-learned interatomic potential

    Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.

    Read more →
  • Behavior informatics

    Behavior informatics

    Behavior informatics (BI) is the informatics of behaviors so as to obtain behavior intelligence and behavior insights. BI is a research method combining science and technology, specifically in the area of engineering. The purpose of BI includes analysis of current behaviors as well as the inference of future possible behaviors. This occurs through pattern recognition. Different from applied behavior analysis from the psychological perspective, BI builds computational theories, systems and tools to qualitatively and quantitatively model, represent, analyze, and manage behaviors of individuals, groups and/or organizations. BI is built on classic study of behavioral science, including behavior modeling, applied behavior analysis, behavior analysis, behavioral economics, and organizational behavior. Typical BI tasks consist of individual and group behavior formation, representation, computational modeling, analysis, learning, simulation, and understanding of behavior impact, utility, non-occurring behaviors, etc. for behavior intervention and management. The Behavior Informatics approach to data utilizes cognitive as well as behavioral data. By combining the data, BI has the potential to effectively illustrate the big picture when it comes to behavioral decisions and patterns. One of the goals of BI is also to be able to study human behavior while eliminating issues like self-report bias. This creates more reliable and valid information for research studies. == Behavior == From an Informatics perspective, a behavior consists of three key elements: actors (behavioral subjects and objects), operations (actions, activities) and interactions (relationships), and their properties. A behavior can be represented as a behavior vector, all behaviors of an actor or an actor group can be represented as behavior sequences and multi-dimensional behavior matrix. The following table explains some of the elements of behavior. Behavior Informatics takes into account behavior when analyzing business patterns and intelligence. The inclusion of behavior in these analyses provides prominent information on social and driving factors of patterns. == Applications == Behavior Informatics is being used in a variety of settings, including but not limited to health care management, telecommunications, marketing, and security. Behavior Informatics provides a manner in which to analyze and organize the many aspects that go into a person's health care needs and decisions. When it comes to business models, behavior informatics may be utilized for a similar role. Organizations implement behavior informatics to enhance business structure and regime, where it helps moderate ideal business decisions and situations.

    Read more →
  • Learnable function class

    Learnable function class

    In statistical learning theory, a learnable function class is a set of functions for which an algorithm can be devised to asymptotically minimize the expected risk, uniformly over all probability distributions. The concept of learnable classes are closely related to regularization in machine learning, and provides large sample justifications for certain learning algorithms. == Definition == === Background === Let Ω = X × Y = { ( x , y ) } {\displaystyle \Omega ={\mathcal {X}}\times {\mathcal {Y}}=\{(x,y)\}} be the sample space, where y {\displaystyle y} are the labels and x {\displaystyle x} are the covariates (predictors). F = { f : X ↦ Y } {\displaystyle {\mathcal {F}}=\{f:{\mathcal {X}}\mapsto {\mathcal {Y}}\}} is a collection of mappings (functions) under consideration to link x {\displaystyle x} to y {\displaystyle y} . L : Y × Y ↦ R {\displaystyle L:{\mathcal {Y}}\times {\mathcal {Y}}\mapsto \mathbb {R} } is a pre-given loss function (usually non-negative). Given a probability distribution P ( x , y ) {\displaystyle P(x,y)} on Ω {\displaystyle \Omega } , define the expected risk I P ( f ) {\displaystyle I_{P}(f)} to be: I P ( f ) = ∫ L ( f ( x ) , y ) d P ( x , y ) {\displaystyle I_{P}(f)=\int L(f(x),y)dP(x,y)} The general goal in statistical learning is to find the function in F {\displaystyle {\mathcal {F}}} that minimizes the expected risk. That is, to find solutions to the following problem: f ^ = arg ⁡ min f ∈ F I P ( f ) {\displaystyle {\hat {f}}=\arg \min _{f\in {\mathcal {F}}}I_{P}(f)} But in practice the distribution P {\displaystyle P} is unknown, and any learning task can only be based on finite samples. Thus we seek instead to find an algorithm that asymptotically minimizes the empirical risk, i.e., to find a sequence of functions { f ^ n } n = 1 ∞ {\displaystyle \{{\hat {f}}_{n}\}_{n=1}^{\infty }} that satisfies lim n → ∞ P ( I P ( f ^ n ) − inf f ∈ F I P ( f ) > ϵ ) = 0 {\displaystyle \lim _{n\rightarrow \infty }\mathbb {P} (I_{P}({\hat {f}}_{n})-\inf _{f\in {\mathcal {F}}}I_{P}(f)>\epsilon )=0} One usual algorithm to find such a sequence is through empirical risk minimization. === Learnable function class === We can make the condition given in the above equation stronger by requiring that the convergence is uniform for all probability distributions. That is: The intuition behind the more strict requirement is as such: the rate at which sequence { f ^ n } {\displaystyle \{{\hat {f}}_{n}\}} converges to the minimizer of the expected risk can be very different for different P ( x , y ) {\displaystyle P(x,y)} . Because in real world the true distribution P {\displaystyle P} is always unknown, we would want to select a sequence that performs well under all cases. However, by the no free lunch theorem, such a sequence that satisfies (1) does not exist if F {\displaystyle {\mathcal {F}}} is too complex. This means we need to be careful and not allow too "many" functions in F {\displaystyle {\mathcal {F}}} if we want (1) to be a meaningful requirement. Specifically, function classes that ensure the existence of a sequence { f ^ n } {\displaystyle \{{\hat {f}}_{n}\}} that satisfies (1) are known as learnable classes. It is worth noting that at least for supervised classification and regression problems, if a function class is learnable, then the empirical risk minimization automatically satisfies (1). Thus in these settings not only do we know that the problem posed by (1) is solvable, we also immediately have an algorithm that gives the solution. == Interpretations == If the true relationship between y {\displaystyle y} and x {\displaystyle x} is y ∼ f ∗ ( x ) {\displaystyle y\sim f^{}(x)} , then by selecting the appropriate loss function, f ∗ {\displaystyle f^{}} can always be expressed as the minimizer of the expected loss across all possible functions. That is, f ∗ = arg ⁡ min f ∈ F ∗ I P ( f ) {\displaystyle f^{}=\arg \min _{f\in {\mathcal {F}}^{}}I_{P}(f)} Here we let F ∗ {\displaystyle {\mathcal {F}}^{}} be the collection of all possible functions mapping X {\displaystyle {\mathcal {X}}} onto Y {\displaystyle {\mathcal {Y}}} . f ∗ {\displaystyle f^{}} can be interpreted as the actual data generating mechanism. However, the no free lunch theorem tells us that in practice, with finite samples we cannot hope to search for the expected risk minimizer over F ∗ {\displaystyle {\mathcal {F}}^{}} . Thus we often consider a subset of F ∗ {\displaystyle {\mathcal {F}}^{}} , F {\displaystyle {\mathcal {F}}} , to carry out searches on. By doing so, we risk that f ∗ {\displaystyle f^{}} might not be an element of F {\displaystyle {\mathcal {F}}} . This tradeoff can be mathematically expressed as In the above decomposition, part ( b ) {\displaystyle (b)} does not depend on the data and is non-stochastic. It describes how far away our assumptions ( F {\displaystyle {\mathcal {F}}} ) are from the truth ( F ∗ {\displaystyle {\mathcal {F}}^{}} ). ( b ) {\displaystyle (b)} will be strictly greater than 0 if we make assumptions that are too strong ( F {\displaystyle {\mathcal {F}}} too small). On the other hand, failing to put enough restrictions on F {\displaystyle {\mathcal {F}}} will cause it to be not learnable, and part ( a ) {\displaystyle (a)} will not stochastically converge to 0. This is the well-known overfitting problem in statistics and machine learning literature. == Example: Tikhonov regularization == A good example where learnable classes are used is the so-called Tikhonov regularization in reproducing kernel Hilbert space (RKHS). Specifically, let F ∗ {\displaystyle {\mathcal {F^{}}}} be an RKHS, and | | ⋅ | | 2 {\displaystyle ||\cdot ||_{2}} be the norm on F ∗ {\displaystyle {\mathcal {F^{}}}} given by its inner product. It is shown in that F = { f : | | f | | 2 ≤ γ } {\displaystyle {\mathcal {F}}=\{f:||f||_{2}\leq \gamma \}} is a learnable class for any finite, positive γ {\displaystyle \gamma } . The empirical minimization algorithm to the dual form of this problem is arg ⁡ min f ∈ F ∗ { ∑ i = 1 n L ( f ( x i ) , y i ) + λ | | f | | 2 } {\displaystyle \arg \min _{f\in {\mathcal {F}}^{}}\left\{\sum _{i=1}^{n}L(f(x_{i}),y_{i})+\lambda ||f||_{2}\right\}} This was first introduced by Tikhonov to solve ill-posed problems. Many statistical learning algorithms can be expressed in such a form (for example, the well-known ridge regression). The tradeoff between ( a ) {\displaystyle (a)} and ( b ) {\displaystyle (b)} in (2) is geometrically more intuitive with Tikhonov regularization in RKHS. We can consider a sequence of { F γ } {\displaystyle \{{\mathcal {F}}_{\gamma }\}} , which are essentially balls in F ∗ {\displaystyle {\mathcal {F^{}}}} with centers at 0. As γ {\displaystyle \gamma } gets larger, F γ {\displaystyle {\mathcal {F}}_{\gamma }} gets closer to the entire space, and ( b ) {\displaystyle (b)} is likely to become smaller. However we will also suffer smaller convergence rates in ( a ) {\displaystyle (a)} . The way to choose an optimal γ {\displaystyle \gamma } in finite sample settings is usually through cross-validation. == Relationship to empirical process theory == Part ( a ) {\displaystyle (a)} in (2) is closely linked to empirical process theory in statistics, where the empirical risk { ∑ i = 1 n L ( y i , f ( x i ) ) , f ∈ F } {\displaystyle \{\sum _{i=1}^{n}L(y_{i},f(x_{i})),f\in {\mathcal {F}}\}} are known as empirical processes. In this field, the function class F {\displaystyle {\mathcal {F}}} that satisfies the stochastic convergence are known as uniform Glivenko–Cantelli classes. It has been shown that under certain regularity conditions, learnable classes and uniformly Glivenko-Cantelli classes are equivalent. Interplay between ( a ) {\displaystyle (a)} and ( b ) {\displaystyle (b)} in statistics literature is often known as the bias-variance tradeoff. However, note that in the authors gave an example of stochastic convex optimization for General Setting of Learning where learnability is not equivalent with uniform convergence.

    Read more →
  • Neural scaling law

    Neural scaling law

    In machine learning, a neural scaling law is an empirical scaling law that describes how neural network performance changes as key factors are scaled up or down. These factors typically include the number of parameters, training dataset size, and training cost. Some models also exhibit performance gains by scaling inference through increased test-time compute (TTC), extending neural scaling laws beyond training to the deployment phase. == Introduction == In general, a deep learning model can be characterized by four parameters: model size, training dataset size, training cost, and the post-training error rate (e.g., the test set error rate). Each of these variables can be defined as a real number, usually written as N , D , C , L {\displaystyle N,D,C,L} (respectively: parameter count, dataset size, computing cost, and loss). A neural scaling law is a theoretical or empirical statistical law between these parameters. There are also other parameters with other scaling laws. === Size of the model === In most cases, the model's size is simply the number of parameters. However, one complication arises with the use of sparse models, such as mixture-of-expert models. With sparse models, during inference, only a fraction of their parameters are used. In comparison, most other kinds of neural networks, such as transformer models, always use all their parameters during inference. === Size of the training dataset === The size of the training dataset is usually quantified by the number of data points within it. Larger training datasets are typically preferred, as they provide a richer and more diverse source of information from which the model can learn. This can lead to improved generalization performance when the model is applied to new, unseen data. However, increasing the size of the training dataset also increases the computational resources and time required for model training. With the "pretrain, then finetune" method used for most large language models, there are two kinds of training dataset: the pretraining dataset and the finetuning dataset. Their sizes have different effects on model performance. Generally, the finetuning dataset is less than 1% the size of pretraining dataset. In some cases, a small amount of high quality data suffices for finetuning, and more data does not necessarily improve performance. Many scaling laws, due to their inherent diminishing returns nature, value data based on a submodular set function which was shown in a paper on this topic. === Cost of training === Training cost is typically measured in terms of time (how long it takes to train the model) and computational resources (how much processing power and memory are required). It is important to note that the cost of training can be significantly reduced with efficient training algorithms, optimized software libraries, and parallel computing on specialized hardware such as GPUs or TPUs. The cost of training a neural network model is a function of several factors, including model size, training dataset size, the training algorithm complexity, and the computational resources available. In particular, doubling the training dataset size does not necessarily double the cost of training, because one may train the model for several times over the same dataset (each being an "epoch"). === Performance === The performance of a neural network model is evaluated based on its ability to accurately predict the output given some input data. Common metrics for evaluating model performance include: Negative log-likelihood per token (logarithm of perplexity) for language modeling; Accuracy, precision, recall, and F1 score for classification tasks; Mean squared error (MSE) or mean absolute error (MAE) for regression tasks; Elo rating in a competition against other models, such as gameplay or preference by a human judge. Performance can be improved by using more data, larger models, different training algorithms, regularizing the model to prevent overfitting, and early stopping using a validation set. When the performance is a number bounded within the range of [ 0 , 1 ] {\displaystyle [0,1]} , such as accuracy, precision, etc., it often scales as a sigmoid function of cost, as seen in the figures. == Examples == === (Hestness, Narang, et al, 2017) === The 2017 paper is a common reference point for neural scaling laws fitted by statistical analysis on experimental data. Previous works before the 2000s, as cited in the paper, were either theoretical or orders of magnitude smaller in scale. Whereas previous works generally found the scaling exponent to scale like L ∝ D − α {\displaystyle L\propto D^{-\alpha }} , with α ∈ { 0.5 , 1 , 2 } {\displaystyle \alpha \in \{0.5,1,2\}} , the paper found that α ∈ [ 0.07 , 0.35 ] {\displaystyle \alpha \in [0.07,0.35]} . Of the factors they varied, only task can change the exponent α {\displaystyle \alpha } . Changing the architecture optimizers, regularizers, and loss functions, would only change the proportionality factor, not the exponent. For example, for the same task, one architecture might have L = 1000 D − 0.3 {\displaystyle L=1000D^{-0.3}} while another might have L = 500 D − 0.3 {\displaystyle L=500D^{-0.3}} . They also found that for a given architecture, the number of parameters necessary to reach lowest levels of loss, given a fixed dataset size, grows like N ∝ D β {\displaystyle N\propto D^{\beta }} for another exponent β {\displaystyle \beta } . They studied machine translation with LSTM ( α ∼ 0.13 {\displaystyle \alpha \sim 0.13} ), generative language modelling with LSTM ( α ∈ [ 0.06 , 0.09 ] , β ≈ 0.7 {\displaystyle \alpha \in [0.06,0.09],\beta \approx 0.7} ), ImageNet classification with ResNet ( α ∈ [ 0.3 , 0.5 ] , β ≈ 0.6 {\displaystyle \alpha \in [0.3,0.5],\beta \approx 0.6} ), and speech recognition with two hybrid (LSTMs complemented by either CNNs or an attention decoder) architectures ( α ≈ 0.3 {\displaystyle \alpha \approx 0.3} ). === (Henighan, Kaplan, et al, 2020) === A 2020 analysis studied statistical relations between C , N , D , L {\displaystyle C,N,D,L} over a wide range of values and found similar scaling laws, over the range of N ∈ [ 10 3 , 10 9 ] {\displaystyle N\in [10^{3},10^{9}]} , C ∈ [ 10 12 , 10 21 ] {\displaystyle C\in [10^{12},10^{21}]} , and over multiple modalities (text, video, image, text to image, etc.). In particular, the scaling laws it found are (Table 1 of ): For each modality, they fixed one of the two C , N {\displaystyle C,N} , and varying the other one ( D {\displaystyle D} is varied along using D = C / 6 N {\displaystyle D=C/6N} ), the achievable test loss satisfies L = L 0 + ( x 0 x ) α {\displaystyle L=L_{0}+\left({\frac {x_{0}}{x}}\right)^{\alpha }} where x {\displaystyle x} is the varied variable, and L 0 , x 0 , α {\displaystyle L_{0},x_{0},\alpha } are parameters to be found by statistical fitting. The parameter α {\displaystyle \alpha } is the most important one. When N {\displaystyle N} is the varied variable, α {\displaystyle \alpha } ranges from 0.037 {\displaystyle 0.037} to 0.24 {\displaystyle 0.24} depending on the model modality. This corresponds to the α = 0.34 {\displaystyle \alpha =0.34} from the Chinchilla scaling paper. When C {\displaystyle C} is the varied variable, α {\displaystyle \alpha } ranges from 0.048 {\displaystyle 0.048} to 0.19 {\displaystyle 0.19} depending on the model modality. This corresponds to the β = 0.28 {\displaystyle \beta =0.28} from the Chinchilla scaling paper. Given fixed computing budget, optimal model parameter count is consistently around N o p t ( C ) = ( C 5 × 10 − 12 petaFLOP-day ) 0.7 = 9.0 × 10 − 7 C 0.7 {\displaystyle N_{opt}(C)=\left({\frac {C}{5\times 10^{-12}{\text{petaFLOP-day}}}}\right)^{0.7}=9.0\times 10^{-7}C^{0.7}} The parameter 9.0 × 10 − 7 {\displaystyle 9.0\times 10^{-7}} varies by a factor of up to 10 for different modalities. The exponent parameter 0.7 {\displaystyle 0.7} varies from 0.64 {\displaystyle 0.64} to 0.75 {\displaystyle 0.75} for different modalities. This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. It's "strongly suggested" (but not statistically checked) that D o p t ( C ) ∝ N o p t ( C ) 0.4 ∝ C 0.28 {\displaystyle D_{opt}(C)\propto N_{opt}(C)^{0.4}\propto C^{0.28}} . This exponent corresponds to the ≈ 0.5 {\displaystyle \approx 0.5} from the Chinchilla scaling paper. The scaling law of L = L 0 + ( C 0 / C ) 0.048 {\displaystyle L=L_{0}+(C_{0}/C)^{0.048}} was confirmed during the training of GPT-3 (Figure 3.1 ). === Chinchilla scaling (Hoffmann, et al, 2022) === One particular scaling law ("Chinchilla scaling") states that, for a large language model (LLM) autoregressively trained for one epoch, with a cosine learning rate schedule, we have: { C = C 0 N D L = A N α + B D β + L 0 {\displaystyle {\begin{cases}C=C_{0}ND\\L={\frac {A}{N^{\alpha }}}+{\frac {B}{D^{\beta }}}+L_{0}\end{cases}}} where the variables are C {\displaystyle C} is the cost o

    Read more →
  • SwissCovid

    SwissCovid

    SwissCovid is a COVID-19 contact tracing app used for digital contact tracing in Switzerland. Use of the app is voluntary and based on a decentralized approach using Bluetooth Low Energy and Decentralized Privacy-Preserving Proximity Tracing (dp3t). == Development == The app was developed in collaboration with the FOPH by Federal Office for Information Technology, Systems and Communications FOITT, École polytechnique fédérale de Lausanne (EPFL) and the Swiss Federal Institute of Technology in Zurich (ETH) as well as other experts. == Non-interoperability with applications in European countries == There is an agreement between EU countries to make applications compatible. However, there is no legal basis for the SwissCovid application to be part of this portal even though technically speaking it is ready, according to Sang-Ill Kim, head of the digital transformation department of the Federal Office of Public Health. == Criticism == === Not full open source and dependence on Google and Apple === In June 2020, researchers Serge Vaudenay and Martin Vuagnoux published a critical analysis of the application, noting that it relies heavily on Google and Apple's exposure notification system, which is integrated into their respective Android and iOS operating systems. Since Google and Apple have not released the full source code of this system, this would call into question the truly open source nature of the application. The researchers note that the dp3t collective, which includes the developers of the application, has asked Google and Apple to release their code. Moreover, they criticize the official description of the application and its functionalities, as well as the adequacy of the legal basis for its effective operation. === Cyber attacks === Professor Serge Vaudenay and Martin Vuagnoux identify also various security vulnerabilities in the application. The system would thus allow a third party to trace the movements of a phone using the application by means of Bluetooth sensors scattered along its path, for example in a building. Another possible attack would be to copy identifiers from the phones of people who may be ill (for example, in a hospital), and to reproduce those identifiers in order to receive notification of exposure to COVID-19 and illegitimately benefit from quarantine (thus entitling them to paid leave, a postponed examination, or other benefits). The system would also allow a third party to use a phone using the application by means of Bluetooth sensors scattered along the way. Paul-Olivier Dehaye of Personaldata.io and professor Joel Reardon of the University of Calgary published in June 2020 several examples of AEM (Associated Encrypted Metadata) replay and manipulation attacks via software development kits (SDKs) found in benign third-party mobile applications downloaded by the general public and having the phone's Bluetooth access permissions and in September 2020 a paper indicating that "Bluetooth-based proximity tracing apps are fundamentally insecure with respect to an attacker leveraging a malevolent app or SDK". === Costs === According to a publication by the federal administration, "the costs of developing the software for the mobile phone application, the GR back-end and the code management system as well as the costs for access management for the cantonal doctors' services are estimated at a one-off amount of 1.65 million francs. However, the Zurich-based company Ubique, responsible for the development of the application, was finally awarded the mandate to develop the application for an amount of 1.8 million francs. Through the Botnar Foundation based in Basel, École polytechnique fédérale de Lausanne received 3.5 million Swiss francs for the development of the application

    Read more →
  • Empowerment (artificial intelligence)

    Empowerment (artificial intelligence)

    Empowerment in the field of artificial intelligence formalises and quantifies (via information theory) the potential an agent perceives that it has to influence its environment. An agent which follows an empowerment maximising policy, acts to maximise future options (typically up to some limited horizon). Empowerment can be used as a (pseudo) utility function that depends only on information gathered from the local environment to guide action, rather than seeking an externally imposed goal, thus is a form of intrinsic motivation. The empowerment formalism depends on a probabilistic model commonly used in artificial intelligence. An autonomous agent operates in the world by taking in sensory information and acting to change its state, or that of the environment, in a cycle of perceiving and acting known as the perception-action loop. Agent state and actions are modelled by random variables ( S : s ∈ S , A : a ∈ A {\displaystyle S:s\in {\mathcal {S}},A:a\in {\mathcal {A}}} ) and time ( t {\displaystyle t} ). The choice of action depends on the current state, and the future state depends on the choice of action, thus the perception-action loop unrolled in time forms a causal bayesian network. == Definition == Empowerment ( E {\displaystyle {\mathfrak {E}}} ) is defined as the channel capacity ( C {\displaystyle C} ) of the actuation channel of the agent, and is formalised as the maximal possible information flow between the actions of the agent and the effect of those actions some time later. Empowerment can be thought of as the future potential of the agent to affect its environment, as measured by its sensors. E := C ( A t ⟶ S t + 1 ) ≡ max p ( a t ) I ( A t ; S t + 1 ) {\displaystyle {\mathfrak {E}}:=C(A_{t}\longrightarrow S_{t+1})\equiv \max _{p(a_{t})}I(A_{t};S_{t+1})} In a discrete time model, Empowerment can be computed for a given number of cycles into the future, which is referred to in the literature as 'n-step' empowerment. E ( A t n ⟶ S t + n ) = max p ( a t , . . . , a t + n − 1 ) I ( A t , . . . , A t + n − 1 ; S t + n ) {\displaystyle {\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n})=\max _{p(a_{t},...,a_{t+n-1})}I(A_{t},...,A_{t+n-1};S_{t+n})} The unit of empowerment depends on the logarithm base. Base 2 is commonly used in which case the unit is bits. === Contextual Empowerment === In general the choice of action (action distribution) that maximises empowerment varies from state to state. Knowing the empowerment of an agent in a specific state is useful, for example to construct an empowerment maximising policy. State-specific empowerment can be found using the more general formalism for 'contextual empowerment'. C {\displaystyle C} is a random variable describing the context (e.g. state). E ( A t n ⟶ S t + n ∣ C ) = ∑ c ∈ C p ( c ) E ( A t n ⟶ S t + n ∣ C = c ) {\displaystyle {\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C)=\sum _{c{\in }C}p(c){\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C=c)} == Application == Empowerment maximisation can be used as a pseudo-utility function to enable agents to exhibit intelligent behaviour without requiring the definition of external goals, for example balancing a pole in a cart-pole balancing scenario where no indication of the task is provided to the agent. Empowerment has been applied in studies of collective behaviour and in continuous domains. As is the case with Bayesian methods in general, computation of empowerment becomes computationally expensive as the number of actions and time horizon extends, but approaches to improve efficiency have led to usage in real-time control. Empowerment has been used for intrinsically motivated reinforcement learning agents playing video games, and in the control of underwater vehicles.

    Read more →
  • Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective. This method has enabled broad applications across multiple domains, including cross-modal retrieval, text-to-image generation, and aesthetic ranking. == Algorithm == The CLIP method trains a pair of models contrastively. One model takes in a piece of text as input and outputs a single vector representing its semantic content. The other model takes in an image and similarly outputs a single vector representing its visual content. The models are trained so that the vectors corresponding to semantically similar text-image pairs are close together in the shared vector space, while those corresponding to dissimilar pairs are far apart. To train a pair of CLIP models, one would start by preparing a large dataset of image-caption pairs. During training, the models are presented with batches of N {\displaystyle N} image-caption pairs. Let the outputs from the text and image models be respectively v 1 , . . . , v N , w 1 , . . . , w N {\displaystyle v_{1},...,v_{N},w_{1},...,w_{N}} . Two vectors are considered "similar" if their dot product is large. The loss incurred on this batch is the multi-class N-pair loss, which is a symmetric cross-entropy loss over similarity scores: − 1 N ∑ i ln ⁡ e v i ⋅ w i / T ∑ j e v i ⋅ w j / T − 1 N ∑ j ln ⁡ e v j ⋅ w j / T ∑ i e v i ⋅ w j / T {\displaystyle -{\frac {1}{N}}\sum _{i}\ln {\frac {e^{v_{i}\cdot w_{i}/T}}{\sum _{j}e^{v_{i}\cdot w_{j}/T}}}-{\frac {1}{N}}\sum _{j}\ln {\frac {e^{v_{j}\cdot w_{j}/T}}{\sum _{i}e^{v_{i}\cdot w_{j}/T}}}} In essence, this loss function encourages the dot product between matching image and text vectors ( v i ⋅ w i {\displaystyle v_{i}\cdot w_{i}} ) to be high, while discouraging high dot products between non-matching pairs. The parameter T > 0 {\displaystyle T>0} is the temperature, which is parameterized in the original CLIP model as T = e − τ {\displaystyle T=e^{-\tau }} where τ ∈ R {\displaystyle \tau \in \mathbb {R} } is a learned parameter. Other loss functions are possible. For example, Sigmoid CLIP (SigLIP) proposes the following loss function: L = 1 N ∑ i , j ∈ 1 : N f ( ( 2 δ i , j − 1 ) ( e τ w i ⋅ v j + b ) ) {\displaystyle L={\frac {1}{N}}\sum _{i,j\in 1:N}f((2\delta _{i,j}-1)(e^{\tau }w_{i}\cdot v_{j}+b))} where f ( x ) = ln ⁡ ( 1 + e − x ) {\displaystyle f(x)=\ln(1+e^{-x})} is the negative log sigmoid loss, and the Dirac delta symbol δ i , j {\displaystyle \delta _{i,j}} is 1 if i = j {\displaystyle i=j} else 0. == CLIP models == While the original model was developed by OpenAI, subsequent models have been trained by other organizations as well. === Image model === The image encoding models used in CLIP are typically vision transformers (ViT). The naming convention for these models often reflects the specific ViT architecture used. For instance, "ViT-L/14" means a "vision transformer large" (compared to other models in the same series) with a patch size of 14, meaning that the image is divided into 14-by-14 pixel patches before being processed by the transformer. The size indicator ranges from B, L, H, G (base, large, huge, giant), in that order. Other than ViT, the image model is typically a convolutional neural network, such as ResNet (in the original series by OpenAI), or ConvNeXt (in the OpenCLIP model series by LAION). Since the output vectors of the image model and the text model must have exactly the same length, both the image model and the text model have fixed-length vector outputs, which in the original report is called "embedding dimension". For example, in the original OpenAI model, the ResNet models have embedding dimensions ranging from 512 to 1024, and for the ViTs, from 512 to 768. Its implementation of ViT was the same as the original one, with one modification: after position embeddings are added to the initial patch embeddings, there is a LayerNorm. Its implementation of ResNet was the same as the original one, with 3 modifications: In the start of the CNN (the "stem"), they used three stacked 3x3 convolutions instead of a single 7x7 convolution, as suggested by. There is an average pooling of stride 2 at the start of each downsampling convolutional layer (they called it rect-2 blur pooling according to the terminology of ). This has the effect of blurring images before downsampling, for antialiasing. The final convolutional layer is followed by a multiheaded attention pooling. ALIGN a model with similar capabilities, trained by researchers from Google used EfficientNet, a kind of convolutional neural network. === Text model === The text encoding models used in CLIP are typically Transformers. In the original OpenAI report, they reported using a Transformer (63M-parameter, 12-layer, 512-wide, 8 attention heads) with lower-cased byte pair encoding (BPE) with 49152 vocabulary size. Context length was capped at 76 for efficiency. Like GPT, it was decoder-only, with only causally-masked self-attention. Its architecture is the same as GPT-2. Like BERT, the text sequence is bracketed by two special tokens [SOS] and [EOS] ("start of sequence" and "end of sequence"). Take the activations of the highest layer of the transformer on the [EOS], apply LayerNorm, then a final linear map. This is the text encoding of the input sequence. The final linear map has output dimension equal to the embedding dimension of whatever image encoder it is paired with. These models all had context length 77 and vocabulary size 49408. ALIGN used BERT of various sizes. == Dataset == === WebImageText === The CLIP models released by OpenAI were trained on a dataset called "WebImageText" (WIT) containing 400 million pairs of images and their corresponding captions scraped from the internet. The total number of words in this dataset is similar in scale to the WebText dataset used for training GPT-2, which contains about 40 gigabytes of text data. The dataset contains 500,000 text-queries, with up to 20,000 (image, text) pairs per query. The text-queries were generated by starting with all words occurring at least 100 times in English Wikipedia, then extended by bigrams with high mutual information, names of all Wikipedia articles above a certain search volume, and WordNet synsets. The dataset is private and has not been released to the public, and there is no further information on it. ==== Data preprocessing ==== For the CLIP image models, the input images are preprocessed by first dividing each of the R, G, B values of an image by the maximum possible value, so that these values fall between 0 and 1, then subtracting by [0.48145466, 0.4578275, 0.40821073], and dividing by [0.26862954, 0.26130258, 0.27577711]. The rationale was that these are the mean and standard deviations of the images in the WebImageText dataset, so this preprocessing step roughly whitens the image tensor. These numbers slightly differ from the standard preprocessing for ImageNet, which uses [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]. If the input image does not have the same resolution as the native resolution (224×224 for all except ViT-L/14@336px, which has 336×336 resolution), then the input image is first scaled by bicubic interpolation, so that its shorter side is the same as the native resolution, then the central square of the image is cropped out. === Others === ALIGN used over one billion image-text pairs, obtained by extracting images and their alt-tags from online crawling. The method was described as similar to how the Conceptual Captions dataset was constructed, but instead of complex filtering, they only applied a frequency-based filtering. Later models trained by other organizations had published datasets. For example, LAION trained OpenCLIP with published datasets LAION-400M, LAION-2B, and DataComp-1B. == Training == In the original OpenAI CLIP report, they reported training 5 ResNet and 3 ViT (ViT-B/32, ViT-B/16, ViT-L/14). Each was trained for 32 epochs. The largest ResNet model took 18 days to train on 592 V100 GPUs. The largest ViT model took 12 days on 256 V100 GPUs. All ViT models were trained on 224×224 image resolution. The ViT-L/14 was then boosted to 336×336 resolution by FixRes, resulting in a model. They found this was the best-performing model. In the OpenCLIP series, the ViT-L/14 model was trained on 384 A100 GPUs on the LAION-2B dataset, for 160 epochs for a total of 32B samples seen. == Applications == === Cross-modal retrieval === CLIP's cross-modal retrieval enables the alignment of visual and textual data in a shared latent space, allowing users to retrieve images based on text descriptions and vice versa, without the need for explicit image annotations. In text-to-image retrieval, users input descriptive text, and CLIP retrieves images with matching embeddings. In image-to-text retrieval, images are used to find related text content. CLIP’s ability to connect vis

    Read more →