Run-time algorithm specialization

Run-time algorithm specialization

In computer science, run-time algorithm specialization is a methodology for creating efficient algorithms for costly computation tasks of certain kinds. The methodology originates in the field of automated theorem proving and, more specifically, in the Vampire theorem prover project. The idea is inspired by the use of partial evaluation in optimising program translation. Many core operations in theorem provers exhibit the following pattern. Suppose that we need to execute some algorithm a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} in a situation where a value of A {\displaystyle A} is fixed for potentially many different values of B {\displaystyle B} . In order to do this efficiently, we can try to find a specialization of a l g {\displaystyle {\mathit {alg}}} for every fixed A {\displaystyle A} , i.e., such an algorithm a l g A {\displaystyle {\mathit {alg}}_{A}} , that executing a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is equivalent to executing a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} . The specialized algorithm may be more efficient than the generic one, since it can exploit some particular properties of the fixed value A {\displaystyle A} . Typically, a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} can avoid some operations that a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} would have to perform, if they are known to be redundant for this particular parameter A {\displaystyle A} . In particular, we can often identify some tests that are true or false for A {\displaystyle A} , unroll loops and recursion, etc. == Difference from partial evaluation == The key difference between run-time specialization and partial evaluation is that the values of A {\displaystyle A} on which a l g {\displaystyle {\mathit {alg}}} is specialised are not known statically, so the specialization takes place at run-time. There is also an important technical difference. Partial evaluation is applied to algorithms explicitly represented as codes in some programming language. At run-time, we do not need any concrete representation of a l g {\displaystyle {\mathit {alg}}} . We only have to imagine a l g {\displaystyle {\mathit {alg}}} when we program the specialization procedure. All we need is a concrete representation of the specialized version a l g A {\displaystyle {\mathit {alg}}_{A}} . This also means that we cannot use any universal methods for specializing algorithms, which is usually the case with partial evaluation. Instead, we have to program a specialization procedure for every particular algorithm a l g {\displaystyle {\mathit {alg}}} . An important advantage of doing so is that we can use some powerful ad hoc tricks exploiting peculiarities of a l g {\displaystyle {\mathit {alg}}} and the representation of A {\displaystyle A} and B {\displaystyle B} , which are beyond the reach of any universal specialization methods. == Specialization with compilation == The specialized algorithm has to be represented in a form that can be interpreted. In many situations, usually when a l g A ( B ) {\displaystyle {\mathit {alg}}_{A}(B)} is to be computed on many values of B {\displaystyle B} in a row, a l g A {\displaystyle {\mathit {alg}}_{A}} can be written as machine code instructions for a special abstract machine, and it is typically said that A {\displaystyle A} is compiled. The code itself can then be additionally optimized by answer-preserving transformations that rely only on the semantics of instructions of the abstract machine. The instructions of the abstract machine can usually be represented as records. One field of such a record, an instruction identifier (or instruction tag), would identify the instruction type, e.g. an integer field may be used, with particular integer values corresponding to particular instructions. Other fields may be used for storing additional parameters of the instruction, e.g. a pointer field may point to another instruction representing a label, if the semantics of the instruction require a jump. All instructions of the code can be stored in a traversable data structure such as an array, linked list, or tree. Interpretation (or execution) proceeds by fetching instructions in some order, identifying their type, and executing the actions associated with said type. In many programming languages, such as C and C++, a simple switch statement may be used to associate actions with different instruction identifiers. Modern compilers usually compile a switch statement with constant (e.g. integer) labels from a narrow range by storing the address of the statement corresponding to a value i {\displaystyle i} in the i {\displaystyle i} -th cell of a special array, as a means of efficient optimisation. This can be exploited by taking values for instruction identifiers from a small interval of values. == Data-and-algorithm specialization == There are situations when many instances of A {\displaystyle A} are intended for long-term storage and the calls of a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} occur with different B {\displaystyle B} in an unpredictable order. For example, we may have to check a l g ( A 1 , B 1 ) {\displaystyle {\mathit {alg}}(A_{1},B_{1})} first, then a l g ( A 2 , B 2 ) {\displaystyle {\mathit {alg}}(A_{2},B_{2})} , then a l g ( A 1 , B 3 ) {\displaystyle {\mathit {alg}}(A_{1},B_{3})} , and so on. In such circumstances, full-scale specialization with compilation may not be suitable due to excessive memory usage. However, we can sometimes find a compact specialized representation A ′ {\displaystyle A^{\prime }} for every A {\displaystyle A} , that can be stored with, or instead of, A {\displaystyle A} . We also define a variant a l g ′ {\displaystyle {\mathit {alg}}^{\prime }} that works on this representation and any call to a l g ( A , B ) {\displaystyle {\mathit {alg}}(A,B)} is replaced by a l g ′ ( A ′ , B ) {\displaystyle {\mathit {alg}}^{\prime }(A^{\prime },B)} , intended to do the same job faster.

Owain Evans

Owain Rhys Evans is a British artificial intelligence researcher who works on AI alignment and machine learning safety. He founded Truthful AI, a research group based in Berkeley, California, and is an affiliate of the Center for Human Compatible AI (CHAI) at the University of California, Berkeley. His research addresses AI truthfulness, emergent behaviors in large language models, and the alignment of AI systems with human values. == Education == Evans earned a Bachelor of Arts in philosophy and mathematics from Columbia University in 2008 and a PhD in philosophy from the Massachusetts Institute of Technology in 2015. His doctoral research focused on Bayesian computational models of human preferences and decision-making. == Career == After completing his doctorate, Evans held positions at the Future of Humanity Institute (FHI) at the University of Oxford, first as a postdoctoral research fellow and later as a research scientist. While at FHI, he co-authored a survey of machine learning researchers on timelines for human-level AI, published in the Journal of Artificial Intelligence Research. The survey was reported on by Newsweek, New Scientist, the BBC, and The Economist. He was also among the co-authors of a 2018 report on the potential for misuse of AI technologies, published by researchers at Oxford, Cambridge, and other institutions. Since 2022, Evans has been based in Berkeley, where he founded Truthful AI, a non-profit research group that studies AI truthfulness, deception, and emergent behaviors in large language models. == Research == Evans's early work examined challenges in inverse reinforcement learning when human behavior is irrational or biased, proposing methods for AI systems to infer preferences from imperfect human demonstrations. He co-developed TruthfulQA (2021), a benchmark that tests whether language models give truthful answers rather than repeating common misconceptions. Initial evaluations found that larger models were not more truthful, suggesting that scaling alone does not improve factual accuracy. The benchmark has since been used by AI developers to evaluate large language models. He also co-authored a paper proposing design and governance strategies for building AI systems that do not deceive or hallucinate. In 2023, Evans and collaborators described the "reversal curse", showing that language models trained on a fact in one direction (e.g. "A is B") often cannot answer the corresponding reverse query ("B is A"). His group also developed a benchmark for evaluating situational awareness in language models. In 2025, Evans and colleagues published a study in Nature on what they termed "emergent misalignment": fine-tuning a language model on a narrow task (writing insecure code) caused it to produce unrelated harmful outputs without explicit instruction to do so. Later that year, Evans and collaborators (including researchers at Anthropic) reported that hidden behavioral traits can transfer between language models through training data, even when those traits are not explicitly present in the data, a phenomenon they called "subliminal learning". == Public engagement == In November 2025, Evans delivered the Hinton Lectures, a keynote lecture series on AI safety co-founded by Geoffrey Hinton and the Global Risk Institute.

Jubatus

Jubatus is an open-source online machine learning and distributed computing framework developed at Nippon Telegraph and Telephone and Preferred Infrastructure. Its features include classification, recommendation, regression, anomaly detection and graph mining. It supports many client languages, including C++, Java, Ruby and Python. It uses Iterative Parameter Mixture for distributed machine learning. == Notable Features == Jubatus supports: Multi-classification algorithms: Perceptron Passive Aggressive Confidence Weighted Adaptive Regularization of Weight Vectors Normal Herd Recommendation algorithms using: Inverted index Minhash Locality-sensitive hashing Regression algorithms: Passive Aggressive feature extraction method for natural language: n-gram Text segmentation

Ilastik

ilastik is free open source software for image classification and segmentation. No previous experience in image processing is required to run the software. Since 2018 ilastik is further developed and maintained by Anna Kreshuk's group at European Molecular Biology Laboratory. == Features == ilastik allows user to annotate an arbitrary number of classes in images with a mouse interface. Using these user annotations and the generic (nonlinear) image features, the user can train a random forest classifier. Trained ilastik classifiers can be applied new data not included in the training set in ilastik via its batch processing functionality, or without using the graphical user interface, in headless mode. ilastik can be integrated into various related tools: Pre-trained workflows can be executed directly from ImageJ/Fiji using the ilastik-ImageJ plugin. Pre-trained ilastik Pixel Classification workflows can be run directly in Python with the ilastik Python package, which is available via conda. ilastik has a CellProfiler module to use ilastik classifiers to process images within a CellProfiler framework. == History == ilastik was first released in 2011 by scientists at the Heidelberg Collaboratory for Image Processing (HCI), University of Heidelberg. == Application == The Interactive Learning and Segmentation Toolkit Carving Cell classification and neuron classification Synapse detection Cell tracking Neural Network Classification == Resources == ilastik project is hosted on GitHub. It is a collaborative project, any contributions such as comments, bug reports, bug fixes or code contributions are welcome. The ilastik team can be contacted for user support on the image.sc forum.

Prefrontal cortex basal ganglia working memory

Prefrontal cortex basal ganglia working memory (PBWM) is an algorithm that models working memory in the prefrontal cortex and the basal ganglia. It can be compared to long short-term memory (LSTM) in functionality, but is more biologically explainable. It uses the primary value learned value model to train prefrontal cortex working-memory updating system, based on the biology of the prefrontal cortex and basal ganglia. It is used as part of the Leabra framework and was implemented in Emergent in 2019. == Abstract == The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and "executive" functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive. PBWM is a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia and amygdala, which together form an actor/critic architecture. The critic system learns which prefrontal representations are task-relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model's performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task, and other benchmark working memory tasks. == Model == First, there are multiple separate stripes (groups of units) in the prefrontal cortex and striatum layers. Each stripe can be independently updated, such that this system can remember several different things at the same time, each with a different "updating policy" of when memories are updated and maintained. The active maintenance of the memory is in prefrontal cortex (PFC), and the updating signals (and updating policy more generally) come from the striatum units (a subset of basal ganglia units). PVLV provides reinforcement learning signals to train up the dynamic gating system in the basal ganglia. === Sensory input and motor output === The sensory input is connected to the posterior cortex which is connected to the motor output. The sensory input is also linked to the PVLV system. === Posterior cortex === The posterior cortex form the hidden layers of the input/output mapping. The PFC is connected with the posterior cortex to contextualize this input/output mapping. === PFC === The PFC (for output gating) has a localist one-to-one representation of the input units for every stripe. Thus, you can look at these PFC representations and see directly what the network is maintaining. The PFC maintains the working memory needed to perform the task. === Striatum === This is the dynamic gating system representing the striatum units of the basal ganglia. Every even-index unit within a stripe represents "Go", while the odd-index units represent "NoGo." The Go units cause updating of the PFC, while the NoGo units cause the PFC to maintain its existing memory representation. There are groups of units for every stripe. In the PBWM model in Emergent, the matrices represent the striatum. === PVLV === All of these layers are part of PVLV system. The PVLV system controls the dopaminergic modulation of the basal ganglia (BG). Thus, BG/PVLV form an actor-critic architecture where the PVLV system learns when to update. ==== SNrThal ==== SNrThal represents the substantia nigra pars reticulata (SNr) and the associated area of the thalamus, which produce a competition among the Go/NoGo units within a given stripe and mediates competition using k-winners-take-all dynamics. If there is more overall Go activity in a given stripe, then the associated SNrThal unit gets activated, and it drives updating in PFC. For every stripe, there is one unit in SNrThal. ==== VTA and SNc ==== Ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) are part of the dopamine layer. This layer models midbrain dopamine neurons. They control the dopaminergic modulation of the basal ganglia.

Percept (artificial intelligence)

A percept is the input that an intelligent agent is perceiving at any given moment. It is essentially the same concept as a percept in psychology, except that it is being perceived not by the brain but by the agent. A percept is detected by a sensor, often a camera, processed accordingly, and acted upon by an actuator. Each percept is added to a "percept sequence", which is a complete history of each percept ever detected. The agent's action at any instant point may depend on the entire percept sequence up to that particular instant point. An intelligent agent chooses how to act not only based on the current percept, but the percept sequence. The next action is chosen by the agent function, which maps every percept to an action. For example, if a camera were to record a gesture, the agent would process the percepts, calculate the corresponding spatial vectors, examine its percept history, and use the agent program (the application of the agent function) to act accordingly. == Examples == Examples of percepts include inputs from touch sensors, cameras, infrared sensors, sonar, microphones, mice, and keyboards. A percept can also be a higher-level feature of the data, such as lines, depth, objects, faces, or gestures.

Curriculum learning

Curriculum learning is a technique in machine learning in which a model is trained on examples of increasing difficulty, where the definition of "difficulty" may be provided externally or discovered as part of the training process. This is intended to attain good performance more quickly, or to converge to a better local optimum if the global optimum is not found. == Approach == Most generally, curriculum learning is the technique of successively increasing the difficulty of examples in the training set that is presented to a model over multiple training iterations. This can produce better results than exposing the model to the full training set immediately under some circumstances; most typically, when the model is able to learn general principles from easier examples, and then gradually incorporate more complex and nuanced information as harder examples are introduced, such as edge cases. This has been shown to work in many domains, most likely as a form of regularization. There are several major variations in how the technique is applied: A concept of "difficulty" must be defined. This may come from human annotation or an external heuristic; for example in language modeling, shorter sentences might be classified as easier than longer ones. Another approach is to use the performance of another model, with examples accurately predicted by that model being classified as easier (providing a connection to boosting). Difficulty can be increased steadily or in distinct epochs, and in a deterministic schedule or according to a probability distribution. This may also be moderated by a requirement for diversity at each stage, in cases where easier examples are likely to be disproportionately similar to each other. Applications must also decide the schedule for increasing the difficulty. Simple approaches may use a fixed schedule, such as training on easy examples for half of the available iterations and then all examples for the second half. Other approaches use self-paced learning to increase the difficulty in proportion to the performance of the model on the current set. Since curriculum learning only concerns the selection and ordering of training data, it can be combined with many other techniques in machine learning. The success of the method assumes that a model trained for an easier version of the problem can generalize to harder versions, so it can be seen as a form of transfer learning. Some authors also consider curriculum learning to include other forms of progressively increasing complexity, such as increasing the number of model parameters. It is frequently combined with reinforcement learning, such as learning a simplified version of a game first. Some domains have shown success with anti-curriculum learning: training on the most difficult examples first. One example is the ACCAN method for speech recognition, which trains on the examples with the lowest signal-to-noise ratio first. == History == The term "curriculum learning" was introduced by Yoshua Bengio et al in 2009, with reference to the psychological technique of shaping in animals and structured education for humans: beginning with the simplest concepts and then building on them. The authors also note that the application of this technique in machine learning has its roots in the early study of neural networks such as Jeffrey Elman's 1993 paper Learning and development in neural networks: the importance of starting small. Bengio et al showed good results for problems in image classification, such as identifying geometric shapes with progressively more complex forms, and language modeling, such as training with a gradually expanding vocabulary. They conclude that, for curriculum strategies, "their beneficial effect is most pronounced on the test set", suggesting good generalization. The technique has since been applied to many other domains: Natural language processing: Part-of-speech tagging Intent detection Sentiment analysis Machine translation Speech recognition Language model pre-training Image recognition: Facial recognition Object detection Reinforcement learning: Game-playing Graph learning Matrix factorization