A.i Paraphrasing Online

A.i Paraphrasing Online — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Convolutional layer

In artificial neural networks, a convolutional layer is a type of network layer that applies a convolution operation to the input. Convolutional layers are some of the primary building blocks of convolutional neural networks (CNNs), a class of neural network most commonly applied to images, video, audio, and other data that have the property of uniform translational symmetry. The convolution operation in a convolutional layer involves sliding a small window (called a kernel or filter) across the input data and computing the dot product between the values in the kernel and the input at each position. This process creates a feature map that represents detected features in the input. == Concepts == === Kernel === Kernels, also known as filters, are small matrices of weights that are learned during the training process. Each kernel is responsible for detecting a specific feature in the input data. The size of the kernel is a hyperparameter that affects the network's behavior. === Convolution === For a 2D input x {\displaystyle x} and a 2D kernel w {\displaystyle w} , the 2D convolution operation can be expressed as: y [ i , j ] = ∑ m = 0 k h − 1 ∑ n = 0 k w − 1 x [ i + m , j + n ] ⋅ w [ m , n ] {\displaystyle y[i,j]=\sum _{m=0}^{k_{h}-1}\sum _{n=0}^{k_{w}-1}x[i+m,j+n]\cdot w[m,n]} where k h {\displaystyle k_{h}} and k w {\displaystyle k_{w}} are the height and width of the kernel, respectively. This generalizes immediately to nD convolutions. Commonly used convolutions are 1D (for audio and text), 2D (for images), and 3D (for spatial objects, and videos). === Stride === Stride determines how the kernel moves across the input data. A stride of 1 means the kernel shifts by one pixel at a time, while a larger stride (e.g., 2 or 3) results in less overlap between convolutions and produces smaller output feature maps. === Padding === Padding involves adding extra pixels around the edges of the input data. It serves two main purposes: Preserving spatial dimensions: Without padding, each convolution reduces the size of the feature map. Handling border pixels: Padding ensures that border pixels are given equal importance in the convolution process. Common padding strategies include: No padding/valid padding. This strategy typically causes the output to shrink. Same padding: Any method that ensures the output size same as input size is a same padding strategy. Full padding: Any method that ensures each input entry is convolved over for the same number of times is a full padding strategy. Common padding algorithms include: Zero padding: Add zero entries to the borders of input. Mirror/reflect/symmetric padding: Reflect the input array on the border. Circular padding: Cycle the input array back to the opposite border, like a torus. The exact numbers used in convolutions is complicated, for which we refer to (Dumoulin and Visin, 2018) for details. == Variants == === Standard === The basic form of convolution as described above, where each kernel is applied to the entire input volume. === Depthwise separable === Depthwise separable convolution separates the standard convolution into two steps: depthwise convolution and pointwise convolution. The depthwise separable convolution decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently and a pointwise convolution ( 1 × 1 {\displaystyle 1\times 1} convolution) that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost. It was first developed by Laurent Sifre during an internship at Google Brain in 2013 as an architectural variation on AlexNet to improve convergence speed and model size. === Dilated === Dilated convolution, or atrous convolution, introduces gaps between kernel elements, allowing the network to capture a larger receptive field without increasing the kernel size. === Transposed === Transposed convolution, also known as deconvolution, fractionally strided convolution, and upsampling convolution, is a convolution where the output tensor is larger than its input tensor. It's often used in encoder-decoder architectures for upsampling. It's used in image generation, semantic segmentation, and super-resolution tasks. == History == The concept of convolution in neural networks was inspired by the visual cortex in biological brains. Early work by Hubel and Wiesel in the 1960s on the cat's visual system laid the groundwork for artificial convolution networks. An early convolution neural network was developed by Kunihiko Fukushima in 1969. It had mostly hand-designed kernels inspired by convolutions in mammalian vision. In 1979 he improved it to the Neocognitron, which learns all convolutional kernels by unsupervised learning (in his terminology, "self-organized by 'learning without a teacher'"). During the 1988 to 1998 period, a series of CNN were introduced by Yann LeCun et al., ending with LeNet-5 in 1998. It was an early influential CNN architecture for handwritten digit recognition, trained on the MNIST dataset, and was used in ATM. (Olshausen & Field, 1996) discovered that simple cells in the mammalian primary visual cortex implement localized, oriented, bandpass receptive fields, which could be recreated by fitting sparse linear codes for natural scenes. This was later found to also occur in the lowest-level kernels of trained CNNs. The field saw a resurgence in the 2010s with the development of deeper architectures and the availability of large datasets and powerful GPUs. AlexNet, developed by Alex Krizhevsky et al. in 2012, was a catalytic event in modern deep learning. In that year’s ImageNet competition, the AlexNet model achieved a 16% top-five error rate, significantly outperforming the next best entry, which had a 26% error rate. The network used eight trainable layers, approximately 650,000 neurons, and around 60 million parameters, highlighting the impact of deeper architectures and GPU acceleration on image recognition performance. From the 2013 ImageNet competition, most entries adopted deep convolutional neural networks, building on the success of AlexNet. Over the following years, performance steadily improved, with the top-five error rate falling from 16% in 2012 and 12% in 2013 to below 3% by 2017, as networks grew increasingly deep.
Read more →
REEM

REEM is a prototype humanoid robot built by PAL Robotics in Spain. It is a 1.70 m high humanoid robot with 22 degrees of freedom, with a mobile base with wheels, allowing it to move at 4 km/hour. The upper part of the robot consists of a torso with a touch screen, two motorized arms, which give it a high degree of expression, and a head, which is also motorized. REEM-A and REEM-B are the first and second prototypes of humanoid robots created by PAL Robotics. REEM-B can recognize, grasp and lift objects and walk by itself, avoiding obstacles through simultaneous localization and mapping. The robot accepts voice commands and can recognize faces. == Specifications ==
Read more →
Infomax

Infomax', or the principle of maximum information preservation, is an optimization principle for artificial neural networks and other information processing systems. It prescribes that a function that maps a set of input values x {\displaystyle x} to a set of output values z ( x ) {\displaystyle z(x)} should be chosen or learned so as to maximize the average Shannon mutual information between x {\displaystyle x} and z ( x ) {\displaystyle z(x)} , subject to a set of specified constraints and/or noise processes. Infomax algorithms are learning algorithms that perform this optimization process. The principle was described by Linsker in 1988. The objective function is called the InfoMax objective. As the InfoMax objective is difficult to compute exactly, a related notion uses two models giving two outputs z 1 ( x ) , z 2 ( x ) {\displaystyle z_{1}(x),z_{2}(x)} , and maximizes the mutual information between these. This contrastive InfoMax objective is a lower bound to the InfoMax objective. Infomax, in its zero-noise limit, is related to the principle of redundancy reduction proposed for biological sensory processing by Horace Barlow in 1961, and applied quantitatively to retinal processing by Atick and Redlich. == Applications == (Becker and Hinton, 1992) showed that the contrastive InfoMax objective allows a neural network to learn to identify surfaces in random dot stereograms (in one dimension). One of the applications of infomax has been to an independent component analysis algorithm that finds independent signals by maximizing entropy. Infomax-based ICA was described by (Bell and Sejnowski, 1995), and (Nadal and Parga, 1995).
Read more →
Information Coding Classification

The Information Coding Classification (ICC) is a classification system covering almost all extant 6500 knowledge fields (knowledge domains). Its conceptualization goes beyond the scope of the well known library classification systems, such as Dewey Decimal Classification (DDC), Universal Decimal Classification (UDC), and Library of Congress Classification (LCC), by extending also to knowledge systems that so far have not afforded to classify literature. ICC actually presents a flexible universal ordering system for both literature and other kinds of information, set out as knowledge fields. From a methodological point of view, ICC differs from the above-mentioned systems along the following three lines: Its main classes are not based on disciplines but on nine live stages of development, so-called ontical levels. It breaks them roughly down into hierarchical steps by further nine categories which makes decimal number coding possible. The contents of a knowledge field is earmarked via a digital position scheme, which makes the first hierarchical step refer to the nine ontical levels (object areas as subject categories), and the second hierarchical step refer to nine functionally ordered form categories. Respective knowledge fields permit to step down by the same principle to a third and forth level, and even further to a fifth and sixth level. Finally, knowledge field subdivisions will have to conform to said digital position scheme. Hence, for a given knowledge field identical codes will mark identical categories under respective numbers of the coding system. This mnemotechnical aspect of the system helps memorizing and straightaway retrieving the whereabouts of respective interdisciplinary and transdisciplinary fields. The first two hierarchical levels may be regarded as a top- or upper ontology for ontologies and other applications. The terms of the first three hierarchical levels were set out in German and English in Wissensorganisation. Entwicklung, Aufgabe, Anwendung, Zukunft, on pp. 82 to 100. It was published in 2014 and available so far only in German. In the meantime, also the French terms of the knowledge fields have been collected. Competence for maintenance and further development rests with the German Chapter of the International Society for Knowledge Organization (ISKO) e.V. == Historical development == At the end of 1970, Prof. Alwin Diemer, Univ.of Düsseldorf proposed to Ingetraut Dahlberg to undertake a philosophical dissertation on The universal classification system of knowledge, its ontological, epistemological, and information theoretical foundations. Diemer had in mind an innovating ontological approach for such a system based on the whole spectrum of kinds of being and complying with epistemological requirements. The third requirement had already been taken up somehow in the Indian Colon Classification, yet it still called for explanations and additions. In 1974, the dissertation was published in German entitled Grundlagen universaler Wissensordnung. It started with conceptual clarifications, and why and how the term „universal“ was linked to knowledge, including knowledge fields, such as commodity science, artefacts, statistics, patents, standardization, communication, utility services et al. In chapter 3, six universal classification systems (DDC, UDC, LCC, BC, CC and BBK) were presented, analyzed and compared. While preparing the dissertation, Dahlberg started with elaborating the new universal system by first gleaning a lot of extant designations of knowledge fields from whatever available reference works. This was funded by the German Documentation Society (DGD) (1971-2) under the title of Order system of knowledge fields. In addition, the syllabuses of German universities and polytechniques were explored for relevant terms and documented (1975). Thereafter, it seemed necessary to add definitions from special dictionaries and encyclopediae; it soon appeared that the 12.500 terms included numerous synonyms, so that the whole collection boiled down to about 6.500 concept designations (Project Logstruktur, supported by the German Science Foundation (DFG) 1976-78). The outcome of this work was the formulation of 30 theses which ended up in 12 principles for the new system, published 40 years later under. These principles refer not only to theoretical foundations but also to structure and other organizational aspects of the whole array of knowledge fields. In 1974, the digital position scheme for field subdivision had already been developed to allow for classifying classification literature in the bibliographical section of the first issue of the Journal International Classification. In 1977, the entire ICC was ready for presentation at a seminar in Bangalore, India. A publication of the first three hierarchical levels appeared however only in 1982. It was applied to the bibliography of classification systems and thesauri in vol.1 of the International Classification and Indexing Bibliography; it has been updated. == Governing principles == These were published in full length in the book Wissensorganisation. Entwicklung, Aufgabe, Anwendung, Zukunft and the article Information Coding Classification. Geschichtliches, Prinzipien, Inhaltliches, hence it suffices to just mention their topics with some necessary additions. Principle 1: Concept theoretical approaches. Concepts are the contents of ICC, they are understood as being units of knowledge. The „birth“ of a concept. Where do the characteristics, the knowledge elements come from? How do conceptual relations arise? Principle 2: The four kinds of concept relations and their applications. Principle 3: Decimal numbers form the ICC codes as its universal language. Principle 4: The nine ontical levels of ICC. They were grouped under three captions: Prolegomena (1-3), life sciences (4-6) and human output (7-9): Structure and form Matter and energy Cosmos and earth Biosphere Anthroposphere Sociosphere Material products (economics and technology) Intellectual products (knowledge and information) Spiritual products (products of mind and culture) Principle 5: Knowledge fields are structured by categories, based on the Aristotelian form-categories, under a digital position scheme, a kind of scaling rule for subdividing a given field as follows: General area: problems, theories, principles (axiom and structure) Object area: objects, kinds, parts, properties of objects Activity area: methods, processes, activities Field properties or first characterization Persons or secondary characterization Societies or tertiary characterization Influences from outside Applications of the field to other fields Field information and synthesizing tasks The digital position scheme, called Systematifier, has also been used for structuring the entire system via the categories figuring on the upper zero level. An example of its application is the structure of the classification system for knowledge organization literature Gliederung der Klassifikationsliteratur. (A simplified version with an additional introduction is given in, p. 71) Principle 6: The ontical levels outlined under principle 4 conform to the „integrative level theory“ which means that every level is integrated in the following one. In addition, each knowledge area presumes the following one. Principle 7: The combination potential of knowledge fields (interdisciplinarity and transdisciplinarity)is determined by the digital position scheme. (Examples are given in, p. 103-4) Principle 8: The categories of the zero-level are general concepts, their possible subdivisions could once be used for classificatory statements. (These subdivisions still need elaboration) Principle 9 and 10: These relate to the combination potential of classificatory statements with space and time concepts. (Still to be elaborated) Principle 11: The system's mnemotechnical aspect relies on the fixed system position codes and on the 3x3 form- and subject-categories. Principle 12: The combination potential of system position 1, 8 and 9 make ICC to a self-networking system which complies with the present scientific development. == In matrix form == The first two levels of ICC can be represented by following matrix. The first hierarchical level of the 9 subject categories results from the first vertical array under codes 1-9. The second hierarchical level of subject categories is structured by the 9 functionally ordered form categories, listed in the first horizontal line under codes 01-09. Some exceptions are mentioned in principle 7. == Research == === Exploration of automatic classification === For classifying web documents as conceived by Jens Hartmann, University of Karlsruhe, Prof.Walter Koch, University of Graz, has explored in his Institute for Applied Information Technology Research Society (AIT) the application of ICC to automatically classifying metadata of some 350.000 documents. This was facilitated by data generated within the framework of an E
Read more →
Steerable filter

In image processing, a steerable filter is an orientation-selective filter that can be computationally rotated to any direction. Rather than designing a new filter for each orientation, a steerable filter is synthesized from a linear combination of a small, fixed set of "basis filters". This approach is efficient and is widely used for tasks that involve directionality, such as edge detection, texture analysis, and shape-from-shading. The principle of steerability has been generalized in deep learning to create equivariant neural networks, which can recognize features in data regardless of their orientation or position. == Example == A common example of a steerable filter is the first derivative of a two-dimensional Gaussian function. This filter responds strongly to oriented image features like edges. It is constructed from two basis filters: the partial derivative of the Gaussian with respect to the horizontal direction ( x {\displaystyle x} ) and the vertical direction ( y {\displaystyle y} ). If G ( x , y ) {\displaystyle G(x,y)} is the Gaussian function, and G x {\displaystyle G_{x}} and G y {\displaystyle G_{y}} are its partial derivatives (which measure the rate of change in the x {\displaystyle x} and y {\displaystyle y} directions, respectively), a new filter G θ {\displaystyle G_{\theta }} oriented at an angle θ {\displaystyle \theta } can be synthesized with the formula: G θ = cos ⁡ ( θ ) G x + sin ⁡ ( θ ) G y {\displaystyle G_{\theta }=\cos(\theta )G_{x}+\sin(\theta )G_{y}} Here, the basis filters G x {\displaystyle G_{x}} and G y {\displaystyle G_{y}} are weighted by cos ⁡ ( θ ) {\displaystyle \cos(\theta )} and sin ⁡ ( θ ) {\displaystyle \sin(\theta )} to "steer" the filter's sensitivity to the desired orientation. This is equivalent to taking the dot product of the direction vector ( cos ⁡ θ , sin ⁡ θ ) {\displaystyle (\cos \theta ,\sin \theta )} with the filter's gradient, ( G x , G y ) {\displaystyle (G_{x},G_{y})} . == Generalization in deep learning: Equivariant neural networks == The concept of steerability is foundational to equivariant neural networks, a class of models in deep learning designed to understand symmetries in data. A network is considered equivariant to a transformation (like a rotation) if transforming the input and then passing it through the network produces the same result as passing the input through the network first and then transforming the output. Formally, for a transformation T {\displaystyle T} and a network f {\displaystyle f} , this property is defined as f ( T ( input ) ) = T ( f ( input ) ) {\displaystyle f(T({\text{input}}))=T(f({\text{input}}))} . This built-in understanding of geometry makes models more data-efficient. For example, a network equivariant to rotation does not need to be shown an object in multiple orientations to learn to recognize it; it inherently understands that a rotated object is still the same object. This leads to better generalization and performance, particularly in scientific applications. === Mathematical foundation === Equivariant neural networks use principles from group theory to create operations that respect geometric symmetries, such as the SO(3) group for 3D rotations or the E(3) group for rotations and translations. Instead of learning standard filter kernels, these networks learn how to combine a fixed set of basis kernels. These basis functions are chosen so that they have well-defined behaviors under transformation groups. Spherical harmonics are frequently used as basis functions because they form a complete set of functions that behave predictably under rotation, making them ideal for creating steerable 3D kernels. Features within the network are treated as geometric tensors, which are mathematical objects (like scalars or vectors) that are "typed" by their behavior under transformations. These types correspond to the irreducible representations (irreps) of the group. The tensor product is the fundamental operation used to combine these typed features in a way that preserves equivariance, guaranteeing that the network as a whole respects the desired symmetry. Frameworks like e3nn simplify the construction of these networks by automating the complex mathematics of irreducible representations and tensor products. === Applications === Steerable and equivariant models are highly effective for problems with inherent geometric symmetries. Examples include: Protein structure analysis: SE(3)-equivariant networks can process 3D molecular structures while respecting their rotational and translational symmetries. 3D Point cloud processing: Rotation-equivariant filters built from steerable spherical functions can perform tasks like 3D shape classification. Computational chemistry: E(3)-equivariant graph neural networks are used to model interatomic potentials for molecular dynamics simulations, creating highly accurate and data-efficient models of physical systems.
Read more →
Lumpers and splitters

Lumpers and splitters are opposing factions in any academic discipline that has to place individual examples into rigorously defined categories. The lumper–splitter problem occurs when there is the desire to create classifications and assign examples to them, for example, schools of literature, biological taxa, and so on. A "lumper" is a person who assigns examples broadly, judging that differences are not as important as signature similarities. A "splitter" makes precise definitions, and creates new categories to classify samples that differ in key ways. == Origin of the terms == The earliest known use of these terms was thought to be by Charles Darwin, in a letter to Joseph Dalton Hooker in 1857: "It is good to have hair-splitters & lumpers". But according to research done by the deputy director at NCSE, Glenn Branch, the credit is due to naturalist Edward Newman who wrote in 1845, "The time has arrived for discarding imaginary species, and the duty of doing this is as imperative as the admission of new ones when such are really discovered. The talents described under the respective names of 'hair-splitting' and 'lumping' are unquestionably yielding their power to the mightier power of Truth." They were then introduced more widely by George G. Simpson in his 1945 work The Principles of Classification and a Classification of Mammals. As he put it: splitters make very small units – their critics say that if they can tell two animals apart, they place them in different genera ... and if they cannot tell them apart, they place them in different species. ... Lumpers make large units – their critics say that if a carnivore is neither a dog nor a bear, they call it a cat. A later use can be found in the title of a 1969 paper "On lumpers and splitters ..." by the medical geneticist Victor McKusick. Reference to lumpers and splitters in the humanities appeared in a debate in 1975 between J. H. Hexter and Christopher Hill, in the Times Literary Supplement. It followed from Hexter's detailed review of Hill's book Change and Continuity in Seventeenth Century England, in which Hill developed Max Weber's argument that the rise of capitalism was facilitated by Calvinist Puritanism. Hexter objected to Hill's "mining" of sources to find evidence that supported his theories. Hexter argued that Hill plucked quotations from sources in a way that distorted their meaning. Hexter explained this as a mental habit that he called "lumping". According to him, "lumpers" rejected differences and chose to emphasise similarities. Any evidence that did not fit their arguments was ignored as aberrant. Splitters, by contrast, emphasised differences, and resisted simple schemes. While lumpers consistently tried to create coherent patterns, splitters preferred incoherent complexity. == Usage in various fields == === Biology === The categorisation and naming of a particular species should be regarded as a hypothesis about the evolutionary relationships and distinguishability of that group of organisms. As further information comes to hand, the hypothesis may be confirmed or refuted. Sometimes, especially in the past when communication was more difficult, taxonomists working in isolation have given two distinct names to individual organisms later identified as the same species. When two named species are agreed to be of the same species, the older species name is almost always retained dropping the newer species name honouring a convention known as "priority of nomenclature". This form of lumping is technically called synonymisation. Dividing a taxon into multiple, often new, taxa is called splitting. Taxonomists are often referred to as "lumpers" or "splitters" by their colleagues, depending on their personal approach to recognizing differences or commonalities between organisms. For example, the number of genera used in Pteridophyte Phylogeny Group I (PPG I) has proved controversial. PPG I uses 18 lycophyte and 319 fern genera. The earlier system put forward by Smith et al. (2006) had suggested a range of 274 to 312 genera for ferns alone. By contrast, the system of Christenhusz & Chase (2014) used 5 lycophyte and about 212 fern genera. The number of fern genera was further reduced to 207 in a subsequent publication. Defending PPG I, Schuettpelz et al. (2018) argue that the larger number of genera is a result of "the gradual accumulation of new collections and new data" and hence "a greater appreciation of fern diversity and ... an improved ability to distinguish taxa". They also argue that the number of species per genus in the PPG I system is already higher than in other groups of organisms (about 33 species per genus for ferns as opposed to about 22 species per genus for angiosperms) and that reducing the number of genera as Christenhusz and Chase propose yields the excessive number of about 50 species per genus for ferns. In response, Christenhusz and Chase (2018) argue that the excessive splitting of genera destabilises the usage of names and will lead to greater instability in future, and that the highly split genera have few if any characters that can be used to recognise them, making identification difficult, even to generic level. They further argue that comparing numbers of species per genus in different groups is "fundamentally meaningless". === History === In history, lumpers are those who tend to create broad definitions that cover large periods of time and many disciplines, whereas splitters want to assign names to tight groups of inter-relationships. Lumping tends to create a more and more unwieldy definition, with members having less and less mutually in common. This can lead to definitions which are little more than conventionalities, or groups which join fundamentally different examples. Splitting often leads to "distinctions without difference", ornate and fussy categories, and failure to see underlying similarities. For example, in the arts, "Romantic" can refer specifically to a period of German poetry roughly from 1780 to 1810, but would exclude the later work of Goethe, among other writers. In music it can mean every composer from Hummel through Rachmaninoff, plus many that came after. === Software modelling === Software engineering often proceeds by building models (sometimes known as model-driven architecture). A lumper is keen to generalise, and produces models with a small number of broadly defined objects. A splitter is reluctant to generalise, and produces models with a large number of narrowly defined objects. Conversion between the two styles is not necessarily symmetrical. For example, if error messages in two narrowly defined classes behave in the same way, the classes can be easily combined. But if some messages in a broad class behave differently, every object in the class must be examined before the class can be split. This illustrates the principle that "splits can be lumped more easily than lumps can be split". === Language classification === There is no agreement among historical linguists about what amount of evidence is needed for two languages to be safely classified in the same language family. For this reason, many proposed language families have had lumper–splitter controversies, including Altaic, Pama–Nyungan, Nilo-Saharan, and most of the larger families of the Americas. At a completely different level, the splitting of a mutually intelligible dialect continuum into different languages, or lumping them into one, is also an issue that continually comes up, though the consensus in contemporary linguistics is that there is no completely objective way to settle the question. Splitters regard the comparative method (meaning not comparison in general, but only reconstruction of a common ancestor or protolanguage) as the only valid proof of kinship, and consider genetic relatedness to be the question of interest. American linguists of recent decades tend to be splitters. Lumpers are more willing to admit techniques like mass lexical comparison or lexicostatistics, and mass typological comparison, and to tolerate the uncertainty of whether relationships found by these methods are the result of linguistic divergence (descent from common ancestor) or language convergence (borrowing). Much long-range comparison work has been from Russian linguists belonging to the Moscow School of Comparative Linguistics, most notably Vladislav Illich-Svitych and Sergei Starostin. In the United States, Greenberg and Ruhlen's work has been met with little acceptance from linguists. Earlier American linguists like Morris Swadesh and Edward Sapir also pursued large-scale classifications like Sapir's 1929 scheme for the Americas, accompanied by controversy similar to that today. === Religious studies === Paul F. Bradshaw suggests that the same principles of lumping and splitting apply to the study of early Christian liturgy. Lumpers, who tend to predominate in this field, try to find a single line of successive texts from the apostolic age to the
Read more →
Cognitive tutor

A cognitive tutor is a particular kind of intelligent tutoring system that utilizes a cognitive model to provide feedback to students as they are working through problems. This feedback will immediately inform students of the correctness, or incorrectness, of their actions in the tutor interface; however, cognitive tutors also have the ability to provide context-sensitive hints and instruction to guide students towards reasonable next steps. == Introduction == The name of Cognitive Tutor now usually refers to a particular type of intelligent tutoring system produced by Carnegie Learning for high school mathematics based on John Anderson's ACT-R theory of human cognition. However, cognitive tutors were originally developed to test ACT-R theory for research purposes since the early 1980s and they are developed also for other areas and subjects such as computer programming and science. Cognitive Tutors can be implemented into classrooms as a part of blended learning that combines textbook and software activities. The Cognitive Tutor programs utilize cognitive model and are based on model tracing and knowledge tracing. Model tracing means that the cognitive tutor checks every action performed by students such as entering a value or clicking a button, while knowledge tracing is used to calculate the required skills students learned by measuring them on a bar chart called Skillometer. Model tracing and knowledge tracing are essentially used to monitor students' learning progress, guide students to correct path to problem solving, and provide feedback. The Institute of Education Sciences published several reports regarding the effectiveness of Carnegie Cognitive Tutor. A 2013 report concluded that Carnegie Learning Curricula and Cognitive Tutor was found to have mixed effects on mathematics achievement for high school students. The report identified 27 studies that investigate the effectiveness of Cognitive Tutor, and the conclusion is based on 6 studies that meet What Works Clearinghouse standards. Among the 6 studies included, 5 of them show intermediate to significant positive effect, while 1 study shows statistically significant negative effect. Another report published by Institute of Education Sciences in 2009 found that Cognitive Tutor Algebra I to have potentially positive effects on math achievement based on only 1 study out of 14 studies that meets What Works Clearinghouse standards. It should be understood that What Works Clearinghouse standards call for relatively large numbers of participants, true random assignments to groups, and for a control group receiving either no treatment or a different treatment. Such experimental conditions are difficult to meet in schools, and thus only a small percentage of studies in education meet the standards of this clearinghouse, even though they may still be of value. == Theoretical foundations == === Four-component architecture === Intelligent tutoring systems (ITS) have a four-component architecture: a domain model, a student model, a tutoring model and an interface component. The domain model contains the rules, concepts, and knowledge related to the domain to be learned. It helps to evaluate students' performance and detect students' errors by setting a standard of domain expertise. The student model, the central component of an ITS, is expected to contain knowledge about the students: their cognitive and affective states, and their progress as they learn. The function of the student model is threefold: to gather data from and about the learner, to represent the learner's knowledge and learning process, and to perform diagnostics of a student's knowledge and select optimal pedagogical strategies. The tutoring model uses the data gained from the domain model and student model to make decisions about tutoring strategies such as whether or not to intervene, or when and how to intervene. Functions of the tutoring model include instruction delivery and content planning. The interface component reflects the decisions made by the tutoring model in different forms such as Socratic dialogs, feedback and hints. Students interact with the tutor through the learning interface, also known as communication. The interface provides domain knowledge elements. === Cognitive model === A cognitive model replicates the domain knowledge and skills comparable to that of a human expert or an advanced student of the domain. A cognitive model enables intelligent tutoring systems to respond to problem-solving situations in a way similar to a human tutor. A tutoring system adopting a cognitive model is called a cognitive tutor. A cognitive model is an expert system that generates a multitude of solutions to the problems presented to students. The cognitive model is used to trace each student's solution through complex alternative solution paths, enabling the tutor to provide step-by-step feedback and advice, and to maintain a targeted model of the student's knowledge based on student performance. === Cognitive Tutors === Cognitive Tutors provide step-by-step guidance as a learner develops a complex problem-solving skill through practice. Typically, cognitive tutors provide such forms of support as: (a) a problem-solving environment that is designed rich and "thinking visible"; (b) step-by-step feedback on student performance; (c) feedback messages specific to errors; (d) context-specific next-step hints at student's request, and (e) individualized problem selection. Cognitive Tutors accomplish two of the principal tasks characteristic of human tutoring: (1) monitors the student's performance and providing context-specific individual instruction, and (2) monitors the student's learning and selects appropriate problem-solving activities. Both cognitive model and two underlying algorithms, model tracing and knowledge tracing, are used to monitor the student's learning. In model tracing, the cognitive tutor uses the cognitive model in complex problems to follow the student's individual path and provide prompt accuracy feedback and context-specific advice. In knowledge tracing, the cognitive tutor uses a Bayesian Knowledge Tracing method of evaluating the student's knowledge and uses this student model to select appropriate problems for each student. === Cognitive architecture === Cognitive tutor development is guided by ACT-R cognitive architecture, which specifies the underlying framework developing the cognitive model or expert component of a cognitive tutor. ACT-R, a member of the ACT family, is the most recent cognitive architecture, devoted primarily to modelling human behavior. ACT-R includes a declarative memory of factual knowledge and a procedural memory of production rules. The architecture functions by matching productions on perceptions and facts, mediated by the real-valued activation levels of objects, and executing them to affect the environment or alter declarative memory. ACT-R has been used to model psychological aspects such as memory, attention, reasoning, problem solving, and language processing. == Application and utilization == The first real world applications of cognitive tutors were in the 1980s and involved a geometry proof tutor used by high school students and a LISP programming tutor used by college students in a mini course in introductory programming course at Carnegie Mellon University. Since then, cognitive tutors have been used in a variety of scenarios, with a few organizations developing their own cognitive tutor programs. These programs have been used with students spanning elementary school through university level, though primarily in the subject areas of Computer Programming, Mathematics, and Science. One of the first organizations to develop a system for use within the school system was the PACT Center at Carnegie Mellon University. Their aim was to "...develop systems that provide individualized assistance to students as they work on challenging real-world problems in complex domains such as computer programming, algebra and geometry". PACT's most successful product was the Cognitive Tutor Algebra course. Originally created in the early 1990s, this course was in use in 75 schools through the U.S. by 1999, and then its spin-off company, Carnegie Learning, now offers tutors to thousands of schools in the U.S. The Carnegie Mellon Cognitive Tutor has been shown to raise students' math test scores in high school and middle-school classrooms, and their Algebra course was designated one of five exemplary curricula for K-12 mathematics educated by the US Department of Education. There were several research projects conducted by the PACT Center to utilize Cognitive tutor for courses in Excel and to develop an intelligent tutoring system for algebra expression writing, called Ms. Lindquist. Further, in 2005, Carnegie Learning released Bridge to Algebra, a product intended for middle schools that was piloted in over 100 schools. Cognitive tutoring software is continuing to be used.
Read more →
Agent Communications Language

Agent Communication Language (ACL) consists of computer communication protocols that are intended for AI agents to communicate with each other. In 2007, protocols of this nature were proposed which include: FIPA-ACL (by the Foundation for Intelligent Physical Agents, a standardization consortium) KQML (Knowledge Query and Manipulation Language) After the surge in Generative AI with the use of Transformers and Large language models, the definition of agent has shifted away from physical agents to signify software systems built using the principles of Agentic AI. A new protocol to emerge in this area is Natural Language Interaction Protocol (NLIP). NLIP is an application-level communication protocol defined between AI Agents or between a human and an AI agent. Ecma International; a standards body which develops and publishes international standards for the information and communication industry; published on 10 December 2025 five new standards and one technical report defining the Natural Language Interaction Protocol (NLIP). As a result, we can define agent communication protocols into two categories: ontology based agent communication protocols and generative AI based agent communication protocols. Ontology based agent communication protocols use a common ontology to be used between agents. An ontology is a part of the agent's knowledge base that describes what kind of things an agent can deal with and how they are related to each other. FIPA-ACL and KQML are examples of such protocols. These protocols rely on speech act theory developed by Searle in the 1960s and enhanced by Winograd and Flores in the 1970s. They define a set of performatives, also called Communicative Acts, and their meaning (e.g. ask-one). The content of the performative is not standardized, but varies from system to system. Implementation support of FIPA-ACL is included in FIPA-OS and Jade. Generative AI based agent communication protocols such as NLIP do not require a shared ontology among communicating agents. In its stead, they use generative AI models to translate natural language text, images, videos or other modalities of data into a local ontology. This provides for hot-extensibility where the same protocol can be used for multiple communication needs, and simplifies version control since different agents can use different versions of a shared ontology. NLIP has been designed with security considerations in mind. The specification and standards comprising NLIP are developed and maintained by Ecma Technical Community 56.
Read more →
Sentence embedding

In natural language processing, a sentence embedding is a representation of a sentence as a vector of numbers which encodes meaningful semantic information. State of the art embeddings are based on the learned hidden layer representation of dedicated sentence transformer models. BERT pioneered an approach involving the use of a dedicated [CLS] token prepended to the beginning of each sentence inputted into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks. In practice however, BERT's sentence embedding with the [CLS] token achieves poor performance, often worse than simply averaging non-contextual word embeddings. SBERT later achieved superior sentence embedding performance by fine tuning BERT's [CLS] token embeddings through the usage of a siamese neural network architecture on the SNLI dataset. Other approaches are loosely based on the idea of distributional semantics applied to sentences. Skip-Thought trains an encoder-decoder structure for the task of neighboring sentences predictions; this has been shown to achieve worse performance than approaches such as InferSent or SBERT. An alternative direction is to aggregate word embeddings, such as those returned by Word2vec, into sentence embeddings. The most straightforward approach is to simply compute the average of word vectors, known as continuous bag-of-words (CBOW). However, more elaborate solutions based on word vector quantization have also been proposed. One such approach is the vector of locally aggregated word embeddings (VLAWE), which demonstrated performance improvements in downstream text classification tasks. == Applications == In recent years, sentence embedding has seen a growing level of interest due to its applications in natural language queryable knowledge bases through the usage of vector indexing for semantic search. LangChain for instance utilizes sentence transformers for purposes of indexing documents. In particular, an indexing is generated by generating embeddings for chunks of documents and storing (document chunk, embedding) tuples. Then given a query in natural language, the embedding for the query can be generated. A top k similarity search algorithm is then used between the query embedding and the document chunk embeddings to retrieve the most relevant document chunks as context information for question answering tasks. This approach is also known formally as retrieval-augmented generation. Though not as predominant as BERTScore, sentence embeddings are commonly used for sentence similarity evaluation which sees common use for the task of optimizing a Large language model's generation parameters is often performed via comparing candidate sentences against reference sentences. By using the cosine-similarity of the sentence embeddings of candidate and reference sentences as the evaluation function, a grid-search algorithm can be utilized to automate hyperparameter optimization. == Evaluation == A way of testing sentence encodings is to apply them on Sentences Involving Compositional Knowledge (SICK) corpus for both entailment (SICK-E) and relatedness (SICK-R). In the best results are obtained using a BiLSTM network trained on the Stanford Natural Language Inference (SNLI) Corpus. The Pearson correlation coefficient for SICK-R is 0.885 and the result for SICK-E is 86.3. A slight improvement over previous scores is presented in: SICK-R: 0.888 and SICK-E: 87.8 using a concatenation of bidirectional Gated recurrent unit.
Read more →
RuleML

RuleML is a global initiative, led by a non-profit organization RuleML Inc., that is devoted to advancing research and industry standards design activities in the technical area of rules that are semantic and highly inter-operable. The standards design takes the form primarily of a markup language, also known as RuleML. The research activities include an annual research conference, the RuleML Symposium, also known as RuleML for short. Founded in fall 2000 by Harold Boley, Benjamin Grosof, and Said Tabet, RuleML was originally devoted purely to standards design, but then quickly branched out into the related activities of coordinating research and organizing an annual research conference starting in 2002. The M in RuleML is sometimes interpreted as standing for Markup and Modeling. The markup language was developed to express both forward (bottom-up) and backward (top-down) rules in XML for deduction, rewriting, and further inferential-transformational tasks. It is defined by the Rule Markup Initiative, an open network of individuals and groups from both industry and academia that was formed to develop a canonical Web language for rules using XML markup and transformations from and to other rule standards/systems. Markup standards and initiatives related to RuleML include: Rule Interchange Format (RIF): The design and overall purpose of W3C's Rule Interchange Format (RIF) industry standard is based primarily on the RuleML industry standards design. Like RuleML, RIF embraces a multiplicity of potentially useful rule dialects that nevertheless share common characteristics. RuleML Technical Committee from Oasis-Open: An industry standards effort devoted to legal automation utilizing RuleML. Semantic Web Rule Language (SWRL): An industry standards design, based primarily on an early version of RuleML, whose development was funded in part by the DARPA Agent Markup Language (DAML) research program. Semantic Web Services Framework, particularly its Semantic Web Services Language: An industry standards design, based primarily on a medium-mature version of RuleML, whose development was funded in part by the DARPA Agent Markup Language (DAML) research program and the WSMO research effort of the EU. Mathematical Markup Language (MathML): However, MathML's Content Markup is better suited for defining functions rather than relations or general rules Predictive Model Markup Language (PMML): With this XML-based language one can define and share various models for data-mining results, including association rules Attribute Grammars in XML (AG-markup): For AG's semantic rules, there are various possible XML markups that are similar to Horn-rule markup Extensible Stylesheet Language Transformations (XSLT): This is a restricted term-rewriting system of rules, written in XML, for transforming XML documents into other text documents
Read more →
Jan Leike

Jan Leike (born 1986 or 1987) is an AI alignment researcher who has worked at DeepMind and OpenAI. He joined Anthropic in May 2024. == Education == Jan Leike obtained his undergraduate degree from the University of Freiburg in Germany. After earning a master's degree in computer science, he pursued a PhD in machine learning at the Australian National University under the supervision of Marcus Hutter. == Career == Leike made a six-month postdoctoral fellowship at the Future of Humanity Institute before joining DeepMind to focus on empirical AI safety research, where he collaborated with Shane Legg. === OpenAI === In 2021, Leike joined OpenAI. In June 2023, he and Ilya Sutskever became the co-leaders of the newly introduced "superalignment" project, which aimed to determine how to align future artificial superintelligences within four years to ensure their safety. This project involved automating AI alignment research using relatively advanced AI systems. At the time, Sutskever was OpenAI's Chief Scientist, and Leike was the Head of Alignment. Leike was featured in Time's list of the 100 most influential personalities in AI, both in 2023 and in 2024. In May 2024, Leike announced his resignation from OpenAI, following the departure of Sutskever, Daniel Kokotajlo and several other AI safety employees from the company. Leike wrote that "Over the past years, safety culture and processes have taken a backseat to shiny products", and that he "gradually lost trust" in OpenAI's leadership. In May 2024, Leike joined Anthropic, an AI company founded by former OpenAI employees.
Read more →
Seeing AI

Seeing AI is an artificial intelligence application developed by Microsoft for iOS. Seeing AI uses the device camera to identify people and objects, and then the app audibly describes those objects for visually impaired people. == Capabilities == Seeing AI is primarily used to describe short text, documents, products, people, currency scenery, colors, handwriting and light. The app can scan a barcode to describe a product and uses sounds to assist the user in focusing on the barcode. When the app describes people, it attempts to estimate the person's age, gender, and emotional status. Additionally, in a test run by German journalists in December 2019, Seeing AI apparently used some sort of facial recognition system to identify people on photographs by name. Some functions are performed on the device, however more complex functions such as describing a scene or recognizing handwriting require an Internet connection. In December 2017, Seeing AI introduced the ability for currency recognition for US and Canadian dollar, British pounds and Euros. In December 2019, Seeing AI added support for five more languages, Dutch, French, German, Japanese, Spanish. Seeing AI is available in 70 countries such as Brazil, Argentina, Australia, Canada, Egypt, Albania, Bhutan, etc. Supported on iPhone 5C, 5S and later best performance with iPhone 6S, SE and later models
Read more →
Perusall

Perusall is a social web annotation tool intended for use by students at schools and universities. It allows users to annotate the margins of a text in a virtual group setting that is similar to social media—with upvoting, emojis, chat functionality, and notification. It also includes automatic AI grading. == History == Perusall began as a research project at Harvard University. It later became an educational product for students and teachers. As of 2024, Perusall states more than 5 million students have used the tool at over 5,000 educational institutions in 112 countries." == Functionality == Perusall integrates with learning management systems such as Moodle, Canvas and Blackboard to aid with collaborative annotation. The tool supports annotation of a range of media including text, images, equations, videos, PDFs and snapshots of webpages.
Read more →
Sarvam AI

Sarvam AI is an Indian artificial intelligence company headquartered in Bengaluru, Karnataka. Founded in 2023, the company develops large language models (LLMs) and multimodal AI systems with a focus on Indian languages and region-specific use cases. The company has received venture capital backing and has participated in government-supported AI initiatives, including India's sovereign large language model programme under the IndiaAI Mission. == History == Sarvam AI was founded in August 2023 by Vivek Raghavan and Pratyush Kumar, who were previously associated with AI4Bharat at the Indian Institute of Technology Madras. In December 2023, the company announced a combined seed and Series A funding round of approximately US$41 million. The round was led by Lightspeed Venture Partners, with participation from Peak XV Partners and Khosla Ventures. In April 2025, the Ministry of Electronics and Information Technology (MeitY) selected Sarvam AI as one of the companies to develop an indigenous foundational model under the IndiaAI Mission. As part of the initiative, the company received access to government-supported computing infrastructure, including GPUs allocated for model training over a specified period. In February 2026, Sarvam AI introduced two large language models at the AI Impact Summit held at Bharat Mandapam, New Delhi. == Products and technology == Sarvam AI develops language models trained on datasets that include multiple Indian languages and code-mixed text. The company uses mixture-of-experts (MoE) architectures in some of its models. === Foundational language models === On 18 February 2026, the company announced the release of two foundational models: Sarvam-30B – A 30-billion parameter model based on a mixture-of-experts design. According to company disclosures reported by the media, the model activates approximately 1 billion parameters per token and supports a 32,000-token context window. Sarvam-105B – A 105-billion parameter model activating approximately 9 billion parameters per token, with a 128,000-token context window. The model is positioned for complex reasoning and enterprise applications. On 20th February 2026, the company released a beta version of the Sarvam-105B model which is named Indus. It is available on the Apple App Store, Google Play Store and the web. === Speech and vision systems === Sarvam AI has also developed multimodal systems including speech-to-text and vision-language models. Its speech model, referred to as Saaras V3 in company materials, supports multiple Indian languages. The company has also introduced a vision-language model known as Sarvam Vision, intended for document understanding and optical character recognition (OCR) in Indian scripts. === Devices === 'Sarvam Kaze' is an indigenous AI-powered wearable glass that listens, understands, and captures what users see the world through their eyes in real time. The device supports more than 10 Indian languages, enabling voice-based interaction and potentially real-time translation. The company plans to launch the device in May 2026. == Startup support == In March 2026, Sarvam AI launched the Sarvam Startup Program, an initiative providing selected early-stage companies with 6–12 months of API credits scaled to their needs, priority engineering support, and access to production infrastructure for developing multilingual AI applications in areas such as speech, translation, and large language models. == Open-source release == In February 2026, Sarvam AI announced and open-sourced two large language models: Sarvam 30B (30 billion parameters) and Sarvam 105B (105 billion parameters, using a Mixture-of-Experts architecture with 10.3 billion active parameters). Both models were trained from scratch on datasets focused on Indian languages and support advanced reasoning, multilingual tasks, mathematics, and coding. The models are hosted on Hugging Face under the Apache License and are intended for enterprise and developer applications in Indian languages. The models were subsequently released as open source under the Apache License 2.0, with model weights made available on Hugging Face (sarvamai/sarvam-30b and sarvamai/sarvam-105b) and AIKosh in early March 2026. == Government and institutional collaborations == In 2025, Sarvam AI was selected to contribute to India's sovereign AI model initiative under the IndiaAI Mission. The initiative aims to support domestic AI infrastructure and model development. In March 2025, the Unique Identification Authority of India (UIDAI) announced a collaboration with Sarvam AI to integrate AI-based voice interactions and multilingual support into Aadhaar-related services. Sarvam AI has also worked with AI4Bharat and academic institutions on language datasets and speech research projects. == Industry participation == Sarvam AI presented its foundational models at the India AI Impact Summit 2026 in New Delhi. The company has also been listed among Indian members of the AI Alliance, a consortium focused on open-source artificial intelligence initiatives. == List of models ==
Read more →
Cortica

Headquartered in Tel Aviv Cortica utilizes unsupervised learning methods to recognize and analyze digital images and video. The technology developed by the Cortica team is based on research of the function of the human brain. == Company Founding == Cortica was founded in 2007 by Igal Raichelgauz, Karina Odinaev and Yehoshua Zeevi. Together, the founders developed the company’s core technology while at Technion – Israel Institute of Technology. By combining discoveries in neuroscience with developments in computer programming, the team created technology that possesses the ability to interpret large amounts of visual data with increased accuracy. This technology, called Image2Text, is based on the founders’ work in digitally replicating cortical neural networks’ ability to identify complex patterns within massive quantities of ambiguous and noisy data. Cortica’s offerings have application in the automotive industry, media industries, as well as the smart city and medical industries. Industry experts suggest that the self-driving automotive industry alone will be worth upwards of $7 trillion while each connected car is expected to generate 4,000 GB of data per day. Beyond that, industry analysts expect the proliferation of surveillance cameras to continue leading to an expected 2,500 Petabytes of data being generated daily by new surveillance cameras. Cortica operates in these high scale industries. The company currently employs professionals from many domains including AI researchers as well as veterans of intelligence units within the Israeli Defense Forces. == Research and Technology == In 2006, Founders Raichelgauz, Odinaev, and Zeevi shared their findings with the 28th IEEE EMBS Annual International Conference in New York in a paper titled, “Natural Signal Classification by Neural Cliques and Phase-Locked Attractors”. That same year, the team also published “Cliques in Neural Ensembles as Perception Carriers" CB Insights recently identified Cortica as the number one patent holder among AI companies. Cortica is researching to develop a machine-learning driving system which can identify objects and pedestrians. Connecting to it, Elon Musk has been rumored to partner with Cortica for his electric car company, Tesla. However, Tesla denies it stating that Musk did not discuss a collaboration with artificial intelligence firm Cortica. == Funding == Cortica raised $7 million in its Series A funding round, announced in August 2012. Investors included Horizons Ventures (the investment firm of Hong Kong billionaire Li Ka-Shing), and Ynon Kreiz, the former chairman and CEO of the Endemol Group. In May 2013, it was announced that Cortica had raised $1.5 million from Russian firm Mail.ru Group. It later transpired that this was a part of Cortica's Series B funding round for $6.4 million, announced in June 2013. The round was led by Horizons Ventures, with participation from the Russian firm Mail.ru Group and other angel investors. In its fourth funding round, Cortica has raised $20 million, bringing the total investments to $38 million. According to a report from The Israeli lead Daily economic newspaper, TheMarker, the fourth round was led by a strategic Chinese investor who will probably help the company expand into the Asian market. == Media coverage == GigaOm listed Cortica as one of the top deep learning startups in a November 2013 article surveying the field, along with AlchemyAPI, Ersatz, and Semantria. Business Insider ranked Cortica as one of the coolest tech companies in Israel. CB Insights has identified Cortica as the top patent holding AI company. In 2017 several leading automotive media outlets covered the launch of Cortica's automotive business unit
Read more →