AI Chatbot Interface

AI Chatbot Interface — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Message queuing service

    Message queuing service

    A message queueing service is a message-oriented middleware or MOM deployed in a compute cloud using software as a service model. Service subscribers access queues and or topics to exchange data using point-to-point or publish and subscribe patterns. It's important to differentiate between event-driven and message-driven (aka queue driven) services: Event-driven services (e.g. AWS SNS) are decoupled from their consumers. Whereas queue / message driven services (e.g. AWS SQS) are coupled with their consumers. Message queues can be a good buffer to handle spiky workloads but they have a finite capacity. According to Gregor Hohpe, message queues require proper mechanisms (aka flow controls) to avoid filling the queue beyond its manageable capacity and to keep the system stable. == Ordering Guarantees in Message Queues == Amazon SQS FIFO and Azure Service Bus sessions are queue-based messaging systems that provide ordering guarantees within a message group or session attempt but do not necessarily guarantee ordered delivery in cases of retries or failures. In SQS FIFO, messages in the same message group are processed in order, with subsequent messages held until the preceding message is successfully processed or moved to the dead-letter queue (DLQ). Once a message is placed in the DLQ, it is no longer retried, creating a gap in the sequence. However, the remaining messages continue to be delivered in order. Azure Service Bus sessions function similarly by maintaining ordering within a session, provided a single consumer processes messages sequentially. The implementation differs from SQS FIFO but follows the same fundamental ordering principle. In contrast, Apache Kafka is a distributed log-based messaging system that guarantees ordering within individual partitions rather than across the entire topic. Unlike queue-based systems, Kafka retains messages in a durable, append-only log, allowing multiple consumers to read at different offsets. Kafka uses manual offset management, giving consumers control over retries and failure handling. If a consumer fails to process a message, it can delay committing the offset, preventing further progress in that partition while other partitions remain unaffected. This partition-based design enables fault isolation and parallel processing while allowing ordering to be maintained within partitions, depending on consumer handling. == Vendors == Apache Kafka Apache Kafka is a distributed system consisting of servers that store and forward messages between producer client and consumer applications. IBM MQ IBM MQ offers a managed service that can be used on IBM Cloud and Amazon Web Services. Microsoft Azure Service Bus Service Bus offers queues, topics & subscriptions, and rules/actions in order to support publish-subscribe, temporal decoupling, and load balancing scenarios. Azure Service Bus is built on AMQP allowing any existing AMQP 1.0 client stack to interact with Service Bus directly or via existing .Net, Java, Node, and Python clients. Standard and Premium tiers allow for pay as you go or isolated resources at massive scale. Oracle Messaging Cloud Service This service provides a messaging solution for applications for asynchronous communication and is influenced by the Java Message Service (JMS) API specification. Any application platform that understands HTTP can also use Oracle Messaging Cloud Service through the REST interface. For Java applications, Oracle Messaging Cloud Service provides a Java library that implements and extends the JMS 1.1 interface. The Java library implements the JMS API by acting as a client of the REST API. Amazon Simple Queue Service Supports messages natively up to 256K, or up to 2GB by transmitting payload via S3. Highly scalable, durable and resilient. Provides loose-FIFO and 'at least once' delivery in order to provide massive scale. Supports REST API and optional Java Message Service client. Low latency. Utilizes Amazon Web Services. IronMQ Supports messages up to 64k; guarantees order; guarantees once only delivery; no delays retrieving messages. Supports REST API and beanstalkd open source protocol. Runs on multiple clouds including AWS and Rackspace. Scaling must be managed by user. RabbitMQ RabbitMQ is a reliable and mature messaging and streaming broker, which is easy to deploy on cloud environments, on-premises, and on your local machine. Supports AMQP, STOMP, MQTT StormMQ Open platform supports messages up to 50Mb. Uses AMQP to avoid vendor lock-in and provide language neutrality. Locate-It Option allows customers to audit the location of their data at all times and satisfy data protection principles. AnypointMQ An enterprise multi-tenant, cloud messaging service that performs advanced asynchronous messaging scenarios between applications. Anypoint MQ is fully integrated with Anypoint Platform, offering role based access control, client application management, and connectors.

    Read more →
  • Vladimir Batagelj

    Vladimir Batagelj

    Vladimir Batagelj (born June 14, 1948 in Idrija, Yugoslavia) is a Slovenian mathematician and an emeritus professor of mathematics at the University of Ljubljana. He is known for his work in discrete mathematics and combinatorial optimization, particularly analysis of social networks and other large networks (blockmodeling). == Education and career == Vladimir Batagelj completed his Ph.D. at the University of Ljubljana in 1986 under the direction of Tomaž Pisanski. He stayed at the University of Ljubljana as a professor until his retirement, where he was a professor of sociology and statistics, while also being a chair of the Department of Sociology of the Faculty of Social Sciences. As visiting professor, he was taught at the University of Pittsburgh (1990-91) and at the University of Konstanz (2002). He was also a member of editorial boards of two journals: Informatica and Journal of Social Structure. His work has been cited over 11000 times. His book Exploratory Social Network Analysis with Pajek on blockmodeling, coauthored with Wouter de Nooy and Andrej Mrvar, is Batagelj's most cited work and has over 3300 citations. The book was translated into Chinese and Japanese. The revised and expanded third edition has been published by Cambridge University Press. In 1975, 11 years before completing his PhD, Batagelj published a solo paper in Communications of the ACM. Batagelj authored more than 20 textbooks in Slovenian, covering topics like TeX, combinatorics and discrete mathematics. He has also written extensively in the Slovenian popular science journal Presek. Batagelj has advised 9 Ph.D. students. == Pajek == Batagelj is particularly known for his work on Pajek, a freely available software for analysis and visualization of large networks. He began work on Pajek in 1996 with Andrej Mrvar, who was then his PhD student. == Awards and honors == First prizes for contributions (with Andrej Mrvar) to Graph Drawing Contests in years: 1995, 1996, 1997, 1998, 1999, 2000 and 2005 / Graph Drawing Hall of Fame. In 2007 the book Generalized blockmodeling was awarded the Harrison White Outstanding Book Award by the Mathematical Sociology Section of American Sociological Association In 2007 he was awarded (together with Anuška Ferligoj) the Simmel Award by INSNA. In 2013, Vladimir Batagelj and Andrej Mrvar received the INSNA's William D. Richards Software award for their work on Pajek. == Selected bibliography == Vladimir Batagelj, Social Network Analysis, Large-Scale [1]. in R.A. Meyers, ed., Encyclopedia of Complexity and Systems Science, Springer 2009: 8245–8265. Vladimir Batagelj, Complex Networks, Visualization of [2]. in R.A. Meyers, ed., Encyclopedia of Complexity and Systems Science, Springer 2009: 1253–1268. Wouter de Nooy, Andrej Mrvar, Vladimir Batagelj, Mark Granovetter (Series Editor), Exploratory Social Network Analysis with Pajek (Structural Analysis in the Social Sciences), Cambridge University Press 2005 (ISBN 0-521-60262-9). ESNA in Japanese, TDU, 2010. Patrick Doreian, Vladimir Batagelj, Anuška Ferligoj, Mark Granovetter (Series Editor), Generalized Blockmodeling (Structural Analysis in the Social Sciences), Cambridge University Press 2004 (ISBN 0-521-84085-6)

    Read more →
  • Prefrontal cortex basal ganglia working memory

    Prefrontal cortex basal ganglia working memory

    Prefrontal cortex basal ganglia working memory (PBWM) is an algorithm that models working memory in the prefrontal cortex and the basal ganglia. It can be compared to long short-term memory (LSTM) in functionality, but is more biologically explainable. It uses the primary value learned value model to train prefrontal cortex working-memory updating system, based on the biology of the prefrontal cortex and basal ganglia. It is used as part of the Leabra framework and was implemented in Emergent in 2019. == Abstract == The prefrontal cortex has long been thought to subserve both working memory (the holding of information online for processing) and "executive" functions (deciding how to manipulate working memory and perform processing). Although many computational models of working memory have been developed, the mechanistic basis of executive function remains elusive. PBWM is a computational model of the prefrontal cortex to control both itself and other brain areas in a strategic, task-appropriate manner. These learning mechanisms are based on subcortical structures in the midbrain, basal ganglia and amygdala, which together form an actor/critic architecture. The critic system learns which prefrontal representations are task-relevant and trains the actor, which in turn provides a dynamic gating mechanism for controlling working memory updating. Computationally, the learning mechanism is designed to simultaneously solve the temporal and structural credit assignment problems. The model's performance compares favorably with standard backpropagation-based temporal learning mechanisms on the challenging 1-2-AX working memory task, and other benchmark working memory tasks. == Model == First, there are multiple separate stripes (groups of units) in the prefrontal cortex and striatum layers. Each stripe can be independently updated, such that this system can remember several different things at the same time, each with a different "updating policy" of when memories are updated and maintained. The active maintenance of the memory is in prefrontal cortex (PFC), and the updating signals (and updating policy more generally) come from the striatum units (a subset of basal ganglia units). PVLV provides reinforcement learning signals to train up the dynamic gating system in the basal ganglia. === Sensory input and motor output === The sensory input is connected to the posterior cortex which is connected to the motor output. The sensory input is also linked to the PVLV system. === Posterior cortex === The posterior cortex form the hidden layers of the input/output mapping. The PFC is connected with the posterior cortex to contextualize this input/output mapping. === PFC === The PFC (for output gating) has a localist one-to-one representation of the input units for every stripe. Thus, you can look at these PFC representations and see directly what the network is maintaining. The PFC maintains the working memory needed to perform the task. === Striatum === This is the dynamic gating system representing the striatum units of the basal ganglia. Every even-index unit within a stripe represents "Go", while the odd-index units represent "NoGo." The Go units cause updating of the PFC, while the NoGo units cause the PFC to maintain its existing memory representation. There are groups of units for every stripe. In the PBWM model in Emergent, the matrices represent the striatum. === PVLV === All of these layers are part of PVLV system. The PVLV system controls the dopaminergic modulation of the basal ganglia (BG). Thus, BG/PVLV form an actor-critic architecture where the PVLV system learns when to update. ==== SNrThal ==== SNrThal represents the substantia nigra pars reticulata (SNr) and the associated area of the thalamus, which produce a competition among the Go/NoGo units within a given stripe and mediates competition using k-winners-take-all dynamics. If there is more overall Go activity in a given stripe, then the associated SNrThal unit gets activated, and it drives updating in PFC. For every stripe, there is one unit in SNrThal. ==== VTA and SNc ==== Ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) are part of the dopamine layer. This layer models midbrain dopamine neurons. They control the dopaminergic modulation of the basal ganglia.

    Read more →
  • Blockmodeling

    Blockmodeling

    Blockmodeling is a set or a coherent framework, that is used for analyzing social structure and also for setting procedure(s) for partitioning (clustering) social network's units (nodes, vertices, actors), based on specific patterns, which form a distinctive structure through interconnectivity. It is primarily used in statistics, machine learning and network science. As an empirical procedure, blockmodeling assumes that all the units in a specific network can be grouped together to such extent to which they are equivalent. Regarding equivalency, it can be structural, regular or generalized. Using blockmodeling, a network can be analyzed using newly created blockmodels, which transforms large and complex network into a smaller and more comprehensible one. At the same time, the blockmodeling is used to operationalize social roles. While some contend that the blockmodeling is just clustering methods, Bonacich and McConaghy state that "it is a theoretically grounded and algebraic approach to the analysis of the structure of relations". Blockmodeling's unique ability lies in the fact that it considers the structure not just as a set of direct relations, but also takes into account all other possible compound relations that are based on the direct ones. The principles of blockmodeling were first introduced by Francois Lorrain and Harrison C. White in 1971. Blockmodeling is considered as "an important set of network analytic tools" as it deals with delineation of role structures (the well-defined places in social structures, also known as positions) and the discerning the fundamental structure of social networks. According to Batagelj, the primary "goal of blockmodeling is to reduce a large, potentially incoherent network to a smaller comprehensible structure that can be interpreted more readily". Blockmodeling was at first used for analysis in sociometry and psychometrics, but has now spread also to other sciences. == Definition == A network as a system is composed of (or defined by) two different sets: one set of units (nodes, vertices, actors) and one set of links between the units. Using both sets, it is possible to create a graph, describing the structure of the network. During blockmodeling, the researcher is faced with two problems: how to partition the units (e.g., how to determine the clusters (or classes), that then form vertices in a blockmodel) and then how to determine the links in the blockmodel (and at the same time the values of these links). In the social sciences, the networks are usually social networks, composed of several individuals (units) and selected social relationships among them (links). Real-world networks can be large and complex; blockmodeling is used to simplify them into smaller structures that can be easier to interpret. Specifically, blockmodeling partitions the units into clusters and then determines the ties among the clusters. At the same time, blockmodeling can be used to explain the social roles existing in the network, as it is assumed that the created cluster of units mimics (or is closely associated with) the units' social roles. Blockmodeling can thus be defined as a set of approaches for partitioning units into clusters (also known as positions) and links into blocks, which are further defined by the newly obtained clusters. A block (also blockmodel) is defined as a submatrix, that shows interconnectivity (links) between nodes, present in the same or different clusters. Each of these positions in the cluster is defined by a set of (in)direct ties to and from other social positions. These links (connections) can be directed or undirected; there can be multiple links between the same pair of objects or they can have weights on them. If there are not any multiple links in a network, it is called a simple network. A matrix representation of a graph is composed of ordered units, in rows and columns, based on their names. The ordered units with similar patterns of links are partitioned together in the same clusters. Clusters are then arranged together so that units from the same clusters are placed next to each other, thus preserving interconnectivity. In the next step, the units (from the same clusters) are transformed into a blockmodel. With this, several blockmodels are usually formed, one being core cluster and others being cohesive; a core cluster is always connected to cohesive ones, while cohesive ones cannot be linked together. Clustering of nodes is based on the equivalence, such as structural and regular. The primary objective of the matrix form is to visually present relations between the persons included in the cluster. These ties are coded dichotomously (as present or absent), and the rows in the matrix form indicate the source of the ties, while the columns represent the destination of the ties. Equivalence can have two basic approaches: the equivalent units have the same connection pattern to the same neighbors or these units have same or similar connection pattern to different neighbors. If the units are connected to the rest of network in identical ways, then they are structurally equivalent. Units can also be regularly equivalent, when they are equivalently connected to equivalent others. With blockmodeling, it is necessary to consider the issue of results being affected by measurement errors in the initial stage of acquiring the data. == Different approaches == Regarding what kind of network is undergoing blockmodeling, a different approach is necessary. Networks can be one–mode or two–mode. In the former all units can be connected to any other unit and where units are of the same type, while in the latter the units are connected only to the unit(s) of a different type. Regarding relationships between units, they can be single–relational or multi–relational networks. Further more, the networks can be temporal or multilevel and also binary (only 0 and 1) or signed (allowing negative ties)/values (other values are possible) networks. Different approaches to blockmodeling can be grouped into two main classes: deterministic blockmodeling and stochastic blockmodeling approaches. Deterministic blockmodeling is then further divided into direct and indirect blockmodeling approaches. Among direct blockmodeling approaches are: structural equivalence and regular equivalence. Structural equivalence is a state, when units are connected to the rest of the network in an identical way(s), while regular equivalence occurs when units are equally related to equivalent others (units are not necessarily sharing neighbors, but have neighbour that are themselves similar). Indirect blockmodeling approaches, where partitioning is dealt with as a traditional cluster analysis problem (measuring (dis)similarity results in a (dis)similarity matrix), are: conventional blockmodeling, generalized blockmodeling: generalized blockmodeling of binary networks, generalized blockmodeling of valued networks and generalized homogeneity blockmodeling, prespecified blockmodeling. According to Brusco and Steinley (2011), the blockmodeling can be categorized (using a number of dimensions): deterministic or stochastic blockmodeling, one–mode or two–mode networks, signed or unsigned networks, exploratory or confirmatory blockmodeling. == Blockmodels == Blockmodels (sometimes also block models) are structures in which: vertices (e.g., units, nodes) are assembled within a cluster, with each cluster identified as a vertex; from such vertices a graph can be constructed; combinations of all the links (ties), represented in a block as a single link between positions, while at the same time constructing one tie for each block. In a case, when there are no ties in a block, there will be no ties between the two positions that define the block. Computer programs can partition the social network according to pre-set conditions. When empirical blocks can be reasonably approximated in terms of ideal blocks, such blockmodels can be reduced to a blockimage, which is a representation of the original network, capturing its underlying 'functional anatomy'. Thus, blockmodels can "permit the data to characterize their own structure", and at the same time not seek to manifest a preconceived structure imposed by the researcher. Blockmodels can be created indirectly or directly, based on the construction of the criterion function. Indirect construction refers to a function, based on "compatible (dis)similarity measure between paris of units", while the direct construction is "a function measuring the fit of real blocks induced by a given clustering to the corresponding ideal blocks with perfect relations within each cluster and between clusters according to the considered types of connections (equivalence)". === Types === Blockmodels can be specified regarding the intuition, substance or the insight into the nature of the studied network; this can result in such models as follows: parent-child role systems, organizational hierarchies, systems of

    Read more →
  • Embodied agent

    Embodied agent

    In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. Agents that are represented graphically with a body, for example a human or a cartoon animal, are also called embodied agents, although they have only virtual, not physical, embodiment. A branch of artificial intelligence focuses on empowering such agents to interact autonomously with human beings and the environment. Mobile robots are one example of physically embodied agents; Ananova and Microsoft Agent are examples of graphically embodied agents. Embodied conversational agents are embodied agents (usually with a graphical front-end as opposed to a robotic body) that are capable of engaging in conversation with one another and with humans employing the same verbal and nonverbal means that humans do (such as gesture, facial expression, and so forth). == Embodied conversational agents == Embodied conversational agents are a form of intelligent user interface. Graphically embodied agents aim to unite gesture, facial expression and speech to enable face-to-face communication with users, providing a powerful means of human-computer interaction. == Advantages == Face-to-face communication allows communication protocols that give a much richer communication channel than other means of communicating. It enables pragmatic communication acts such as conversational turn-taking, facial expression of emotions, information structure and emphasis, visualization and iconic gestures, and orientation in a three-dimensional environment. This communication takes place through both verbal and non-verbal channels such as gaze, gesture, spoken intonation and body posture. Research has found that users prefer a non-verbal visual indication of an embodied system's internal state to a verbal indication, demonstrating the value of additional non-verbal communication channels. As well as this, the face-to-face communication involved in interacting with an embodied agent can be conducted alongside another task without distracting the human participants, instead improving the enjoyment of such an interaction. Furthermore, the use of an embodied presentation agent results in improved recall of the presented information. Embodied agents also provide a social dimension to the interaction. Humans willingly ascribe social awareness to computers, and thus interaction with embodied agents follows social conventions, similar to human to human interactions. This social interaction both raises the believably and perceived trustworthiness of agents, and increases the user's engagement with the system. Rickenberg and Reeves found that the presence of an embodied agent on a website increased the level of user trust in that website, but also increased users' anxiety and affected their performance, as if they were being watched by a real human. Another effect of the social aspect of agents is that presentations given by an embodied agent are perceived as being more entertaining and less difficult than similar presentations given without an agent. Research shows that perceived enjoyment, followed by perceived usefulness and ease of use, is the major factor influencing user adoption of embodied agents. A study in January 2004 by Byron Reeves at Stanford demonstrated how digital characters could "enhance online experiences" through explaining how virtual characters essentially add a sense of familiarity to the user experience and make it more approachable. This increase in likability in turn helps make the products better, which benefits both the end users and those creating the product. === Applications === The rich style of communication that characterizes human conversation makes conversational interaction with embodied conversational agents ideal for many non-traditional interaction tasks. A familiar application of graphically embodied agents is computer games; embodied agents are ideal for this setting because the richer communication style makes interacting with the agent enjoyable. Embodied conversational agents have also been used in virtual training environments, portable personal navigation guides, interactive fiction and storytelling systems, interactive online characters and automated presenters and commentators. Major virtual assistants like Siri, Amazon Alexa and Google Assistant do not come with any visual embodied representation, which is believed to limit the sense of human presence by users. The U.S. Department of Defense utilizes a software agent called SGT STAR on U.S. Army-run Web sites and Web applications for site navigation, recruitment and propaganda purposes. Sgt. Star is run by the Army Marketing and Research Group, a division operated directly from The Pentagon. Sgt. Star is based upon the ActiveSentry technology developed by Next IT, a Washington-based information technology services company. Other such bots in the Sgt. Star "family" are utilized by the Federal Bureau of Investigation and the Central Intelligence Agency for intelligence gathering purposes.

    Read more →
  • Evolutionary programming

    Evolutionary programming

    Evolutionary programming is an evolutionary algorithm, where a share of new population is created by mutation of previous population without crossover. Evolutionary programming differs from evolution strategy ES( μ + λ {\displaystyle \mu +\lambda } ) in one detail. All individuals are selected for the new population, while in ES( μ + λ {\displaystyle \mu +\lambda } ), every individual has the same probability to be selected. It is one of the four major evolutionary algorithm paradigms. == History == It was first used by Lawrence J. Fogel in the US in 1960 in order to use simulated evolution as a learning process aiming to generate artificial intelligence. It was used to evolve finite-state machines as predictors.

    Read more →
  • Fitness approximation

    Fitness approximation

    Fitness approximation aims to approximate the objective or fitness functions in evolutionary optimization by building up machine learning models based on data collected from numerical simulations or physical experiments. The machine learning models for fitness approximation are also known as meta-models or surrogates, and evolutionary optimization based on approximated fitness evaluations are also known as surrogate-assisted evolutionary approximation. Fitness approximation in evolutionary optimization can be seen as a sub-area of data-driven evolutionary optimization. == Approximate models in function optimization == === Motivation === In many real-world optimization problems including engineering problems, the number of fitness function evaluations needed to obtain a good solution dominates the optimization cost. In order to obtain efficient optimization algorithms, it is crucial to use prior information gained during the optimization process. Conceptually, a natural approach to utilizing the known prior information is building a model of the fitness function to assist in the selection of candidate solutions for evaluation. A variety of techniques for constructing such a model, often also referred to as surrogates, metamodels or approximation models – for computationally expensive optimization problems have been considered. === Approaches === Common approaches to constructing approximate models based on learning and interpolation from known fitness values of a small population include: Low-degree polynomials and regression models Fourier surrogate modeling Artificial neural networks including Multilayer perceptrons Radial basis function network Support vector machines Due to the limited number of training samples and high dimensionality encountered in engineering design optimization, constructing a globally valid approximate model remains difficult. As a result, evolutionary algorithms using such approximate fitness functions may converge to local optima. Therefore, it can be beneficial to selectively use the original fitness function together with the approximate model.

    Read more →
  • Induction of regular languages

    Induction of regular languages

    In computational learning theory, induction of regular languages refers to the task of learning a formal description (e.g. grammar) of a regular language from a given set of example strings. Although E. Mark Gold has shown that not every regular language can be learned this way (see language identification in the limit), approaches have been investigated for a variety of subclasses. They are sketched in this article. For learning of more general grammars, see Grammar induction. == Definitions == A regular language is defined as a (finite or infinite) set of strings that can be described by one of the mathematical formalisms called "finite automaton", "regular grammar", or "regular expression", all of which have the same expressive power. Since the latter formalism leads to shortest notations, it shall be introduced and used here. Given a set Σ of symbols (a.k.a. alphabet), a regular expression can be any of ∅ (denoting the empty set of strings), ε (denoting the singleton set containing just the empty string), a (where a is any character in Σ; denoting the singleton set just containing the single-character string a), r + s (where r and s are, in turn, simpler regular expressions; denoting their set's union) r ⋅ s (denoting the set of all possible concatenations of strings from r's and s's set), r + (denoting the set of n-fold repetitions of strings from r's set, for any n ≥ 1), or r (similarly denoting the set of n-fold repetitions, but also including the empty string, seen as 0-fold repetition). For example, using Σ = {0,1}, the regular expression (0+1+ε)⋅(0+1) denotes the set of all binary numbers with one or two digits (leading zero allowed), while 1⋅(0+1)⋅0 denotes the (infinite) set of all even binary numbers (no leading zeroes). Given a set of strings (also called "positive examples"), the task of regular language induction is to come up with a regular expression that denotes a set containing all of them. As an example, given {1, 10, 100}, a "natural" description could be the regular expression 1⋅0, corresponding to the informal characterization "a 1 followed by arbitrarily many (maybe even none) 0's". However, (0+1) and 1+(1⋅0)+(1⋅0⋅0) is another regular expression, denoting the largest (assuming Σ = {0,1}) and the smallest set containing the given strings, and called the trivial overgeneralization and undergeneralization, respectively. Some approaches work in an extended setting where also a set of "negative example" strings is given; then, a regular expression is to be found that generates all of the positive, but none of the negative examples. == Lattice of automata == Dupont et al. have shown that the set of all structurally complete finite automata generating a given input set of example strings forms a lattice, with the trivial undergeneralized and the trivial overgeneralized automaton as bottom and top element, respectively. Each member of this lattice can be obtained by factoring the undergeneralized automaton by an appropriate equivalence relation. For the above example string set {1, 10, 100}, the picture shows at its bottom the undergeneralized automaton Aa,b,c,d in grey, consisting of states a, b, c, and d. On the state set {a,b,c,d}, a total of 15 equivalence relations exist, forming a lattice. Mapping each equivalence E to the corresponding quotient automaton language L(Aa,b,c,d / E) obtains the partially ordered set shown in the picture. Each node's language is denoted by a regular expression. The language may be recognized by quotient automata w.r.t. different equivalence relations, all of which are shown below the node. An arrow between two nodes indicates that the lower node's language is a proper subset of the higher node's. If both positive and negative example strings are given, Dupont et al. build the lattice from the positive examples, and then investigate the separation border between automata that generate some negative example and such that do not. Most interesting are those automata immediately below the border. In the picture, separation borders are shown for the negative example strings 11 (green), 1001 (blue), 101 (cyan), and 0 (red). Coste and Nicolas present an own search method within the lattice, which they relate to Mitchell's version space paradigm. To find the separation border, they use a graph coloring algorithm on the state inequality relation induced by the negative examples. Later, they investigate several ordering relations on the set of all possible state fusions. Kudo and Shimbo use the representation by automaton factorizations to give a unique framework for the following approaches (sketched below): k-reversible languages and the "tail clustering" follow-up approach, Successor automata and the predecessor-successor method, and pumping-based approaches (framework-integration challenged by Luzeaux, however). Each of these approaches is shown to correspond to a particular kind of equivalence relations used for factorization. == Approaches == === k-reversible languages === Angluin considers so-called "k-reversible" regular automata, that is, deterministic automata in which each state can be reached from at most one state by following a transition chain of length k. Formally, if Σ, Q, and δ denote the input alphabet, the state set, and the transition function of an automaton A, respectively, then A is called k-reversible if: ∀a0, ..., ak ∈ Σ ∀s1, s2 ∈ Q: δ(s1, a0...ak) = δ(s2, a0...ak) ⇒ s1 = s2, where δ means the homomorphic extension of δ to arbitrary words. Angluin gives a cubic algorithm for learning of the smallest k-reversible language from a given set of input words; for k = 0, the algorithm has even almost linear complexity. The required state uniqueness after k + 1 given symbols forces unifying automaton states, thus leading to a proper generalization different from the trivial undergeneralized automaton. This algorithm has been used to learn simple parts of English syntax; later, an incremental version has been provided. Another approach based on k-reversible automata is the tail clustering method. === Successor automata === From a given set of input strings, Vernadat and Richetin build a so-called successor automaton, consisting of one state for each distinct character and a transition between each two adjacent characters' states. For example, the singleton input set {aabbaabb} leads to an automaton corresponding to the regular expression (a+⋅b+). An extension of this approach is the predecessor-successor method which generalizes each character repetition immediately to a Kleene + and then includes for each character the set of its possible predecessors in its state. Successor automata can learn exactly the class of local languages. Since each regular language is the homomorphic image of a local language, grammars from the former class can be learned by lifting, if an appropriate (depending on the intended application) homomorphism is provided. In particular, there is such a homomorphism for the class of languages learnable by the predecessor-successor method. The learnability of local languages can be reduced to that of k-reversible languages. === Early approaches === Chomsky and Miller (1957) used the pumping lemma: they guess a part v of an input string uvw and try to build a corresponding cycle into the automaton to be learned; using membership queries they ask, for appropriate k, which of the strings uw, uvvw, uvvvw, ..., uvkw also belongs to the language to be learned, thereby refining the structure of their automaton. In 1959, Solomonoff generalized this approach to context-free languages, which also obey a pumping lemma. === Cover automata === Câmpeanu et al. learn a finite automaton as a compact representation of a large finite language. Given such a language F, they search a so-called cover automaton A such that its language L(A) covers F in the following sense: L(A) ∩ Σ≤ l = F, where l is the length of the longest string in F, and Σ≤ l denotes the set of all strings not longer than l. If such a cover automaton exists, F is uniquely determined by A and l. For example, F = {ad, read, reread } has l = 6 and a cover automaton corresponding to the regular expression (r⋅e)⋅a⋅d. For two strings x and y, Câmpeanu et al. define x ~ y if xz ∈ F ⇔ yz ∈ F for all strings z of a length such that both xz and yz are not longer than l. Based on this relation, whose lack of transitivity causes considerable technical problems, they give an O(n4) algorithm to construct from F a cover automaton A of minimal state count. Moreover, for union, intersection, and difference of two finite languages they provide corresponding operations on their cover automata. Păun et al. improve the time complexity to O(n2). === Residual automata === For a set S of strings and a string u, the Brzozowski derivative u−1S is defined as the set of all rest-strings obtainable from a string in S by cutting off its prefix u (if possible), formally: u−1S = {v ∈ Σ: uv ∈ S}, cf. picture. Denis et al. define a

    Read more →
  • NumPy

    NumPy

    NumPy (pronounced NUM-py) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is fiscally sponsored by NumFOCUS. == History == === matrix-sig === The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier. === Numeric === An implementation of a matrix package was completed by Jim Fulton, then expanded to support multi-dimensional arrays by Jim Hugunin and called Numeric (also variously known as the "Numerical Python extensions" or "NumPy"), with influences from the APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. Hugunin, a graduate student at the Massachusetts Institute of Technology (MIT), joined the Corporation for National Research Initiatives (CNRI) in 1997 to work on JPython, leaving Paul Dubois of Lawrence Livermore National Laboratory (LLNL) to take over as maintainer. Other early contributors include David Ascher, Konrad Hinsen and Travis Oliphant. === Numarray === A new package called Numarray was written as a more flexible replacement for Numeric. Like Numeric, it too is now deprecated. Numarray had faster operations for large arrays, but was slower than Numeric on small ones, so for a time both packages were used in parallel for different use cases. The last version of Numeric (v24.2) was released on 11 November 2005, while the last version of numarray (v1.5.2) was released on 24 August 2006. There was a desire to get Numeric into the Python standard library, but Guido van Rossum decided that the code was not maintainable in its state then. === NumPy === In early 2005, NumPy developer Travis Oliphant wanted to unify the community around a single array package and ported Numarray's features to Numeric, releasing the result as NumPy 1.0 in 2006. This new project was part of SciPy. To avoid installing the large SciPy package just to get an array object, this new package was separated and called NumPy. Support for Python 3 was added in 2011 with NumPy version 1.5.0. In 2011, PyPy started development on an implementation of the NumPy API for PyPy. As of 2023, it is not yet fully compatible with NumPy. == Features == NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written for this version of Python often run much slower than compiled equivalents due to the absence of compiler optimization. NumPy addresses the slowness problem partly by providing multidimensional arrays and functions and operators that operate efficiently on arrays; using these requires rewriting some code, mostly inner loops, using NumPy. Using NumPy in Python gives functionality comparable to MATLAB since they are both interpreted, and they both allow the user to write fast programs as long as most operations work on arrays or matrices instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and complete programming language. Moreover, complementary Python packages are available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package that provides MATLAB-like plotting functionality. Although MATLAB can perform sparse matrix operations, NumPy alone cannot perform such operations and requires the use of the scipy.sparse library. Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations. Python bindings of the widely used computer vision library OpenCV utilize NumPy arrays to store and operate on data. Since images with multiple channels are simply represented as three-dimensional arrays, indexing, slicing or masking with other arrays are very efficient ways to access specific pixels of an image. The NumPy array as universal data structure in OpenCV for images, extracted feature points, filter kernels and many more vastly simplifies the programming workflow and debugging. Importantly, many NumPy operations release the global interpreter lock, which allows for multithreaded processing. NumPy also provides a C API, which allows Python code to interoperate with external libraries written in low-level languages. === The ndarray data structure === The core functionality of NumPy is its "ndarray", for n-dimensional array, data structure. These arrays are strided views on memory. In contrast to Python's built-in list data structure, these arrays are homogeneously typed: all elements of a single array must be of the same type. Such arrays can also be views into memory buffers allocated by C/C++, Python, and Fortran extensions to the CPython interpreter without the need to copy data around, giving a degree of compatibility with existing numerical libraries. This functionality is exploited by the SciPy package, which wraps a number of such libraries (notably BLAS and LAPACK). NumPy has built-in support for memory-mapped ndarrays. === Limitations === Inserting or appending entries to an array is not as trivially possible as it is with Python's lists. The np.pad(...) routine to extend arrays actually creates new arrays of the desired shape and padding values, copies the given array into the new one and returns it. NumPy's np.concatenate([a1,a2]) operation does not actually link the two arrays but returns a new one, filled with the entries from both given arrays in sequence. Reshaping the dimensionality of an array with np.reshape(...) is only possible as long as the number of elements in the array does not change. These circumstances originate from the fact that NumPy's arrays must be views on contiguous memory buffers. Algorithms that are not expressible as a vectorized operation will typically run slowly because they must be implemented in "pure Python", while vectorization may increase memory complexity of some operations from constant to linear, because temporary arrays must be created that are as large as the inputs. Runtime compilation of numerical code has been implemented by several groups to avoid these problems; open source solutions that interoperate with NumPy include numexpr and Numba. Cython and Pythran are static-compiling alternatives to these. Many modern large-scale scientific computing applications have requirements that exceed the capabilities of the NumPy arrays. For example, NumPy arrays are usually loaded into a computer's memory, which might have insufficient capacity for the analysis of large datasets. Further, NumPy operations are executed on a single CPU. However, many linear algebra operations can be accelerated by executing them on clusters of CPUs or of specialized hardware, such as GPUs and TPUs, which many deep learning applications rely on. As a result, several alternative array implementations have arisen in the scientific python ecosystem over the recent years, such as Dask for distributed arrays and TensorFlow or JAX for computations on GPUs. Because of its popularity, these often implement a subset of NumPy's API or mimic it, so that users can change their array implementation with minimal changes to their code required. A library named CuPy, accelerated by Nvidia's CUDA framework, has also shown potential for faster computing, being a 'drop-in replacement' of NumPy. == Examples == NumPy is conventionally imported as np. === Basic operations === === Universal functions === === Linear algebra === === Multidimensional arrays === === Incorporation with OpenCV === === Nearest-neighbor search === Functional Python and vectorized NumPy version. === F2PY === Quickly wrap native code for faster scripts.

    Read more →
  • List of text mining software

    List of text mining software

    Text mining computer programs are available from many commercial and open source companies and sources. == Commercial == Angoss – Angoss Text Analytics provides entity and theme extraction, topic categorization, sentiment analysis and document summarization capabilities via the embedded AUTINDEX – is a commercial text mining software package based on sophisticated linguistics by IAI (Institute for Applied Information Sciences), Saarbrücken. DigitalMR – social media listening & text+image analytics tool for market research. FICO Score – leading provider of analytics. General Sentiment – Social Intelligence platform that uses natural language processing to discover affinities between the fans of brands with the fans of traditional television shows in social media. Stand alone text analytics to capture social knowledge base on billions of topics stored to 2004. IBM LanguageWare – the IBM suite for text analytics (tools and Runtime). IBM SPSS – provider of Modeler Premium (previously called IBM SPSS Modeler and IBM SPSS Text Analytics), which contains advanced NLP-based text analysis capabilities (multi-lingual sentiment, event and fact extraction), that can be used in conjunction with Predictive Modeling. Text Analytics for Surveys provides the ability to categorize survey responses using NLP-based capabilities for further analysis or reporting. Inxight – provider of text analytics, search, and unstructured visualization technologies. (Inxight was bought by Business Objects that was bought by SAP AG in 2008). Language Computer Corporation – text extraction and analysis tools, available in multiple languages. Lexalytics – provider of a text analytics engine used in Social Media Monitoring, Voice of Customer, Survey Analysis, and other applications. Salience Engine. The software provides the unique capability of merging the output of unstructured, text-based analysis with structured data to provide additional predictive variables for improved predictive models and association analysis. Linguamatics – provider of natural language processing (NLP) based enterprise text mining and text analytics software, I2E, for high-value knowledge discovery and decision support. Mathematica – provides built in tools for text alignment, pattern matching, clustering and semantic analysis. See Wolfram Language, the programming language of Mathematica. MATLAB offers Text Analytics Toolbox for importing text data, converting it to numeric form for use in machine and deep learning, sentiment analysis and classification tasks. Medallia – offers one system of record for survey, social, text, written and online feedback. NetMiner – software for network analysis and text mining. Supports social media and bibliographic data collection, NLP for english and chinese, sentiment analysis, work co-occurrence network(text network analysis) and visualization. NetOwl – suite of multilingual text and entity analytics products, including entity extraction, link and event extraction, sentiment analysis, geotagging, name translation, name matching, and identity resolution, among others. PolyAnalyst - text analytics environment. PoolParty Semantic Suite - graph-based text mining platform. RapidMiner with its Text Processing Extension – data and text mining software. SAS – SAS Text Miner and Teragram; commercial text analytics, natural language processing, and taxonomy software used for Information Management. Sketch Engine – a corpus manager and analysis software which providing creating text corpora from uploaded texts or the Web including part-of-speech tagging and lemmatization or detecting a particular website. Sysomos – provider social media analytics software platform, including text analytics and sentiment analysis on online consumer conversations. WordStat – Content analysis and text mining add-on module of QDA Miner for analyzing large amounts of text data. == Open source == Carrot2 – text and search results clustering framework. GATE – general Architecture for Text Engineering, an open-source toolbox for natural language processing and language engineering. Gensim – large-scale topic modelling and extraction of semantic information from unstructured text (Python). KH Coder – for Quantitative Content Analysis or Text Mining The KNIME Text Processing extension. Natural Language Toolkit (NLTK) – a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. OpenNLP – natural language processing. Orange with its text mining add-on. The PLOS Text Mining Collection. The programming language R provides a framework for text mining applications in the package tm. The Natural Language Processing task view contains tm and other text mining library packages. spaCy – open-source Natural Language Processing library for Python Stanbol – an open source text mining engine targeted at semantic content management. Voyant Tools – a web-based text analysis environment, created as a scholarly project.

    Read more →
  • Word2vec

    Word2vec

    Word2vec is a technique in natural language processing for obtaining vector representations of words. These vectors capture information about the meaning of the word based on the surrounding words. The word2vec algorithm estimates these representations by modeling text in a large corpus. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. Word2vec was developed by Tomáš Mikolov, Kai Chen, Greg Corrado, Ilya Sutskever and Jeff Dean at Google, and published in 2013. Word2vec represents a word as a high-dimension vector of numbers which capture relationships between words. In particular, words which appear in similar contexts are mapped to vectors which are nearby as measured by cosine similarity. This indicates the level of semantic similarity between the words, so for example the vectors for walk and ran are nearby, as are those for "but" and "however", and "Berlin" and "Germany". == Approach == Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a mapping of the set of words to a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a vector in the space. Word2vec can use either of two model architectures to produce these distributed representations of words: continuous bag of words (CBOW) or continuously sliding skip-gram. In both architectures, word2vec considers both individual words and a sliding context window as it iterates over the corpus. The CBOW can be viewed as a 'fill in the blank' task, where the word embedding represents the way the word influences the relative probabilities of other words in the context window. Words which are semantically similar should influence these probabilities in similar ways, because semantically similar words should be used in similar contexts. The order of context words does not influence prediction (bag of words assumption). In the continuous skip-gram architecture, the model uses the current word to predict the surrounding window of context words. The skip-gram architecture weighs nearby context words more heavily than more distant context words. According to the authors' note, CBOW is faster while skip-gram does a better job for infrequent words. After the model is trained, the learned word embeddings are positioned in the vector space such that words that share common contexts in the corpus — that is, words that are semantically and syntactically similar — are located close to one another in the space. More dissimilar words are located farther from one another in the space. == Mathematical details == This section is based on expositions. A corpus is a sequence of words. Both CBOW and skip-gram are methods to learn one vector per word appearing in the corpus. Let V {\displaystyle V} ("vocabulary") be the set of all words appearing in the corpus C {\displaystyle C} . Our goal is to learn one vector v w ∈ R d {\displaystyle v_{w}\in \mathbb {R} ^{d}} for each word w ∈ V {\displaystyle w\in V} . The idea of skip-gram is that the vector of a word should be close to the vector of each of its neighbors. The idea of CBOW is that the vector-sum of a word's neighbors should be close to the vector of the word. === Continuous bag-of-words (CBOW) === The idea of CBOW is to represent each word with a vector, such that it is possible to predict a word using the sum of the vectors of its neighbors. Specifically, for each word w i {\displaystyle w_{i}} in the corpus, the one-hot encoding of the word is used as the input to the neural network. The output of the neural network is a probability distribution over the dictionary, representing a prediction of individual words in the neighborhood of w i {\displaystyle w_{i}} . The objective of training is to maximize ∑ i ln ⁡ Pr ( w i ∣ w i + j : j ∈ N ) {\displaystyle \sum _{i}\ln \Pr(w_{i}\mid w_{i+j}\colon j\in N)} where N {\displaystyle N} is a set of (non-zero) indices representing the relative locations of nearby words considered to be in w i {\displaystyle w_{i}} 's neighborhood. For example, if we want each word in the corpus to be predicted by every other word in a small span of 4 words. The set of relative indexes of neighbor words will be: N = { − 2 , − 1 , + 1 , + 2 } {\displaystyle N=\{-2,-1,+1,+2\}} , and the objective is to maximize ∑ i ln ⁡ Pr ( w i ∣ w i − 2 , w i − 1 , w i + 1 , w i + 2 ) {\displaystyle \sum _{i}\ln \Pr(w_{i}\mid w_{i-2},w_{i-1},w_{i+1},w_{i+2})} . In standard bag-of-words, a word's context is represented by a word-count (aka a word histogram) of its neighboring words. For example, the "sat" in "the cat sat on the mat" is represented as {"the": 2, "cat": 1, "on": 1}. Note that the last word "mat" is not used to represent "sat", because it is outside the neighborhood N = { − 2 , − 1 , + 1 , + 2 } {\displaystyle N=\{-2,-1,+1,+2\}} . In continuous bag-of-words, the histogram is multiplied by a matrix V {\displaystyle V} to obtain a continuous representation of the word's context. The matrix V {\displaystyle V} is also called a dictionary. Its columns are the word vectors. It has D {\displaystyle D} columns, where D {\displaystyle D} is the size of the dictionary. Let d {\displaystyle d} be the length of each word vector. We have V ∈ R d × D {\displaystyle V\in \mathbb {R} ^{d\times D}} . For example, multiplying the word histogram {"the": 2, "cat": 1, "on": 1} with V {\displaystyle V} , we obtain 2 v the + v cat + v on {\displaystyle 2v_{\text{the}}+v_{\text{cat}}+v_{\text{on}}} . This is then multiplied with another matrix V ′ {\displaystyle V'} of shape R D × d {\displaystyle \mathbb {R} ^{D\times d}} . Each row of it is a word vector v ′ {\displaystyle v'} . This results in a vector of length D {\displaystyle D} , one entry per dictionary entry. Then, apply the softmax to obtain a probability distribution over the dictionary. This system can be visualized as a neural network, similar in spirit to an autoencoder, of architecture linear-linear-softmax, as depicted in the diagram. The system is trained by gradient descent to minimize the cross-entropy loss. In full formula, the cross-entropy loss is: − ∑ i ln ⁡ e v w i ′ ⋅ ( ∑ j ∈ N v w j + i ) ∑ w ′ e v w ′ ′ ⋅ ( ∑ j ∈ N v w j + i ) {\displaystyle -\sum _{i}\ln {\frac {e^{v_{w_{i}}'\cdot (\sum _{j\in N}v_{w_{j+i}})}}{\sum _{w'}e^{v_{w'}'\cdot (\sum _{j\in N}v_{w_{j+i}})}}}} where the outer summation ∑ i {\displaystyle \sum _{i}} is over the words in a corpus, the quantity ∑ j ∈ N v w j + i {\displaystyle \sum _{j\in N}v_{w_{j+i}}} is the sum of a word's neighbors' vectors, etc. Once such a system is trained, we have two trained matrices V , V ′ {\displaystyle V,V'} . Either the column vectors of V {\displaystyle V} or the row vectors of V ′ {\displaystyle V'} can serve as the dictionary. For example, the word "sat" can be represented as either the "sat"-th column of V {\displaystyle V} or the "sat"-th row of V ′ {\displaystyle V'} . It is also possible to simply define V ′ = V ⊤ {\displaystyle V'=V^{\top }} , in which case there would no longer be a choice. === Skip-gram === The idea of skip-gram is to represent each word with a vector, such that it is possible to predict the vectors of its neighbors using the vector of a word. The architecture is still linear-linear-softmax, the same as CBOW, but the input and the output are switched. Specifically, for each word w i {\displaystyle w_{i}} in the corpus, the one-hot encoding of the word is used as the input to the neural network. The output of the neural network is a probability distribution over the dictionary, representing a prediction of individual words in the neighborhood of w i {\displaystyle w_{i}} . The objective of training is to maximize ∑ i ∑ j ∈ N ln ⁡ Pr ( w j + i ∣ w i ) {\displaystyle \sum _{i}\sum _{j\in N}\ln \Pr(w_{j+i}\mid w_{i})} . In full formula, the loss function is − ∑ i ∑ j ∈ N ln ⁡ e v w j + i ′ ⋅ v w i ∑ w ′ e v w ′ ′ ⋅ v w i {\displaystyle -\sum _{i}\sum _{j\in N}\ln {\frac {e^{v_{w_{j+i}}'\cdot v_{w_{i}}}}{\sum _{w'}e^{v_{w'}'\cdot v_{w_{i}}}}}} Same as CBOW, once such a system is trained, we have two trained matrices V , V ′ {\displaystyle V,V'} . Either the column vectors of V {\displaystyle V} or the row vectors of V ′ {\displaystyle V'} can serve as the dictionary. It is also possible to simply define V ′ = V ⊤ {\displaystyle V'=V^{\top }} , in which case there would no longer be a choice. Essentially, skip-gram and CBOW are exactly the same in architecture. They only differ in the objective function during training. == History == During the 1980s, there were some early attempts at using neural networks to represent words and concepts as vectors. In 2010, Tomáš Mikolov (then at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden

    Read more →
  • Occam learning

    Occam learning

    In computational learning theory, Occam learning is a model of algorithmic learning where the objective of the learner is to output a succinct representation of received training data. This is closely related to probably approximately correct (PAC) learning, where the learner is evaluated on its predictive power of a test set. Occam learnability implies PAC learning, and for a wide variety of concept classes, the converse is also true: PAC learnability implies Occam learnability. == Introduction == Occam Learning is named after Occam's razor, which is a principle stating that, given all other things being equal, a shorter explanation for observed data should be favored over a lengthier explanation. The theory of Occam learning is a formal and mathematical justification for this principle. It was first shown by Blumer, et al. that Occam learning implies PAC learning, which is the standard model of learning in computational learning theory. In other words, parsimony (of the output hypothesis) implies predictive power. == Definition of Occam learning == The succinctness of a concept c {\displaystyle c} in concept class C {\displaystyle {\mathcal {C}}} can be expressed by the length s i z e ( c ) {\displaystyle size(c)} of the shortest bit string that can represent c {\displaystyle c} in C {\displaystyle {\mathcal {C}}} . Occam learning connects the succinctness of a learning algorithm's output to its predictive power on unseen data. Let C {\displaystyle {\mathcal {C}}} and H {\displaystyle {\mathcal {H}}} be concept classes containing target concepts and hypotheses respectively. Then, for constants α ≥ 0 {\displaystyle \alpha \geq 0} and 0 ≤ β < 1 {\displaystyle 0\leq \beta <1} , a learning algorithm L {\displaystyle L} is an ( α , β ) {\displaystyle (\alpha ,\beta )} -Occam algorithm for C {\displaystyle {\mathcal {C}}} using H {\displaystyle {\mathcal {H}}} iff, given a set S = { x 1 , … , x m } {\displaystyle S=\{x_{1},\dots ,x_{m}\}} of m {\displaystyle m} samples labeled according to a concept c ∈ C {\displaystyle c\in {\mathcal {C}}} , L {\displaystyle L} outputs a hypothesis h ∈ H {\displaystyle h\in {\mathcal {H}}} such that h {\displaystyle h} is consistent with c {\displaystyle c} on S {\displaystyle S} (that is, h ( x ) = c ( x ) , ∀ x ∈ S {\displaystyle h(x)=c(x),\forall x\in S} ), and s i z e ( h ) ≤ ( n ⋅ s i z e ( c ) ) α m β {\displaystyle size(h)\leq (n\cdot size(c))^{\alpha }m^{\beta }} where n {\displaystyle n} is the maximum length of any sample x ∈ S {\displaystyle x\in S} . An Occam algorithm is called efficient if it runs in time polynomial in n {\displaystyle n} , m {\displaystyle m} , and s i z e ( c ) . {\displaystyle size(c).} We say a concept class C {\displaystyle {\mathcal {C}}} is Occam learnable with respect to a hypothesis class H {\displaystyle {\mathcal {H}}} if there exists an efficient Occam algorithm for C {\displaystyle {\mathcal {C}}} using H . {\displaystyle {\mathcal {H}}.} == The relation between Occam and PAC learning == Occam learnability implies PAC learnability, as the following theorem of Blumer, et al. shows: === Theorem (Occam learning implies PAC learning) === Let L {\displaystyle L} be an efficient ( α , β ) {\displaystyle (\alpha ,\beta )} -Occam algorithm for C {\displaystyle {\mathcal {C}}} using H {\displaystyle {\mathcal {H}}} . Then there exists a constant a > 0 {\displaystyle a>0} such that for any 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} , for any distribution D {\displaystyle {\mathcal {D}}} , given m ≥ a ( 1 ϵ log ⁡ 1 δ + ( ( n ⋅ s i z e ( c ) ) α ϵ ) 1 1 − β ) {\displaystyle m\geq a\left({\frac {1}{\epsilon }}\log {\frac {1}{\delta }}+\left({\frac {(n\cdot size(c))^{\alpha }}{\epsilon }}\right)^{\frac {1}{1-\beta }}\right)} samples drawn from D {\displaystyle {\mathcal {D}}} and labelled according to a concept c ∈ C {\displaystyle c\in {\mathcal {C}}} of length n {\displaystyle n} bits each, the algorithm L {\displaystyle L} will output a hypothesis h ∈ H {\displaystyle h\in {\mathcal {H}}} such that e r r o r ( h ) ≤ ϵ {\displaystyle error(h)\leq \epsilon } with probability at least 1 − δ {\displaystyle 1-\delta } .Here, e r r o r ( h ) {\displaystyle error(h)} is with respect to the concept c {\displaystyle c} and distribution D {\displaystyle {\mathcal {D}}} . This implies that the algorithm L {\displaystyle L} is also a PAC learner for the concept class C {\displaystyle {\mathcal {C}}} using hypothesis class H {\displaystyle {\mathcal {H}}} . A slightly more general formulation is as follows: === Theorem (Occam learning implies PAC learning, cardinality version) === Let 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} . Let L {\displaystyle L} be an algorithm such that, given m {\displaystyle m} samples drawn from a fixed but unknown distribution D {\displaystyle {\mathcal {D}}} and labeled according to a concept c ∈ C {\displaystyle c\in {\mathcal {C}}} of length n {\displaystyle n} bits each, outputs a hypothesis h ∈ H n , m {\displaystyle h\in {\mathcal {H}}_{n,m}} that is consistent with the labeled samples. Then, there exists a constant b {\displaystyle b} such that if log ⁡ | H n , m | ≤ b ϵ m − log ⁡ 1 δ {\displaystyle \log |{\mathcal {H}}_{n,m}|\leq b\epsilon m-\log {\frac {1}{\delta }}} , then L {\displaystyle L} is guaranteed to output a hypothesis h ∈ H n , m {\displaystyle h\in {\mathcal {H}}_{n,m}} such that e r r o r ( h ) ≤ ϵ {\displaystyle error(h)\leq \epsilon } with probability at least 1 − δ {\displaystyle 1-\delta } . While the above theorems show that Occam learning is sufficient for PAC learning, it doesn't say anything about necessity. Board and Pitt show that, for a wide variety of concept classes, Occam learning is in fact necessary for PAC learning. They proved that for any concept class that is polynomially closed under exception lists, PAC learnability implies the existence of an Occam algorithm for that concept class. Concept classes that are polynomially closed under exception lists include Boolean formulas, circuits, deterministic finite automata, decision-lists, decision-trees, and other geometrically defined concept classes. A concept class C {\displaystyle {\mathcal {C}}} is polynomially closed under exception lists if there exists a polynomial-time algorithm A {\displaystyle A} such that, when given the representation of a concept c ∈ C {\displaystyle c\in {\mathcal {C}}} and a finite list E {\displaystyle E} of exceptions, outputs a representation of a concept c ′ ∈ C {\displaystyle c'\in {\mathcal {C}}} such that the concepts c {\displaystyle c} and c ′ {\displaystyle c'} agree except on the set E {\displaystyle E} . == Proof that Occam learning implies PAC learning == We first prove the Cardinality version. Call a hypothesis h ∈ H {\displaystyle h\in {\mathcal {H}}} bad if e r r o r ( h ) ≥ ϵ {\displaystyle error(h)\geq \epsilon } , where again e r r o r ( h ) {\displaystyle error(h)} is with respect to the true concept c {\displaystyle c} and the underlying distribution D {\displaystyle {\mathcal {D}}} . The probability that a set of samples S {\displaystyle S} is consistent with h {\displaystyle h} is at most ( 1 − ϵ ) m {\displaystyle (1-\epsilon )^{m}} , by the independence of the samples. By the union bound, the probability that there exists a bad hypothesis in H n , m {\displaystyle {\mathcal {H}}_{n,m}} is at most | H n , m | ( 1 − ϵ ) m {\displaystyle |{\mathcal {H}}_{n,m}|(1-\epsilon )^{m}} , which is less than δ {\displaystyle \delta } if log ⁡ | H n , m | ≤ O ( ϵ m ) − log ⁡ 1 δ {\displaystyle \log |{\mathcal {H}}_{n,m}|\leq O(\epsilon m)-\log {\frac {1}{\delta }}} . This concludes the proof of the second theorem above. Using the second theorem, we can prove the first theorem. Since we have a ( α , β ) {\displaystyle (\alpha ,\beta )} -Occam algorithm, this means that any hypothesis output by L {\displaystyle L} can be represented by at most ( n ⋅ s i z e ( c ) ) α m β {\displaystyle (n\cdot size(c))^{\alpha }m^{\beta }} bits, and thus log ⁡ | H n , m | ≤ ( n ⋅ s i z e ( c ) ) α m β {\displaystyle \log |{\mathcal {H}}_{n,m}|\leq (n\cdot size(c))^{\alpha }m^{\beta }} . This is less than O ( ϵ m ) − log ⁡ 1 δ {\displaystyle O(\epsilon m)-\log {\frac {1}{\delta }}} if we set m ≥ a ( 1 ϵ log ⁡ 1 δ + ( ( n ⋅ s i z e ( c ) ) α ) ϵ ) 1 1 − β ) {\displaystyle m\geq a\left({\frac {1}{\epsilon }}\log {\frac {1}{\delta }}+\left({\frac {(n\cdot size(c))^{\alpha })}{\epsilon }}\right)^{\frac {1}{1-\beta }}\right)} for some constant a > 0 {\displaystyle a>0} . Thus, by the Cardinality version Theorem, L {\displaystyle L} will output a consistent hypothesis h {\displaystyle h} with probability at least 1 − δ {\displaystyle 1-\delta } . This concludes the proof of the first theorem above. == Improving sample complexity for common problems == Though Occam and PAC learnability are equivalent, the Occam framework can be used to produce tighter bounds on the sample complexity of classical problems including conjunctions, co

    Read more →
  • Cloud Native Computing Foundation

    Cloud Native Computing Foundation

    The Cloud Native Computing Foundation (CNCF) is a subsidiary of the Linux Foundation founded in 2015 to support cloud-native computing. == History == It was announced alongside Kubernetes 1.0, an open source container cluster manager, which was contributed to the Linux Foundation by Google as a seed technology. Founding members include Google, CoreOS, Mesosphere, Red Hat, Twitter, Huawei, Intel, RX-M, Cisco, IBM, Docker, Univa, and VMware. Today, CNCF is supported by over 450 members. In August 2018 Google announced that it was handing over operational control of Kubernetes to the community. == Projects == Argo is a collection of tools for getting work done with Kubernetes. Among its main features are Workflows and Events. It was accepted to CNCF on March 26, 2020 at the Incubating maturity level and then moved to the Graduated maturity level on December 6, 2022. cert-manager provisions and manages TLS certificates in Kubernetes. It was accepted to CNCF on November 10, 2020, moved to the Incubating maturity level on September 19, 2022, and then moved to the Graduated maturity level on September 29, 2024. Cilium provides networking, security, and observability for Kubernetes deployments using eBPF technology. It joined the CNCF at incubation level in October 2021 and the CNCF announced its graduation in October 2023. containerd is an industry-standard core container runtime. It is currently available as a daemon for Linux and Windows, which can manage the complete container lifecycle of its host system. In 2015, Docker donated the OCI Specification to The Linux Foundation with a reference implementation called runc. Since February 28, 2019 it is an official CNCF project. Its general availability and intention to donate the project to CNCF was announced by Docker in 2017. CoreDNS is a DNS server that chains plugins. Its graduation was announced in 2019. Dapr, the distributed application runtime, provides APIs for building secure and reliable microservices and agentic AI systems. Dapr was donated to the CNCF in November 2021 and joined at incubation level. The CNCF announced its graduation in November 2024. Envoy: Originally built at Lyft to move their architecture away from a monolith, Envoy is a high-performance open source edge and service proxy that makes the network transparent to applications. Lyft contributed Envoy to Cloud Native Computing Foundation in September 2017. etcd is a distributed key value store, providing a method of storing data across a cluster of machines. It became a CNCF incubating project in 2018 at KubeCon+CloudNativeCon North America in Seattle that year. Falco is an open source and cloud native runtime security initiative. It is the "de facto Kubernetes threat detection engine". It became an incubating project in January 2020 and graduated in February 2024. Flux is an open source project for powering GitOps in Kubernetes clusters. It provides the GitOps Toolkit, a set of Kubernetes APIs that allow you to define how configuration source code is securely pulled into your cluster and deployed by popular Kubernetes manifests rendering engines like Kustomize and Helm. The most recommended source mechanism is the OCIRepository API, which provides enhanced security and benefits from container image tooling out there. Flux has also notification integrations with popular services like Prometheus Alertmanager, PagerDuty, Slack and so on. Flux has graduated in CNCF in 2022. Harbor is an "open source trusted cloud native registry project that stores, signs, and scans content." It became an incubating project in September 2019 and graduated in June 2020. Helm is a package manager that helps developers "easily manage and deploy applications onto the Kubernetes cluster." It joined the incubating level in June 2018 and graduated in April 2020. Istio is a service mesh technology. It was accepted by CNCF in September 2022 and graduated on July 12, 2023. Jaeger, Created by Uber Engineering, Jaeger is an open source distributed tracing system inspired by Google Dapper paper and OpenZipkin community. It can be used for tracing microservice-based architectures, including distributed context propagation, distributed transaction monitoring, root cause analysis, service dependency analysis, and performance/latency optimization. The Cloud Native Computing Foundation Technical Oversight Committee voted to accept Jaeger as the 12th hosted project in September 2017 and became a graduated project in 2019. In 2020 it became an approved and fully integrated part of the CNCF ecosystem. Kubernetes is an open source framework for automating deployment and managing applications in a containerized and clustered environment. "It aims to provide better ways of managing related, distributed components across the varied infrastructure." It was originally designed by Google and donated to The Linux Foundation to form the Cloud Native Computing Foundation with Kubernetes as the seed technology. The "large and diverse" community supporting the project has made its staying power more robust than other, older technologies of the same ilk. In January 2020, the CNCF annual report showed significant growth in interest, training, event attendance and investment related to Kubernetes. Linkerd is CNCF's fifth member project, and the project that coined the term "service mesh". Linkerd adds observability, security, and reliability features to applications by adding them to the platform rather than the application layer, and features a "micro-proxy" to maximize speed and security of its data plane. Linkerd graduated from CNCF in July 2021. Open Policy Agent (OPA) is "an open source general-purpose policy engine and language for cloud infrastructure." It became a CNCF incubating project in April 2019. OPA graduated from CNCF in February 2021. Prometheus is a cloud monitoring tool sponsored by SoundCloud in early iterations. In August 2018, the tool was designated a graduated project by the Cloud Native Computing Foundation. It is now a Cloud Native Computing Foundation member project. Rook is CNCF's first cloud native storage project. It became an incubation level project in 2018 and graduated in October 2020. SPIFFE is an open standard and framework for workload identity, much the same way that OAuth is an open standard and framework for human identity. It is built from the ground up to accommodate modern computing environments, which operate with systems scale and velocity (as opposed to human scale and velocity), while still maintaining interoperability with existing technologies like OAuth and X.509 Public key infrastructure. Unlike other identity standards, SPIFFE supports multiple credential types for a single identity, ensuring that the highly varied needs of production environments are consistently met without compromise. SPIFFE joined the CNCF as a sandbox project in 2018, was accepted to incubation in 2020, and graduated in 2022. SPIRE is an open source identity provider for workloads based on the SPIFFE framework. It is highly pluggable, and fills the attestation and issuance needs required by any workload identity solution. The plugin interfaces it exposes allows users to write integrations with in-house systems, build internal self-service portals, and more. It is a very powerful building block for issuing short-lived identity credentials to dynamic cloud workloads. SPIRE became a CNCF Graduated project in 2022. The Update Framework (TUF) helps developers to secure new or existing software update systems, which are often found to be vulnerable to many known attacks. TUF addresses this widespread problem by providing a comprehensive, flexible security framework that developers can integrate with any software update system. TUF was CNCF's first security-focused project and the ninth project overall to graduate from the foundation's hosting program. TiKV provides a distributed key–value database. Vitess is a database clustering system for horizontal scaling of MySQL, first created for internal use by YouTube. It became a CNCF project in 2018 and graduated in November 2019. Contour is a management server for Envoy that can direct the management of Kubernetes' traffic. Contour also provides routing features that are more advanced than Kubernetes' out-of-the-box Ingress specification. VMWare contributed the project to CNCF in July 2020. Cortex offers horizontally scalable, multi-tenant, long-term storage for Prometheus and works alongside Amazon DynamoDB, Google Bigtable, Cassandra, S3, GCS, and Microsoft Azure. It was introduced into the ecosystem incubator alongside Thanos in August 2020. CRI-O is an Open Container Initiative (OCI) based "implementation of Kubernetes Container Runtime Interface". CRI-O allows Kubernetes to be container runtime-agnostic. It became an incubating project in 2019. gRPC is a "modern open source high performance RPC framework that can run in any environment." The project was formed in 2015 when Google decided to open sou

    Read more →
  • Wake-sleep algorithm

    Wake-sleep algorithm

    The wake-sleep algorithm is an unsupervised learning algorithm for deep generative models, especially Helmholtz Machines. The algorithm is similar to the expectation-maximization algorithm, and optimizes the model likelihood for observed data. The name of the algorithm derives from its use of two learning phases, the “wake” phase and the “sleep” phase, which are performed alternately. It can be conceived as a model for learning in the brain, but is also being applied for machine learning. == Description == The goal of the wake-sleep algorithm is to find a hierarchical representation of observed data. In a graphical representation of the algorithm, data is applied to the algorithm at the bottom, while higher layers form gradually more abstract representations. Between each pair of layers are two sets of weights: Recognition weights, which define how representations are inferred from data, and generative weights, which define how these representations relate to data. == Training == Training consists of two phases – the “wake” phase and the “sleep” phase. It has been proven that this learning algorithm is convergent. === The "wake" phase === Neurons are fired by recognition connections (from what would be input to what would be output). Generative connections (leading from outputs to inputs) are then modified to increase probability that they would recreate the correct activity in the layer below – closer to actual data from sensory input. === The "sleep" phase === The process is reversed in the “sleep” phase – neurons are fired by generative connections while recognition connections are being modified to increase probability that they would recreate the correct activity in the layer above – further to actual data from sensory input. == Extensions == Since the recognition network is limited in its flexibility, it might not be able to approximate the posterior distribution of latent variables well. To better approximate the posterior distribution, it is possible to employ importance sampling, with the recognition network as the proposal distribution. This improved approximation of the posterior distribution also improves the overall performance of the model.

    Read more →
  • Ni1000

    Ni1000

    The Ni1000 is an artificial neural network chip developed by Nestor Corporation and Intel, developed in the 1990s. It is Intel's second-generation neural network chip, but the first all-digital chip. The chip is aimed at image analysis applications– containing more than 3 million transistors – and can analyze 40,000 patterns per second. Prototypes running Nestor's OCR software in 1994 were capable of recognizing around 100 handwritten characters per second. The development was funded with money from DARPA and Office of Naval Research.

    Read more →