AI Grammar Clean Up

AI Grammar Clean Up — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Enonic XP

    Enonic XP is a free and open-source content platform. Developed by the Norwegian software company Enonic, the platform can be used to build websites, progressive web applications, or web-based APIs. Enonic XP uses an application framework for coding server logic in JavaScript, and needs no SQL database because it ships with an integrated content repository. The CMS is fully decoupled: developers can build traditional websites and landing pages, or use XP in headless mode (without the presentation layer) to deliver editorial content to any device or client. Enonic is used by major organizations in Norway, including the national postal service Norway Post, the insurance company Gjensidige, the Norwegian Labour and Welfare Administration, and all the top clubs in the men's national football league, Eliteserien.

== Overview ==

Enonic XP ships with the content management system (CMS) Content Studio. Its features include a visual drag-and-drop editor, a landing page editor, multi-site and multi-language support, media and structured content, advanced image editing, a responsive user interface, permissions and roles management, revision and version control, and bulk publishing. Integrations and applications can be installed directly from the "Applications" section in XP, which lists apps approved in the official Enonic Market.

There are no third-party databases in Enonic XP. Instead, the developers built a distributed storage repository that brings together filesystem, NoSQL, document store, and search capabilities, and automatically indexes everything put into it, so content does not need a separate indexing step. Enonic XP supports deploying server-side JavaScript. The open-source framework runs on top of a JVM (Java virtual machine) and lets developers run the same code in the browser and on the server, so JavaScript can be used throughout the stack.
Because it runs on the Java virtual machine, Enonic XP can be deployed on most infrastructures. It does not depend on a third-party application server to deploy code, because the platform is itself an application server. A developer can, for instance, install their own modules and code into the system while it is running. JavaScript unifies all the technical elements: Enonic XP features an MVC framework in which everything on the back end can be coded in server-side JavaScript. The Enonic platform can use any template engine.

=== Progressive web apps ===

Developers can also use Enonic XP to create progressive web apps (PWAs). A PWA is a regular web page or website that can appear to the user like a mobile application.

=== Headless CMS and integrations ===

Enonic XP can run headless, which means it separates content from presentation. The platform supports GraphQL, provides several default APIs, and allows custom APIs to be built with the Guillotine starter kit. As a result, Enonic works with modern front-end frameworks and offers integrations with, for example, Next.js and React.

== History ==

Enonic AS was founded in 2000 by Morten Øien Eriksen and Thomas Sigdestad. The company specialized in building services and solutions, including a content management system known first as "Vertical Site" and later as "Enonic CMS". Aware that its application, database, and website teams were working in separate silos toward the same goal, Enonic sought to combine these elements into a single piece of software. The resulting application platform, Enonic XP, first released in 2015, includes a CMS as an optional surface layer.

In March 2020, SoftwareReviews, a division of the Canadian IT research and analyst firm Info-Tech Research Group, ranked Enonic XP as the "Leader" in Web Experience Management.
The ranking is based on user reviews and is featured in SoftwareReviews' Digital Experience Data Quadrant Report, a comprehensive evaluation and ranking of leading Web Experience Management vendors. Enonic was also ranked first in 2021 and 2022.

=== Release history ===

Enonic XP took over from the previous content management system, Enonic CMS, and therefore began with version 5.0.0. The following list only contains major releases.

== Development and support ==

Enonic offers a user and developer community consisting of a forum, a ticket-based support system, documentation, a codex, a learning and training center with certifications, and various community groups. Writing about the support system, Mike Johnston of CMS Critic notes that "enterprise customers obviously get access to a higher level of personalized support, where the Enonic support team can respond as fast as two hours." The support system is divided into three levels (silver, gold and platinum), ranging from next-business-day support to 24/7 support. Because Enonic XP is open source, known vulnerabilities, bugs and issues are listed on GitHub.

  • Seq2seq

    Seq2seq is a family of machine learning approaches used for natural language processing. Originally developed by Lê Viết Quốc, a Vietnamese computer scientist and machine learning pioneer at Google Brain, the framework has become foundational in many modern AI systems. Applications include language translation, image captioning, conversational models, speech recognition, and text summarization. Seq2seq performs sequence transformation: it turns one sequence into another sequence.

== History ==

Warren Weaver wrote: "One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"

Seq2seq is an approach to machine translation (or, more generally, sequence transduction) with roots in information theory, where communication is understood as an encode-transmit-decode process and machine translation can be studied as a special case of communication. This viewpoint was elaborated, for example, in the noisy channel model of machine translation.

In practice, seq2seq maps an input sequence into a real-valued vector using one neural network (the encoder), and then maps that vector back to an output sequence using another neural network (the decoder).

The idea of encoder-decoder sequence transduction was developed in the early 2010s. The two papers most commonly cited as the originators of seq2seq are both from 2014. In the seq2seq models they proposed, both the encoder and the decoder were LSTMs. This design suffered from a "bottleneck" problem: because the encoding vector has a fixed size, information from long input sequences tends to be lost, as it is difficult to fit into the fixed-length vector. The attention mechanism, proposed in 2014, resolved the bottleneck problem.
Its authors (Bahdanau et al.) called their model RNNsearch, as it "emulates searching through a source sentence during decoding a translation". A remaining problem with seq2seq models at this point was that recurrent neural networks are difficult to parallelize. The Transformer, published in 2017, resolved this problem by replacing the encoding RNN with self-attention Transformer blocks ("encoder blocks") and the decoding RNN with cross-attention, causally masked Transformer blocks ("decoder blocks").

=== Priority dispute ===

One of the papers cited as the originator of seq2seq is (Sutskever et al. 2014), written at Google Brain while the authors were working on Google's machine translation project. The research allowed Google to overhaul Google Translate into Google Neural Machine Translation in 2016. Tomáš Mikolov claims that, before joining Google Brain, he developed the idea of using a "neural language model on pairs of sentences... and then [generating] translation after seeing the first sentence", which he equates with seq2seq machine translation. He also claims to have mentioned the idea to Ilya Sutskever and Quoc Le while at Google Brain, and that they failed to acknowledge him in their paper. Mikolov had worked on RNNLM (using RNNs for language modelling) for his PhD thesis, and is better known for developing word2vec.

== Architecture ==

=== Encoder ===

The encoder is responsible for processing the input sequence and capturing its essential information, which is stored as the hidden state of the network and, in a model with an attention mechanism, as a context vector. The context vector is the weighted sum of the input hidden states and is computed for every time step of the output sequence.

=== Decoder ===

The decoder takes the context vector and hidden states from the encoder and generates the final output sequence. The decoder operates autoregressively, producing one element of the output sequence at a time.
At each step, it considers the previously generated elements, the context vector, and the input sequence information to predict the next element in the output sequence. Specifically, in a model with an attention mechanism, the context vector and the hidden state are concatenated to form an attention hidden vector, which is used as an input to the decoder.

The seq2seq method developed in the early 2010s uses two neural networks: an encoder network converts an input sentence into numerical vectors, and a decoder network converts those vectors into sentences in the target language. The attention mechanism was grafted onto this structure in 2014, and was later refined into the encoder-decoder Transformer architecture of 2017.

=== Training vs prediction ===

There is a subtle difference between training and prediction. During training, both the input and the output sequences are known. During prediction, only the input sequence is known, and the output sequence must be decoded by the network itself.

Specifically, consider an input sequence $x_{1:n}$ and an output sequence $y_{1:m}$. The encoder processes the input $x_{1:n}$ step by step. After that, the decoder takes the output from the encoder, as well as the special start-of-sequence token, as input and produces a prediction $\hat{y}_1$. Now, the question is: what should be input to the decoder at the next step?

A standard method for training is "teacher forcing". In teacher forcing, no matter what the decoder outputs, the next input to the decoder is always the reference. That is, even if $\hat{y}_1 \neq y_1$, the next input to the decoder is still $y_1$, and so on. During prediction, the "teacher" $y_{1:m}$ is unavailable.
Therefore, the input to the decoder must be $\hat{y}_1$, then $\hat{y}_2$, and so on.

A model trained purely with teacher forcing tends to perform worse at prediction time, since generating from the model's own outputs differs from generating from the teacher's outputs. This is called exposure bias, or a train/test distribution shift. A 2015 paper recommends randomly switching between teacher forcing and no teacher forcing during training.

=== Attention for seq2seq ===

The attention mechanism is an enhancement introduced by Bahdanau et al. in 2014 to address a limitation of the basic seq2seq architecture: for longer input sequences, the hidden state output by the encoder becomes less informative for the decoder. Attention enables the model to selectively focus on different parts of the input sequence during decoding.

At each decoder step, an alignment model calculates the attention score using the current decoder state and all of the attention hidden vectors as input. The alignment model is another neural network, trained jointly with the seq2seq model, that calculates how well an input, represented by its hidden state, matches the previous output, represented by the attention hidden state. A softmax function is then applied to the attention scores to obtain the attention weights.

In some models, the encoder states are fed directly into an activation function, removing the need for an alignment model. The activation function receives one decoder state and one encoder state and returns a scalar value representing their relevance.

Consider an English-to-French translation task. To be concrete, consider the translation of "the zone of international control <end>", which should translate to "la zone de contrôle international <end>". Here, the special <end> token is used as a control character to delimit the end of input for both the encoder and the decoder.
An input sequence of text $x_0, x_1, \dots$ is processed by a neural network (which can be an LSTM, a Transformer encoder, or some other network) into a sequence of real-valued vectors $h_0, h_1, \dots$, where $h$ stands for "hidden vector". After the encoder has finished processing, the decoder starts operating over the hidden vectors to produce an output sequence $y_0, y_1, \dots$ autoregressively. That is, it always takes as input both the hidden vectors produced by the encoder and what the decoder itself has produced so far, to produce the next output word:

($h_0, h_1, \dots$, "<start>") → "la"
($h_0, h_1, \dots$, "<start> la") → "la zone"
($h_0, h_1, \dots$, "<start> la zone") → "la zone de"
...
($h_0, h_1, \dots$, "<start> la zone de contrôle international") → "la zone de contrôle international <end>"

Here, the special <start> token is used as a control character to delimit the start of input for the decoder. Decoding terminates as soon as "<end>" appears in the decoder output.
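The autoregressive loop above, and the teacher-forcing distinction from the training section, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular paper: `table` is a hypothetical next-token lookup standing in for a trained decoder (it ignores the hidden vectors entirely), and `attention_context` uses plain dot-product scores as a stand-in for the learned alignment model of Bahdanau et al.

```python
import numpy as np

def attention_context(query, H):
    """Dot-product attention over encoder hidden vectors H (one row per h_i).

    Returns the attention weights (a softmax over the scores) and the
    context vector, i.e. the weighted sum of the rows of H.
    """
    scores = H @ query                    # one relevance score per h_i
    exp = np.exp(scores - scores.max())   # shift for numerical stability
    weights = exp / exp.sum()
    return weights, weights @ H

def decode(step, reference=None, start="<start>", end="<end>", max_len=20):
    """Autoregressive decoding with a next-token function `step`.

    With `reference` (teacher forcing) the next input is always the
    reference token y_t, whatever the model predicted. Without it
    (prediction time) the model's own output is fed back until `end`
    appears or `max_len` steps have run.
    """
    outputs, prev = [], start
    n_steps = len(reference) if reference is not None else max_len
    for t in range(n_steps):
        y_hat = step(prev)
        outputs.append(y_hat)
        if reference is not None:
            prev = reference[t]           # teacher forcing
        elif y_hat == end:
            break                         # "<end>" terminates decoding
        else:
            prev = y_hat                  # free running: feed own output
    return outputs

# Hypothetical decoder: a lookup from the previous token to the next one.
table = {"<start>": "la", "la": "zone", "zone": "de", "de": "contrôle",
         "contrôle": "international", "international": "<end>"}
reference = ["la", "zone", "de", "contrôle", "international", "<end>"]
print(decode(table.get))
# ['la', 'zone', 'de', 'contrôle', 'international', '<end>']

# A hypothetical model that errs at the first step.
faulty = {**table, "<start>": "le", "le": "<end>"}
print(decode(faulty.get, reference=reference))
# ['le', 'zone', 'de', 'contrôle', 'international', '<end>']
print(decode(faulty.get))
# ['le', '<end>']

weights, context = attention_context(np.array([10.0, 0.0, 0.0]), np.eye(3))
print(weights.argmax(), round(float(weights.sum()), 6))
# 0 1.0
```

With `faulty`, the first prediction is wrong. Under teacher forcing the later inputs are still the reference tokens, so the mistake does not propagate; in free-running mode it derails the rest of the output. That gap between training and prediction conditions is the exposure bias discussed in the training section.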

  • Application-release automation

    Application-release automation (ARA) refers to the process of packaging and deploying an application or application update from development, across various environments, and ultimately to production. ARA solutions must combine the capabilities of deployment automation, environment management and modeling, and release coordination.

== Relationship with DevOps ==

ARA tools help cultivate DevOps best practices by providing a combination of automation, environment modeling and workflow-management capabilities. These practices help teams deliver software rapidly, reliably and responsibly. ARA tools help achieve a key DevOps goal: continuous delivery of a large number of releases in quick succession.

== Relationship with deployment ==

ARA is more than software-deployment automation: it deploys applications using structured release-automation techniques that increase visibility for the whole team. It combines workload automation and release-management tools as they relate to release packages and to their movement through the different environments of the DevOps pipeline. ARA tools help regulate deployments, how environments are created and deployed, and how and when releases are deployed.

== ARA Solutions ==

All ARA solutions must include capabilities in automation, environment modeling, and release coordination, and must provide this functionality without relying on other tools.
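The three required capabilities (deployment automation, environment modeling, and release coordination) can be illustrated with a toy model. This sketch is hypothetical throughout: the `Environment` and `Pipeline` classes, the stage names, and the gate rule are invented for illustration and do not correspond to any ARA product's API, and `deploy` only records the deployment instead of running real deployment steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Environment:
    """Environment model: a named stage and the version deployed to it."""
    name: str
    deployed: Optional[str] = None

@dataclass
class Pipeline:
    """Hypothetical release pipeline: ordered environments plus optional
    approval gates keyed by environment name."""
    stages: List[Environment]
    gates: Dict[str, Callable[[str], bool]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def deploy(self, env: Environment, version: str) -> None:
        # Stand-in for deployment automation; a real tool would run deploy steps.
        env.deployed = version
        self.log.append(f"deployed {version} to {env.name}")

    def release(self, version: str) -> bool:
        """Release coordination: promote a version through the stages in
        order, stopping before the first environment whose gate rejects it."""
        for env in self.stages:
            gate = self.gates.get(env.name)
            if gate is not None and not gate(version):
                self.log.append(f"{version} blocked before {env.name}")
                return False
            self.deploy(env, version)
        return True

envs = [Environment("dev"), Environment("test"), Environment("prod")]
# Assumed gate rule: release candidates may not reach production.
pipeline = Pipeline(envs, gates={"prod": lambda v: not v.endswith("-rc")})
print(pipeline.release("1.4.0-rc"), [e.deployed for e in envs])
# False ['1.4.0-rc', '1.4.0-rc', None]
```

Keeping the environment model separate from the promotion logic is what lets the same release procedure run against any set of stages; the log gives the team-wide visibility the text mentions.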

  • Grammar induction

    Grammar induction (or grammatical inference) is the process in machine learning of learning a formal grammar (usually as a collection of rewrite rules or productions, or alternatively as a finite-state machine or automaton of some kind) from a set of observations, thus constructing a model that accounts for the characteristics of the observed objects. More generally, grammatical inference is the branch of machine learning in which the instance space consists of discrete combinatorial objects such as strings, trees and graphs.

== Grammar classes ==

Grammatical inference has often focused on the problem of learning finite-state machines of various types (see the article Induction of regular languages for details on these approaches), since efficient algorithms for this problem have existed since the 1980s. Since the beginning of the 21st century, these approaches have been extended to the inference of context-free grammars and richer formalisms, such as multiple context-free grammars and parallel multiple context-free grammars. Other classes of grammars for which grammatical inference has been studied are combinatory categorial grammars, stochastic context-free grammars, contextual grammars and pattern languages.

== Learning models ==

The simplest form of learning is one in which the learning algorithm merely receives a set of examples drawn from the language in question: the aim is to learn the language from examples of it (and, rarely, from counter-examples, that is, examples that do not belong to the language). However, other learning models have been studied. One frequently studied alternative is the case where the learner can ask membership queries, as in the exact query learning model or the minimally adequate teacher model introduced by Angluin.

== Methodologies ==

There is a wide variety of methods for grammatical inference. Two of the classic sources are Fu (1977) and Fu (1982).
Duda, Hart & Stork (2001) also devote a brief section to the problem and cite a number of references. The basic trial-and-error method they present is discussed below. For approaches to inferring subclasses of regular languages in particular, see Induction of regular languages. A more recent textbook is de la Higuera (2010), which covers the theory of grammatical inference of regular languages and finite-state automata. D'Ulizia, Ferri and Grifoni provide a survey of grammatical inference methods for natural languages.

=== Induction of probabilistic grammars ===

There are several methods for the induction of probabilistic context-free grammars.

=== Grammatical inference by trial-and-error ===

The method proposed in Section 8.7 of Duda, Hart & Stork (2001) successively guesses grammar rules (productions) and tests them against positive and negative observations. The rule set is expanded so that it can generate each positive example, but if a given rule set also generates a negative example, it must be discarded. This approach can be characterized as "hypothesis testing" and bears some similarity to Mitchell's version space algorithm. The Duda, Hart & Stork (2001) text provides a simple example that illustrates the process well, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious.

=== Grammatical inference by genetic algorithms ===

Grammatical induction using evolutionary algorithms is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammars can easily be represented as tree structures of production rules that can be subjected to evolutionary operators. Algorithms of this sort stem from the genetic programming paradigm pioneered by John Koza.
Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF language made trees a more flexible approach. Koza represented Lisp programs as trees. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping subtrees is equivalent to the corresponding process of genetic crossover, where substrings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions of the Lisp code. Similar analogies between the tree-structured Lisp representation and the representation of grammars as trees made it possible to apply genetic programming techniques to grammar induction.

In the case of grammar induction, the transplantation of subtrees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based on some measure of how well it parses a group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent nodes correspond to non-terminal symbols (e.g. a noun phrase or a verb phrase) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal.

=== Grammatical inference by greedy algorithms ===

Like all greedy algorithms, greedy grammar inference algorithms iteratively make the decision that seems best at each stage. These decisions usually concern the creation of new rules, the removal of existing rules, the choice of a rule to apply, or the merging of existing rules. Because there are several ways to define "the stage" and "the best", there are also several greedy grammar inference algorithms.
Some context-free grammar generating algorithms make a decision after every symbol read:
  • The Lempel-Ziv-Welch algorithm creates a context-free grammar deterministically, such that only the start rule of the generated grammar needs to be stored.
  • Sequitur and its modifications.
Other context-free grammar generating algorithms first read the whole symbol sequence and then start making decisions:
  • Byte pair encoding and its optimizations.

=== Distributional learning ===

A more recent approach is based on distributional learning. Algorithms using this approach have been applied to learning context-free grammars and mildly context-sensitive languages, and have been proven correct and efficient for large subclasses of these grammars.

=== Learning of pattern languages ===

Angluin defines a pattern to be "a string of constant symbols from Σ and variable symbols from a disjoint set". The language of such a pattern is the set of all its nonempty ground instances, i.e., all strings resulting from consistent replacement of its variable symbols by nonempty strings of constant symbols. A pattern is called descriptive for a finite input set of strings if its language is minimal (with respect to set inclusion) among all pattern languages subsuming the input set.

Angluin gives a polynomial-time algorithm to compute, for a given input string set, all descriptive patterns in one variable x. To this end, she builds an automaton representing all possibly relevant patterns; using sophisticated arguments about word lengths, which rely on x being the only variable, the state count can be drastically reduced. Erlebach et al. give a more efficient version of Angluin's pattern learning algorithm, as well as a parallelized version. Arimura et al. show that a language class obtained from limited unions of patterns can be learned in polynomial time.
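The pattern-language definitions above can be made concrete with a membership test. The sketch below is not Angluin's learning algorithm: it only decides whether a string is a nonempty ground instance of a given pattern, by compiling the pattern into a Python regular expression in which the first occurrence of each variable becomes a capturing group and later occurrences become backreferences, which enforces consistent replacement. The helper names are hypothetical, and the `.+` groups assume the input strings already consist only of symbols from Σ. Backtracking regex matching can take exponential time in the worst case, so this is suited to small examples only.

```python
import re
from typing import Iterable, Sequence, Set

def pattern_regex(pattern: Sequence[str], variables: Set[str]) -> "re.Pattern[str]":
    """Compile a pattern (a sequence of constants and variables) to a regex.

    The first occurrence of a variable becomes a named group matching a
    nonempty string; later occurrences become backreferences to that group,
    so every occurrence is replaced by the same string. Constants are literal.
    """
    parts, seen = [], {}
    for sym in pattern:
        if sym in variables:
            if sym in seen:
                parts.append(f"(?P={seen[sym]})")       # same substring again
            else:
                seen[sym] = f"v{len(seen)}"
                parts.append(f"(?P<{seen[sym]}>.+)")    # nonempty substitution
        else:
            parts.append(re.escape(sym))
    return re.compile("".join(parts))

def in_language(pattern, variables, s: str) -> bool:
    """True if s is a nonempty ground instance of the pattern."""
    return pattern_regex(pattern, variables).fullmatch(s) is not None

def covers(pattern, variables, sample: Iterable[str]) -> bool:
    """True if the pattern's language contains every string in the sample."""
    rx = pattern_regex(pattern, variables)
    return all(rx.fullmatch(s) is not None for s in sample)

p = ["a", "x", "b", "x"]                       # the pattern a x b x, variable x
print(in_language(p, {"x"}, "aabbab"))         # True  (x = "ab")
print(in_language(p, {"x"}, "aabb"))           # False (no consistent x)
print(covers(p, {"x"}, ["aaba", "aabbab"]))    # True
```

A descriptive pattern for a sample must at least satisfy `covers`; deciding minimality among all such patterns is the harder part that Angluin's algorithm addresses.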
=== Pattern theory ===

Pattern theory, formulated by Ulf Grenander, is a mathematical formalism for describing knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast pattern concepts in precise language. In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to:
  • identify the hidden variables of a data set using real-world data rather than artificial stimuli, which were commonplace at the time;
  • formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph;
  • study the randomness and variability of these graphs;
  • create the basic classes of stochastic models applied by listing the deformations of the patterns;
  • synthesize (sample) from the models, not just analyze signals with them.
Broad in its mathematical coverage, pattern theory spans algebra and statistics, as well as local topological and global entropic properties.

== Applications ==

The principle of grammar induction has been applied to other aspects of natural language processing and (among many other problems) to semantic parsing, natural language understanding, example-based translation, language acquisition, grammar-based compre

  • Artificial Linguistic Internet Computer Entity

    A.L.I.C.E. (Artificial Linguistic Internet Computer Entity), also referred to as Alicebot, or simply Alice, is a natural language processing chatbot, a program that engages in conversation with a human by applying heuristic pattern-matching rules to the human's input. It was inspired by Joseph Weizenbaum's classic ELIZA program. It is one of the strongest programs of its type and has won the Loebner Prize, awarded to accomplished humanoid talking robots, three times (in 2000, 2001, and 2004). However, the program cannot pass the Turing test: even a casual user will often expose its mechanistic aspects in short conversations.

Alice was originally composed by Richard Wallace; it "came to life" on November 23, 1995. The program was rewritten in Java beginning in 1998. The current incarnation of the Java implementation is Program D. The program uses an XML dialect called AIML (Artificial Intelligence Markup Language) to specify the heuristic conversation rules. Alice code has been reported to be available as open source. The AIML source is available from the ALICE A.I. Foundation on Google Code and from Richard Wallace's GitHub account. These AIML files can be run using an AIML interpreter such as Program O or Program AB.

== In popular culture ==

Spike Jonze has cited ALICE as the inspiration for his Academy Award-winning film Her, in which a human falls in love with a chatbot. In a New Yorker article titled “Can Humans Fall in Love with Bots?” Jonze said “that the idea originated from a program he tried about a decade ago called the ALICE bot, which engages in friendly conversation.” The Los Angeles Times reported: Though the film’s premise evokes comparisons to Siri, Jonze said he actually had the idea well before the Apple digital assistant came along, after using a program called Alicebot about ten years ago.
As geek nostalgists will recall, that intriguing if at times crude software (it flunked the industry-standard Turing Test) would attempt to engage users in everyday chatter based on a database of prior conversations. Jonze liked it, and decided to apply a film genre to it. “I thought about that idea, and what if you had a real relationship with it?” Jonze told reporters. “And I used that as a way to write a relationship movie and a love story.”
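The heuristic pattern matching that AIML encodes can be illustrated with a much simplified sketch. Everything here is an assumption for illustration: the rules, the `{star}` placeholder, and the first-match-wins ordering are invented. Real AIML categories are XML elements (`<category>`, `<pattern>`, `<template>`, `<star/>`), and AIML interpreters such as Program AB apply their own matching precedence, which this code does not reproduce.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules in the spirit of AIML categories: an uppercase pattern,
# where "*" matches one or more words, and a response template in which
# "{star}" is replaced by the text the wildcard matched.
RULES: List[Tuple[str, str]] = [
    ("HELLO", "Hi there!"),
    ("MY NAME IS *", "Nice to meet you, {star}."),
    ("*", "Tell me more."),  # catch-all, tried last
]

def normalize(text: str) -> str:
    """Replace punctuation with spaces, uppercase, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())

def respond(text: str, rules: List[Tuple[str, str]] = RULES) -> Optional[str]:
    """Return the template of the first rule whose pattern matches the input."""
    said = normalize(text)
    for pattern, template in rules:
        regex = "^" + r"\s+".join(
            r"(.+)" if w == "*" else re.escape(w) for w in pattern.split()) + "$"
        m = re.match(regex, said)
        if m:
            star = m.group(1).title() if m.groups() else ""
            return template.format(star=star)
    return None

print(respond("my name is alice"))   # Nice to meet you, Alice.
print(respond("What's up?"))         # Tell me more.
```

The catch-all rule is what keeps such a bot from ever falling silent, and also why short conversations quickly expose its mechanistic nature.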

  • Topological deep learning

    Topological deep learning (TDL) is a research field that extends deep learning to handle complex, non-Euclidean data structures. Traditional deep learning models, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), excel at processing data on regular grids and sequences. However, scientific and real-world data often live on more intricate domains, including point clouds, meshes, time series, scalar fields, graphs, or general topological spaces like simplicial complexes and CW complexes. TDL addresses this by incorporating topological concepts to process data with higher-order relationships, such as interactions among multiple entities and complex hierarchies. This approach leverages structures like simplicial complexes and hypergraphs to capture global dependencies and qualitative spatial properties, offering a more nuanced representation of data. TDL also encompasses methods from computational and algebraic topology that permit studying properties of neural networks and their training process, such as their predictive performance or generalization properties. The mathematical foundations of TDL are algebraic topology, differential topology, and geometric topology. Therefore, TDL can be generalized to data on differentiable manifolds, knots, links, tangles, curves, etc.

== History and motivation ==

Traditional deep learning techniques often assume that a dataset resides in a highly structured space (like images, where convolutional neural networks outperform alternative methods) or in a Euclidean space. The prevalence of new types of data, in particular graphs, meshes, and molecules, led to the development of new techniques, culminating in the field of geometric deep learning, which originally proposed a signal-processing perspective for treating such data types.
While originally confined to graphs, where connectivity is defined by nodes and edges, follow-up work extended these concepts to a larger variety of data types, including simplicial complexes and CW complexes, with recent work proposing a unified perspective of message passing on general combinatorial complexes.

An independent perspective on different types of data originated from topological data analysis, which proposed a new framework for describing the structural information of data, i.e., its "shape", that is inherently aware of multiple scales, ranging from local to global information. While this framework was at first restricted to smaller datasets, subsequent work developed new descriptors that efficiently summarize the topological information of datasets, making it available to traditional machine-learning techniques such as support vector machines or random forests. Such descriptors ranged from new feature-engineering techniques and new ways of providing suitable coordinates for topological descriptors to more efficient dissimilarity measures.

Contemporary research in this field is largely concerned with either integrating information about the underlying data topology into existing deep-learning models or finding novel ways of training on topological domains.

== Learning on topological spaces ==

One of the core concepts in topological deep learning is the domain on which the data is defined and supported. In the case of Euclidean data, such as images, this domain is a grid on which the pixel values of the image are supported. In a more general setting, this domain might be a topological domain. Studying and developing deep learning models supported on topological domains constitutes the essence of topological deep learning. Next, we introduce the most common topological domains encountered in a deep learning setting.
These domains include, but are not limited to, graphs, simplicial complexes, cell complexes, combinatorial complexes and hypergraphs. Given a finite set S of abstract entities, a neighborhood function $\mathcal{N}$ on S is an assignment that attaches to every point $x$ in S a subset of S or a relation. Such a function can be induced by equipping S with an auxiliary structure. Edges provide one way of defining relations among the entities of S. More specifically, edges in a graph allow one to define the notion of neighborhood using, for instance, the one-hop neighborhood. Edges, however, are limited in their modeling capacity: they can only model binary relations among entities of S, since every edge typically connects two entities. In many applications, it is desirable to permit relations that involve more than two entities. The idea of using such relations is central to topological domains. These higher-order relations allow a broader range of neighborhood functions to be defined on S, capturing multi-way interactions among its entities. Next we review the main properties, advantages, and disadvantages of some commonly studied topological domains in the context of deep learning, including (abstract) simplicial complexes, regular cell complexes, hypergraphs, and combinatorial complexes.

==== Comparisons among topological domains ====

Each of the enumerated topological domains has its own characteristics, advantages, and limitations:
  • Simplicial complexes: the simplest form of higher-order domain, and an extension of graph-based models. They admit hierarchical structures, making them suitable for various applications, and Hodge theory can be naturally defined on them. They require every subset of a relation to be a relation as well, which imposes constraints on the structure.
  • Cell complexes: generalize simplicial complexes and provide more flexibility in defining higher-order relations.
Each cell in a cell complex is homeomorphic to an open ball, and cells are attached together via attaching maps. The boundary cells of each cell in a cell complex are also cells in the complex. Cell complexes are represented combinatorially via incidence matrices. Hypergraphs: allow arbitrary set-type relations among entities; relations are not implied by other relations, providing more flexibility; they do not explicitly encode the dimension of cells or relations; they are useful when relations in the data do not adhere to the constraints imposed by other models, such as simplicial and cell complexes. Combinatorial complexes: generalize and bridge the gaps between simplicial complexes, cell complexes, and hypergraphs; allow both hierarchical structures and set-type relations; combine features of the other complexes while providing more flexibility in modeling relations; can be represented combinatorially, similar to cell complexes. ==== Hierarchical structure and set-type relations ==== The properties of simplicial complexes, cell complexes, and hypergraphs give rise to two main features of relations on higher-order domains, namely hierarchies of relations and set-type relations. ===== Rank function ===== A rank function on a higher-order domain X is an order-preserving function rk: X → Z, where rk(x) attaches a non-negative integer value to each relation x in X, preserving set inclusion in X (that is, x ⊆ y implies rk(x) ≤ rk(y)). Cell and simplicial complexes are common examples of higher-order domains equipped with rank functions and therefore with hierarchies of relations. ===== Set-type relations ===== Relations in a higher-order domain are called set-type relations if the existence of a relation is not implied by another relation in the domain. Hypergraphs constitute examples of higher-order domains equipped with set-type relations. Given the modeling limitations of simplicial complexes, cell complexes, and hypergraphs, the combinatorial complex was developed as a higher-order domain that features both hierarchies of relations and set-type relations.
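The rank-function and set-type ideas can be made concrete with a small sketch. It assumes each relation (cell) is stored as a frozenset of entities mapped to an integer rank; the helper names `is_order_preserving`, `is_simplicial`, `incidence_matrix` and `upper_adjacent` are hypothetical and not taken from any TDL library. The toy complex contains a rank-2 set-type relation on four entities without the pairwise edges a simplicial complex would require.

```python
from itertools import combinations

# Hypothetical toy combinatorial complex: cell (frozenset of entities) -> rank.
cc = {
    frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0, frozenset({4}): 0,
    frozenset({1, 2}): 1, frozenset({2, 3}): 1,
    frozenset({1, 2, 3, 4}): 2,  # set-type relation: its sub-edges need not exist
}

def is_order_preserving(rank):
    """Rank-function check: x subset of y must imply rank[x] <= rank[y]."""
    return all(rank[x] <= rank[y] for x in rank for y in rank if x <= y)

def is_simplicial(rank):
    """Downward closure: every non-empty proper subset of a cell is itself a cell."""
    return all(frozenset(sub) in rank
               for c in rank for k in range(1, len(c)) for sub in combinations(c, k))

def incidence_matrix(rank, r):
    """0/1 matrix with rows = rank-r cells and columns = rank-(r+1) cells."""
    rows = sorted((c for c in rank if rank[c] == r), key=sorted)
    cols = sorted((c for c in rank if rank[c] == r + 1), key=sorted)
    return rows, cols, [[int(a <= b) for b in cols] for a in rows]

def upper_adjacent(rank, cell):
    """A neighborhood function: same-rank cells sharing a higher-rank cell with `cell`."""
    return {o for o in rank
            if o != cell and rank[o] == rank[cell]
            and any(cell <= c and o <= c and rank[c] > rank[cell] for c in rank)}

print(is_order_preserving(cc))     # True
print(is_simplicial(cc))           # False, e.g. {1, 3} is not a cell
print(incidence_matrix(cc, 0)[2])  # [[1, 0], [1, 1], [0, 1], [0, 0]]
# Node 4 has no edges, yet it is a neighbor of nodes 1, 2 and 3 via the rank-2 cell.
print(upper_adjacent(cc, frozenset({4})))
```

The one-hop graph neighborhood of node 4 would be empty here; the higher-order cell is what makes the multi-way interaction visible to the neighborhood function.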
The learning tasks in TDL can be broadly classified into three categories. Cell classification: predict targets for each cell in a complex; examples include triangular mesh segmentation, where the task is to predict the class of each face or edge in a given mesh. Complex classification: predict targets for an entire complex, for example the class of each input mesh. Cell prediction: predict properties of cell-cell interactions in a complex and, in some cases, predict whether a cell exists in the complex; an example is the prediction of linkages among entities in hyperedges of a hypergraph. In practice, to perform these tasks, deep learning models designed for specific topological spaces must be constructed and implemented. These models, known as topological neural networks, are tailored to operate effectively within these spaces. === Topological neural networks === Central to TDL are topological neural networks (TNNs), specialized architectures designed to operate on data structured in topological domains. Unlike traditional neural networks tailored for grid-like structures, TNNs are adept at handling more intricate data representations, such as graphs

  • Concept mining

    Concept mining is an activity that results in the extraction of concepts from artifacts. Solutions to the task typically involve aspects of artificial intelligence and statistics, such as data mining and text mining. Because artifacts are typically a loosely structured sequence of words and other symbols (rather than concepts), the problem is nontrivial, but it can provide powerful insights into the meaning, provenance and similarity of documents. == Methods == Traditionally, the conversion of words to concepts has been performed using a thesaurus, and computational techniques tend to do the same. The thesauri used are either specially created for the task or pre-existing language models, usually related to Princeton's WordNet. The mappings of words to concepts are often ambiguous: typically, each word in a given language relates to several possible concepts. Humans use context to disambiguate the various meanings of a given piece of text, whereas available machine translation systems cannot easily infer context. For the purposes of concept mining, however, these ambiguities tend to be less important than they are in machine translation, because in large documents the ambiguities tend to even out, much as is the case with text mining. Many disambiguation techniques may be used, for example linguistic analysis of the text and the use of word and concept association frequencies inferred from large text corpora. Recently, techniques based on the semantic similarity between the possible concepts and the context have appeared and gained interest in the scientific community. == Applications == === Detecting and indexing similar documents in large corpora === One of the spin-offs of calculating document statistics in the concept domain, rather than the word domain, is that concepts form natural tree structures based on hypernymy and meronymy.
These structures can be used to generate simple tree-membership statistics, which can be used to locate any document in a Euclidean concept space. If the size of a document is also considered as another dimension of this space, an extremely efficient indexing system can be created. This technique is currently in commercial use, locating similar legal documents in a 2.5-million-document corpus. === Clustering documents by topic === Standard numeric clustering techniques may be used in "concept space" as described above to locate and index documents by the inferred topic. These are numerically far more efficient than their text-mining counterparts and tend to behave more intuitively, in that they map better to the similarity measures a human would generate.
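A toy sketch of tree-membership statistics follows. The tiny thesaurus and hypernym tree are hypothetical stand-ins for a resource such as WordNet; ambiguous words spread their weight equally over their candidate concepts (an assumption in the spirit of ambiguities "evening out"), each concept also counts toward all of its ancestors, and cosine similarity is an assumed choice of comparison in the resulting concept space.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: word -> candidate concepts (ambiguous words list several).
THESAURUS = {
    "bank": ["financial_institution", "river_bank"],
    "loan": ["credit"], "mortgage": ["credit"],
    "river": ["watercourse"], "stream": ["watercourse"],
    "teller": ["bank_employee"],
}
# Hypothetical hypernym tree: concept -> parent concept.
PARENT = {
    "financial_institution": "organization", "bank_employee": "person",
    "credit": "finance", "river_bank": "landform", "watercourse": "landform",
    "organization": "entity", "person": "entity", "finance": "entity",
    "landform": "entity",
}

def ancestors(concept):
    """Yield the concept itself, then each hypernym up to the root."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)

def concept_vector(words):
    """Tree-membership counts; an ambiguous word splits its weight over its senses."""
    vec = Counter()
    for w in words:
        senses = THESAURUS.get(w.lower(), [])
        for c in senses:
            for node in ancestors(c):
                vec[node] += 1 / len(senses)
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

finance_a = concept_vector("bank loan teller".split())
finance_b = concept_vector("mortgage bank".split())
rivers = concept_vector("river stream bank".split())
print(cosine(finance_a, finance_b) > cosine(finance_a, rivers))  # True
```

Even though "bank" is ambiguous in every document, the unambiguous words pull the two finance documents closer together in concept space.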

  • Version space learning

    Version space learning is a logical approach to machine learning, specifically binary classification. Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences. Formally, the hypothesis space is a disjunction $H_1 \lor H_2 \lor \dots \lor H_n$ (i.e., one or more of hypotheses 1 through n are true). A version space learning algorithm is presented with examples, which it uses to restrict its hypothesis space; for each example x, the hypotheses that are inconsistent with x are removed from the space. This iterative refining of the hypothesis space is called the candidate elimination algorithm, and the hypothesis space maintained inside the algorithm is called its version space. == The version space algorithm == In settings where there is a generality ordering on hypotheses, it is possible to represent the version space by two sets of hypotheses: (1) the most specific consistent hypotheses, and (2) the most general consistent hypotheses, where "consistent" indicates agreement with observed data. The most specific hypotheses (i.e., the specific boundary SB) cover the observed positive training examples and as little of the remaining feature space as possible. These hypotheses, if reduced any further, exclude a positive training example and hence become inconsistent. These minimal hypotheses essentially constitute a (pessimistic) claim that the true concept is defined just by the positive data already observed: thus, if a novel (never-before-seen) data point is observed, it should be assumed to be negative. (I.e., if data has not previously been ruled in, then it's ruled out.) The most general hypotheses (i.e., the general boundary GB) cover the observed positive training examples but also cover as much of the remaining feature space as possible without including any negative training examples. These, if enlarged any further, include a negative training example and hence become inconsistent.
These maximal hypotheses essentially constitute an (optimistic) claim that the true concept is defined just by the negative data already observed: thus, if a novel (never-before-seen) data point is observed, it should be assumed to be positive. (I.e., if data has not previously been ruled out, then it's ruled in.) Thus, during learning, the version space (which is itself a set, possibly infinite, containing all consistent hypotheses) can be represented by just its lower and upper bounds (the maximally specific and maximally general hypothesis sets), and learning operations can be performed just on these representative sets. After learning, classification can be performed on unseen examples by testing the hypotheses learned by the algorithm. If the example is consistent with multiple hypotheses, a majority vote rule can be applied. == Historical background == The notion of version spaces was introduced by Mitchell in the early 1980s as a framework for understanding the basic problem of supervised learning within the context of solution search. Although the basic "candidate elimination" search method that accompanies the version space framework is not a popular learning algorithm, some practical implementations have been developed (e.g., Sverdlik & Reynolds 1992, Hong & Tsang 1997, Dubois & Quafafou 2002). A major drawback of version space learning is its inability to deal with noise: any pair of inconsistent examples can cause the version space to collapse, i.e., become empty, so that classification becomes impossible. One solution to this problem, proposed by Dubois and Quafafou, is the Rough Version Space, in which rough-set-based approximations are used to learn certain and possible hypotheses in the presence of inconsistent data.
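The boundary-set idea can be sketched as follows, assuming conjunctive hypotheses over nominal attributes: each position holds a specific value, "?" (any value) or None (no value). This representation, the helper names and the weather-style toy data are assumptions for illustration. Instead of a majority vote, `classify` here returns `None` whenever the boundary sets disagree, which is the conservative variant.

```python
WILD, EMPTY = "?", None  # "?" matches any value; None matches nothing

def covers(h, x):
    """True if conjunctive hypothesis h labels instance x positive."""
    return all(hv == WILD or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    if EMPTY in h2:
        return True  # h2 covers nothing
    return all(a == WILD or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    return tuple(xv if sv is EMPTY else (sv if sv == xv else WILD) for sv, xv in zip(s, x))

def min_specializations(g, x, domains):
    return [g[:i] + (v,) + g[i + 1:]
            for i, gv in enumerate(g) if gv == WILD
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {(EMPTY,) * n}, {(WILD,) * n}  # specific and general boundaries
    for x, positive in examples:
        x = tuple(x)
        if positive:
            G = {g for g in G if covers(g, x)}
            for s in [s for s in S if not covers(s, x)]:
                S.remove(s)
                h = min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    S.add(h)
            S = {s for s in S if not any(t != s and more_general_or_equal(s, t) for t in S)}
        else:
            S = {s for s in S if not covers(s, x)}
            for g in [g for g in G if covers(g, x)]:
                G.remove(g)
                for h in min_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        G.add(h)
            G = {g for g in G if not any(t != g and more_general_or_equal(t, g) for t in G)}
    return S, G  # both empty means the version space collapsed

def classify(S, G, x):
    if not S or not G:
        return None  # collapsed: no consistent hypothesis left
    if all(covers(s, x) for s in S):
        return True  # every hypothesis in the space covers x
    if not any(covers(g, x) for g in G):
        return False  # no hypothesis in the space covers x
    return None  # hypotheses disagree

DOMAINS = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(EXAMPLES, DOMAINS)
print(S)  # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G holds ('Sunny', '?', '?', '?', '?', '?') and ('?', 'Warm', '?', '?', '?', '?')
```

Appending the first example again with a negative label empties both sets, which illustrates the collapse on inconsistent data described above.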

  • Event store

    An event store is a type of database optimized for the storage of events. Conceptually, an event store records only the events affecting an entity, dossier, or policy, and the state of the entity at any point in its history can be reconstructed by replaying its contributing events in sequential order. Events (and their corresponding data) are the only "real" facts that should be stored in the database. All other objects can be derived from these events, meaning they are instantiated in memory by runtime code as needed (e.g., for display in a user interface). In theory, no object that aggregates over recorded event data is stored in the database. Instead, these objects are built on the fly by traversing the event history. When an aggregated object instance is no longer needed, it can simply be discarded (released from memory). == Example with insurance policies == For example, the event store concept can be applied to insurance policies or pension dossiers. In these policies or dossiers, each object that makes up the dossier or policy (the person, partner(s), employments, etc.) can be derived from, and instantiated in memory based on, the real-world events. == Double timeline == A crucial part of an event store database is that each event has a double timeline, which enables event stores to correct errors in events that have previously been entered into the database. The two dates are: Valid date: the date at which the event became valid. Transaction date: the date at which the event was entered into the database. == Error correction == Another crucial part of an event store database is that stored events are not allowed to be changed. Once stored, even erroneous events are never modified. The only way to change (or rather, correct) these events is to record a new event with the new values, using the double timeline.
A correcting event carries the new values of the original event and the same valid date as the event it corrects, but a different (later) transaction date. This mechanism ensures reproducibility at every moment in time, even for the period before the correction took place. It also makes it possible to reproduce situations based on erroneous events (if required). == Advantages and disadvantages == One advantage of the event store concept is that handling the effects of backdated events (events that take effect before previous events and that may even invalidate them) is much easier. An event store simplifies the code, because rolling back erroneous situations and rolling up the new, correct situations is no longer needed. A disadvantage may be that the code needs to re-instantiate all objects in memory based on the events each time a service call is received for a specific dossier or policy. == Compared to regular databases == In regular databases, handling backdated events to correct previous, erroneous events can be painful, as it often requires rolling back all previous, erroneous transactions and objects and rolling up the new, correct transactions and objects. In an event store, only the new event (and its corresponding facts) is stored. The code then redetermines the transactions and objects in memory based on the new facts.
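The double timeline and correction-by-new-event can be sketched as an in-memory, append-only store. The `Event` and `EventStore` classes are hypothetical, as is the correction rule assumed here: an event with the same valid date and attribute but a later transaction date supersedes the earlier one. State is never stored; it is rebuilt by replaying events in valid-date order.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    policy_id: str
    valid_date: date        # when the fact became true in the real world
    transaction_date: date  # when the event was recorded in the store
    attribute: str
    value: object

class EventStore:
    def __init__(self):
        self._events = []  # append-only: events are never updated or deleted

    def append(self, event):
        self._events.append(event)

    def state(self, policy_id, valid_at, known_at):
        """Rebuild attributes valid on `valid_at`, using only events recorded by `known_at`."""
        latest = {}
        for e in self._events:
            if (e.policy_id != policy_id or e.transaction_date > known_at
                    or e.valid_date > valid_at):
                continue
            key = (e.valid_date, e.attribute)
            # Assumed correction rule: same valid date and attribute,
            # later transaction date wins.
            if key not in latest or e.transaction_date >= latest[key].transaction_date:
                latest[key] = e
        state = {}
        for e in sorted(latest.values(), key=lambda e: e.valid_date):
            state[e.attribute] = e.value  # replay in valid-date order
        return state

store = EventStore()
store.append(Event("P1", date(2020, 1, 1), date(2020, 1, 5), "salary", 40000))
store.append(Event("P1", date(2021, 1, 1), date(2021, 1, 3), "salary", 45000))  # erroneous
store.append(Event("P1", date(2021, 1, 1), date(2021, 3, 1), "salary", 54000))  # correction
print(store.state("P1", date(2021, 6, 1), date(2021, 2, 1)))  # {'salary': 45000}
print(store.state("P1", date(2021, 6, 1), date(2021, 6, 1)))  # {'salary': 54000}
```

Querying with a transaction date before the correction reproduces the erroneous situation exactly as it was known at the time, while the erroneous event itself stays untouched in the store.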

  • Egocentric vision

    Egocentric vision or first-person vision is a sub-field of computer vision that entails analyzing images and videos captured by a wearable camera, which is typically worn on the head or on the chest and naturally approximates the visual field of the camera wearer. Consequently, visual data capture the part of the scene on which the user focuses to carry out the task at hand and offer a valuable perspective for understanding the user's activities and their context in a naturalistic setting. The forward-looking wearable camera is often supplemented with an inward-looking camera that observes the user's eye and can measure eye gaze, which is useful for revealing attention and for better understanding the user's activity and intentions. == History == The idea of using a wearable camera to gather visual data from a first-person perspective dates back to the 1970s, when Steve Mann invented "Digital Eye Glass", a device that, when worn, causes the human eye itself to effectively become both an electronic camera and a television display. Subsequently, wearable cameras were used for health-related applications in the context of Humanistic Intelligence and Wearable AI. Egocentric vision is best done from the point of the eye, but may also be done with a neck-worn camera when eyeglasses would be in the way. This neck-worn variant was popularized by the Microsoft SenseCam in 2006 for experimental health research. The interest of the computer vision community in the egocentric paradigm grew slowly in the early 2010s and has been growing rapidly in recent years, boosted both by the impressive advances in wearable technology and by the increasing number of potential applications.
The prototypical first-person vision system described by Kanade and Hebert in 2012 is composed of three basic components: a localization component able to estimate the surroundings, a recognition component able to identify objects and people, and an activity recognition component able to provide information about the current activity of the user. Together, these three components provide complete situational awareness of the user, which in turn can be used to provide assistance to the user or to the caregiver. Following this idea, the first computational techniques for egocentric analysis focused on hand-related activity recognition and social interaction analysis. Also, given the unconstrained nature of the video and the huge amount of data generated, temporal segmentation and summarization were among the first problems addressed. After almost ten years of egocentric vision (2007–2017), the field is still undergoing diversification. Emerging research topics include: social saliency estimation; multi-agent egocentric vision systems; privacy-preserving techniques and applications; attention-based activity analysis; social interaction analysis; hand pose analysis; ego graphical user interfaces (EUI); understanding social dynamics and attention; revisiting robotic vision and machine vision as egocentric sensing; activity forecasting; and gaze prediction. == Technical challenges == Today's wearable cameras are small and lightweight digital recording devices that can acquire images and videos automatically, without user intervention, at different resolutions and frame rates, and from a first-person point of view. Therefore, wearable cameras are naturally primed to gather visual information from our everyday interactions, since they offer an intimate perspective of the visual field of the camera wearer. Depending on the frame rate, it is common to distinguish between photo-cameras (also called lifelogging cameras) and video-cameras.
The former (e.g., Narrative Clip and Microsoft SenseCam) are commonly worn on the chest and are characterized by a very low frame rate (up to 2 frames per minute), which allows them to capture images over a long period of time without recharging the battery. Consequently, they offer considerable potential for inferring knowledge about, e.g., the behaviour patterns, habits or lifestyle of the user. However, due to the low frame rate and the free motion of the camera, temporally adjacent images typically present abrupt appearance changes, so motion features cannot be reliably estimated. The latter (e.g., Google Glass, GoPro) are commonly mounted on the head and capture conventional video (around 35 fps), which allows them to capture fine temporal details of interactions. Consequently, they offer potential for in-depth analysis of daily or special activities. However, since the camera moves with the wearer's head, it becomes more difficult to estimate the global motion of the wearer, and in the case of abrupt movements the images can be blurred. In both cases, since the camera is worn in a naturalistic setting, visual data present a huge variability in terms of illumination conditions and object appearance. Moreover, the camera wearer is not visible in the image, and what he or she is doing has to be inferred from the information in the visual field of the camera, implying that important information about the wearer, such as pose or facial expression, is not available. == Applications == A collection of studies published in a special theme issue of the American Journal of Preventive Medicine has demonstrated the potential of lifelogs captured through wearable cameras from a number of viewpoints. In particular, it has been shown that, used as a tool for understanding and tracking lifestyle behaviour, lifelogs would enable the prevention of noncommunicable diseases associated with unhealthy trends and risky profiles (such as obesity and depression).
In addition, used as a tool for re-memory cognitive training, lifelogs would enable the prevention of cognitive and functional decline in elderly people. More recently, egocentric cameras have been used to study human and animal cognition, human-human social interaction, human-robot interaction, and human expertise in complex tasks. Other applications include navigation and assistive technologies for the blind, monitoring and assistance of industrial workflows, and augmented reality interfaces.

  • Image registration

    Image registration is the process of transforming different sets of data into one coordinate system. Data may be multiple photographs, or data from different sensors, times, depths, or viewpoints. It is used in computer vision, medical imaging, military automatic target recognition, and compiling and analyzing images and data from satellites. Registration is necessary in order to compare or integrate the data obtained from these different measurements. == Algorithm classification == === Intensity-based vs feature-based === Image registration or image alignment algorithms can be classified into intensity-based and feature-based. One of the images is referred to as the target, fixed or sensed image and the others are referred to as the moving or source images. Image registration involves spatially transforming the source/moving image(s) to align with the target image. The reference frame in the target image is stationary, while the other datasets are transformed to match the target. Intensity-based methods compare intensity patterns in images via correlation metrics, while feature-based methods find correspondences between image features such as points, lines, and contours. Intensity-based methods register entire images or sub-images. If sub-images are registered, the centers of corresponding sub-images are treated as corresponding feature points. Feature-based methods establish a correspondence between a number of especially distinct points in images. Knowing the correspondence between these points, a geometrical transformation is then determined to map the moving image onto the target (reference) image, thereby establishing point-by-point correspondence between the two. Methods combining intensity-based and feature-based information have also been developed.
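Once point correspondences are known, fitting the transformation is a small least-squares problem. The sketch below estimates an affine map from matched control points with NumPy, assuming the matching step has already been done; `estimate_affine` and `apply_affine` are hypothetical helper names. Plain least squares is not robust to wrong matches, which is where RANSAC-style estimation comes in.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares affine fit dst ~= A @ src + t from matched (x, y) points."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.shape[0] < 3:
        raise ValueError("need at least 3 matching (x, y) point pairs")
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # N x 3 homogeneous coords
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)       # 3 x 2 parameter matrix
    A = P[:2].T  # 2 x 2 linear part: rotation, scaling, shearing
    t = P[2]     # translation vector
    return A, t

def apply_affine(A, t, pts):
    """Map an N x 2 array of points through the affine transformation."""
    return np.asarray(pts, dtype=float) @ A.T + t

A_true = np.array([[1.1, -0.2], [0.3, 0.9]])
t_true = np.array([5.0, -3.0])
moving_pts = np.array([[0, 0], [10, 0], [0, 10], [7, 4]], dtype=float)
target_pts = apply_affine(A_true, t_true, moving_pts)
A_est, t_est = estimate_affine(moving_pts, target_pts)
print(np.allclose(A_est, A_true), np.allclose(t_est, t_true))  # True True
```

Six parameters need at least three non-collinear correspondences; extra points are averaged out by the least-squares fit.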
=== Transformation models === Image registration algorithms can also be classified according to the transformation models they use to relate the moving image space to the target (reference) image space. The first broad category of transformation models includes affine transformations, which include rotation, scaling, translation and shearing. Affine transformations are global in nature; thus, they cannot model local geometric differences between images. The second category allows 'elastic' or 'nonrigid' transformations. These transformations are capable of locally warping the moving image to align with the target image. Nonrigid transformations include radial basis functions (thin-plate or surface splines, multiquadrics, and compactly-supported transformations), physical continuum models (viscous fluids), and large deformation models (diffeomorphisms). Transformations are commonly described by a parametrization, where the model dictates the number of parameters. For instance, the translation of a full image can be described by a single translation vector parameter. These models are called parametric models. Non-parametric models, on the other hand, do not follow any parametrization, allowing each image element to be displaced arbitrarily. A number of programs implement both the estimation and the application of a warp field; such functionality is part of the SPM and AIR programs. === Transformations of coordinates via the law of function composition rather than addition === Alternatively, many advanced methods for spatial normalization build on structure-preserving transformations, namely homeomorphisms and diffeomorphisms, since they carry smooth submanifolds smoothly during transformation. Diffeomorphisms are generated in the modern field of Computational Anatomy based on flows, since diffeomorphisms are not additive: they form a group, but a group under the law of function composition rather than addition.
For this reason, flows, which generalize the ideas of additive groups, allow for generating large deformations that preserve topology, providing one-to-one and onto transformations. Computational methods for generating such transformations are often called LDDMM (large deformation diffeomorphic metric mapping); they provide flows of diffeomorphisms as the main computational tool for connecting coordinate systems corresponding to the geodesic flows of Computational Anatomy. A number of programs generate diffeomorphic transformations of coordinates via diffeomorphic mapping, including MRI Studio and MRI Cloud.org. === Spatial vs frequency domain methods === Spatial methods operate in the image domain, matching intensity patterns or features in images. Some of the feature matching algorithms are outgrowths of traditional techniques for performing manual image registration, in which an operator chooses corresponding control points (CP) in images. When the number of control points exceeds the minimum required to define the appropriate transformation model, iterative algorithms like RANSAC can be used to robustly estimate the parameters of a particular transformation type (e.g., affine) for registration of the images. Frequency-domain methods find the transformation parameters for registration of the images while working in the transform domain. Such methods work for simple transformations, such as translation, rotation, and scaling. Applying the phase correlation method to a pair of images produces a third image that contains a single peak. The location of this peak corresponds to the relative translation between the images. Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects typical of medical or satellite images. Additionally, phase correlation uses the fast Fourier transform to compute the cross-correlation between the two images, generally resulting in large performance gains.
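Phase correlation for a pure integer translation takes only a few NumPy FFT calls. The sketch assumes same-sized grayscale arrays and uses a circular shift (`np.roll`) in the example so that the result is exact. Real images usually also need windowing and sub-pixel peak interpolation, which are omitted. The function name is hypothetical.

```python
import numpy as np

def phase_correlation(reference, moving):
    """Estimate the integer (row, col) shift such that
    moving is approximately np.roll(reference, shift, axis=(0, 1))."""
    F_ref = np.fft.fft2(reference)
    F_mov = np.fft.fft2(moving)
    cross_power = F_mov * np.conj(F_ref)
    cross_power /= np.abs(cross_power) + 1e-12  # keep only the phase
    surface = np.fft.ifft2(cross_power).real    # ideally a single sharp peak
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    # Peak positions past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p - n) if p > n // 2 else int(p)
                 for p, n in zip(peak, surface.shape))

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
moving = np.roll(reference, (5, -9), axis=(0, 1))
print(phase_correlation(reference, moving))  # (5, -9)
```

Normalizing the cross-power spectrum to unit magnitude is what turns ordinary cross-correlation into phase correlation and gives the single sharp peak.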
The phase correlation method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation. === Single- vs multi-modality methods === Another classification can be made between single-modality and multi-modality methods. Single-modality methods tend to register images of the same modality acquired by the same scanner/sensor type, while multi-modality registration methods tend to register images acquired by different scanner/sensor types. Multi-modality registration methods are often used in medical imaging, as images of a subject are frequently obtained from different scanners. Examples include registration of brain CT/MRI images or whole-body PET/CT images for tumor localization, registration of contrast-enhanced CT images against non-contrast-enhanced CT images for segmentation of specific parts of the anatomy, and registration of ultrasound and CT images for prostate localization in radiotherapy. === Automatic vs interactive methods === Registration methods may be classified based on the level of automation they provide. Manual, interactive, semi-automatic, and automatic methods have been developed. Manual methods provide tools to align the images manually. Interactive methods reduce user bias by performing certain key operations automatically while still relying on the user to guide the registration. Semi-automatic methods perform more of the registration steps automatically but depend on the user to verify the correctness of a registration. Automatic methods do not allow any user interaction and perform all registration steps automatically. === Similarity measures for image registration === Image similarity measures are broadly used in medical imaging. An image similarity measure quantifies the degree of similarity between intensity patterns in two images.
The choice of an image similarity measure depends on the modality of the images to be registered. Common examples of image similarity measures include cross-correlation, mutual information, sum of squared intensity differences, and ratio image uniformity. Mutual information and normalized mutual information are the most popular image similarity measures for registration of multimodality images. Cross-correlation, sum of squared intensity differences and ratio image uniformity are commonly used for registration of images of the same modality. Many new features for cost functions based on matching methods via large deformations have emerged in the field of Computational Anatomy, including measure matching (for pointsets or landmarks without correspondence), curve matching and surface matching via mathematical currents and varifolds. == Uncertainty == There is a level of uncertainty associated with registering images that have any spatio-temporal differences. A confident registration with a measure of uncertainty is critical for many change detection applications, such as medical diagnostics. In remote sensing applications where a digital image pixel may represent several kilometers of spatial distance (such as NASA's LANDSAT imagery), an uncertain image registration can mean that a solution could b

  • PatchMatch

    PatchMatch is an algorithm used to quickly find correspondences (or matches) between small square regions (or patches) of an image. It has various applications in image editing, such as reshuffling or removing objects from images or altering their aspect ratios without cropping or noticeably stretching them. PatchMatch was first presented in a 2009 paper by researchers at Princeton University. == Algorithm == The goal of the algorithm is to find the patch correspondence by defining a nearest-neighbor field (NNF) as a function $f:\mathbb{R}^2 \to \mathbb{R}^2$ of offsets, defined over all possible patch locations (patch centers) in image $A$, for some distance function $D$ between two patches. So, for a given patch coordinate $a$ in image $A$ and its corresponding nearest neighbor $b$ in image $B$, $f(a)$ is simply $b - a$. However, searching every point in image $B$ would be prohibitively expensive, so the algorithm uses a randomized approach to accelerate the computation. The algorithm has three main components. Initially, the nearest-neighbor field is filled with either random offsets or some prior information. Next, an iterative update process is applied to the NNF, in which good patch offsets are propagated to adjacent pixels, followed by random search in the neighborhood of the best offset found so far. Independent of these three components, the algorithm also uses a coarse-to-fine approach, building an image pyramid to obtain a better result. === Initialization === When initializing with random offsets, independent uniform samples are drawn across the full range of image $B$.
This algorithm avoids using an initial guess from the previous level of the pyramid, because in this way it can avoid being trapped in local minima. === Iteration === After initialization, the algorithm iteratively improves the $NNF$. The iterations examine the offsets in scan order (from left to right, top to bottom), and each offset undergoes propagation followed by random search. === Propagation === We attempt to improve $f(x,y)$ using the known offsets $f(x-1,y)$ and $f(x,y-1)$, assuming that neighboring patch offsets are likely to be the same. That is, the algorithm takes the new value of $f(x,y)$ to be $\arg\min_{v \in \{f(x,y),\, f(x-1,y),\, f(x,y-1)\}} D(v)$, where $D(v)$ denotes the distance between the patch at $(x,y)$ in $A$ and the patch at $(x,y)+v$ in $B$. So if $(x,y)$ has a correct mapping and is in a coherent region $R$, then all of $R$ below and to the right of $(x,y)$ will be filled with the correct mapping. Alternatively, on even iterations, the algorithm searches in the opposite direction (reverse scan order), taking the new value to be $\arg\min_{v \in \{f(x,y),\, f(x+1,y),\, f(x,y+1)\}} D(v)$.
=== Random search === Let $v_0 = f(x,y)$. We attempt to improve $f(x,y)$ by testing a sequence of candidate offsets at an exponentially decreasing distance from $v_0$: $u_i = v_0 + w\alpha^i R_i$, where $R_i$ is a uniform random sample in $[-1,1]\times[-1,1]$, $w$ is a large search-window radius set to the maximum image dimension, and $\alpha$ is a fixed ratio, often set to 1/2. Candidates are tested for $i = 0, 1, 2, \dots$ until the search radius $w\alpha^i$ drops below one pixel. This part of the algorithm allows $f(x,y)$ to escape local minima through a random process. === Halting criterion === A commonly used halting criterion is to fix the number of iterations at about 4 to 5. Even with this few iterations, the algorithm works well.
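The initialization, propagation and random-search steps can be combined into a minimal single-scale sketch. It assumes 2-D grayscale float arrays, a sum-of-squared-differences patch distance, and only patch centers where the whole patch fits inside the image. The image pyramid is omitted, the function name and parameters are hypothetical, and the pure-Python loops make it suitable only for tiny images.

```python
import numpy as np

def patchmatch(A, B, r=1, iters=5, alpha=0.5, seed=0):
    """Approximate nearest-neighbor field from A to B for (2r+1)x(2r+1) patches.
    Returns (nnf, cost): nnf[y, x] = (dy, dx) offset into B, cost = patch SSD."""
    rng = np.random.default_rng(seed)
    ha, wa = A.shape
    hb, wb = B.shape
    nnf = np.zeros((ha, wa, 2), dtype=int)
    cost = np.full((ha, wa), np.inf)  # stays inf where no full patch fits

    def dist(y, x, dy, dx):
        by, bx = y + dy, x + dx
        if not (r <= by < hb - r and r <= bx < wb - r):
            return np.inf  # candidate patch would leave image B
        pa = A[y - r:y + r + 1, x - r:x + r + 1]
        pb = B[by - r:by + r + 1, bx - r:bx + r + 1]
        return float(np.sum((pa - pb) ** 2))

    def try_offset(y, x, dy, dx):
        d = dist(y, x, dy, dx)
        if d < cost[y, x]:  # keep a candidate only if it improves the match
            nnf[y, x] = (dy, dx)
            cost[y, x] = d

    rows, cols = range(r, ha - r), range(r, wa - r)
    # Initialization: independent uniform samples over valid patch centers in B.
    for y in rows:
        for x in cols:
            by, bx = rng.integers(r, hb - r), rng.integers(r, wb - r)
            try_offset(y, x, by - y, bx - x)

    w = max(hb, wb)  # initial random-search radius
    for it in range(iters):
        step = 1 if it % 2 == 0 else -1  # alternate the scan direction each pass
        for y in (rows if step == 1 else reversed(rows)):
            for x in (cols if step == 1 else reversed(cols)):
                # Propagation: reuse the offsets of the neighbors visited just before.
                for ny, nx in ((y - step, x), (y, x - step)):
                    if r <= ny < ha - r and r <= nx < wa - r:
                        try_offset(y, x, *nnf[ny, nx])
                # Random search: u_i = v0 + w * alpha**i * R_i until the radius is < 1.
                v0 = nnf[y, x].copy()
                radius = w
                while radius >= 1:
                    dy, dx = v0 + (radius * rng.uniform(-1, 1, size=2)).astype(int)
                    try_offset(y, x, dy, dx)
                    radius *= alpha
    return nnf, cost

rng = np.random.default_rng(1)
A = rng.random((16, 16))
B = np.roll(A, (2, 3), axis=(0, 1))
nnf, cost = patchmatch(A, B, r=1, iters=5, seed=7)
```

Because every update is accepted only when it lowers the patch distance, each pixel's cost never increases across iterations, and a correct offset found anywhere in a coherent region spreads through that region by propagation.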

  • Machine learning in video games

    Artificial intelligence and machine learning techniques are used in video games for a wide variety of applications such as non-player character (NPC) control, procedural content generation (PCG) and deep learning-based content generation. Machine learning is a subset of artificial intelligence that uses historical data to build predictive and analytical models. This is in sharp contrast to traditional methods of artificial intelligence such as search trees and expert systems. Information on machine learning techniques in the field of games is mostly known to the public through research projects, as most gaming companies choose not to publish specific information about their intellectual property. The most publicly known application of machine learning in games is likely the use of deep learning agents that compete with professional human players in complex strategy games. Machine learning has been applied extensively to games such as Atari/ALE, Doom, Minecraft, StarCraft, and car racing. Other games that did not originally exist as video games, such as chess and Go, have also been affected by machine learning. == Overview of relevant machine learning techniques == === Deep learning === Deep learning is a subset of machine learning which focuses heavily on the use of artificial neural networks (ANNs) that learn to solve complex tasks. Deep learning uses multiple layers of ANNs and other techniques to progressively extract information from an input. Due to this complex layered approach, deep learning models often require powerful machines to train and run. ==== Convolutional neural networks ==== Convolutional neural networks (CNNs) are specialized ANNs that are often used to analyze image data. These types of networks are able to learn translation-invariant patterns, which are patterns that do not depend on location.
CNNs are able to learn these patterns in a hierarchy, meaning that earlier convolutional layers learn smaller local patterns while later layers learn larger patterns composed of the earlier ones. A CNN's ability to learn from visual data has made it a commonly used tool for deep learning in games. === Recurrent neural network === Recurrent neural networks (RNNs) are a type of ANN designed to process sequences of data in order, one part at a time rather than all at once. An RNN runs over each part of a sequence, using the current part along with a memory of the previous parts of the sequence to produce an output. These types of ANN are highly effective at tasks such as speech recognition and other problems that depend heavily on temporal order. There are several types of RNNs with different internal configurations; the basic implementation suffers from a lack of long-term memory due to the vanishing gradient problem, so it is rarely used over newer implementations. ==== Long short-term memory ==== A long short-term memory (LSTM) network is a specific implementation of an RNN designed to deal with the vanishing gradient problem seen in simple RNNs, which leads them to gradually "forget" previous parts of an input sequence when calculating the output for the current part. LSTMs address this problem by adding a separate memory cell, controlled by gates, that keeps track of long-term information. LSTMs have achieved very strong results across various fields, and were used by several landmark deep learning agents in games. === Reinforcement learning === Reinforcement learning is the process of training an agent using rewards and/or punishments. The way an agent is rewarded or punished depends heavily on the problem, for example giving an agent a positive reward for winning a game or a negative one for losing.
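The recurrence described in the Recurrent neural network paragraph above can be made concrete with the forward pass of a basic (Elman-style) RNN. This pure-Python sketch is illustrative only: the function names, the tanh activation and the weight shapes are assumptions, and training, where the vanishing gradient problem appears, is not shown.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(xs: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> List[Vector]:
    """Run a simple RNN over a sequence, one element at a time.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b); the hidden state h carries a
    memory of earlier elements of the sequence into later steps."""
    h = [0.0] * len(b)  # the memory starts empty
    outputs = []
    for x in xs:
        pre = [p + q + r for p, q, r in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]
        outputs.append(h)
    return outputs
```

With a non-zero recurrent weight `w_h`, the hidden state at a step whose input is zero still reflects the earlier inputs; with `w_h` set to zero, each output depends only on the current input.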
Reinforcement learning is used heavily in the field of machine learning and can be seen in methods such as Q-learning, policy search, Deep Q-networks and others. It has seen strong performance in both games and robotics. === Neuroevolution === Neuroevolution involves the use of both neural networks and evolutionary algorithms. Instead of using gradient descent like most neural networks, neuroevolution models use evolutionary algorithms to update the neurons in the network. Researchers claim that this process is less likely to get stuck in a local minimum and is potentially faster than state-of-the-art deep learning techniques. == Deep learning agents == Machine learning agents have been used to take the place of a human player rather than to function as NPCs, which are deliberately added into video games as part of designed gameplay. Deep learning agents have achieved impressive results when used in competition with both humans and other artificial intelligence agents. === Chess === Chess is a turn-based strategy game that is considered a difficult AI problem due to the computational complexity of its board space. Similar strategy games are often solved with some form of a Minimax Tree Search. These types of AI agents have been known to beat professional human players, such as in the historic 1997 Deep Blue versus Garry Kasparov match. Since then, machine learning agents have shown even greater success than previous AI agents. === Go === Go is another turn-based strategy game, considered an even more difficult AI problem than chess. The state space of Go is around 10^170 possible board states, compared with the figure of about 10^120 often quoted for chess. Prior to recent deep learning models, AI Go agents were only able to play at the level of a human amateur. ==== AlphaGo ==== Google's 2015 AlphaGo was the first AI agent to beat a professional Go player. AlphaGo used a deep learning model to guide a Monte Carlo tree search (MCTS).
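The Minimax Tree Search mentioned in the Chess paragraph can be sketched on an explicit toy game tree. This is an illustrative sketch under stated assumptions: the function name and tree encoding are hypothetical, leaves hold made-up evaluation scores from the maximizing player's point of view, and practical chess engines add move generation, depth limits and alpha-beta pruning, none of which is shown.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf is a static evaluation; a list holds child positions


def minimax(node: Tree, maximizing: bool) -> int:
    """Return the minimax value of a game tree whose leaf scores are given
    from the maximizing player's point of view."""
    if isinstance(node, int):
        return node
    # players alternate, so each child is evaluated from the opponent's turn
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)
```

For the tree `[[3, 5], [2, 9]]` with the maximizer to move, the opponent would answer the first move with 3 and the second with 2, so the maximizer picks the first move and the value is 3.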
The deep learning model consisted of two ANNs: a policy network to predict the probabilities of potential moves, and a value network to predict the win chance of a given state. The deep learning model allows the agent to explore potential game states more efficiently than a vanilla MCTS. The networks were initially trained on games played by humans and then further trained through games against themselves. ==== AlphaGo Zero ==== AlphaGo Zero, another implementation of AlphaGo, was able to train entirely by playing against itself. It was able to quickly train up to the capabilities of the previous agent. === StarCraft series === StarCraft and its sequel StarCraft II are real-time strategy (RTS) video games that have become popular environments for AI research. Blizzard and DeepMind have worked together to release a public StarCraft II environment for AI research. Various deep learning methods have been tested on both games, though most agents usually have trouble outperforming the default AI with cheats enabled or skilled players of the game. ==== AlphaStar ==== AlphaStar was the first AI agent to beat professional StarCraft II players without any in-game advantages. The agent's deep learning network initially received input from a simplified, zoomed-out version of the game state, but was later updated to play using a camera view like human players. The developers have not publicly released the code or architecture of their model, but have listed several state-of-the-art machine learning techniques such as relational deep reinforcement learning, long short-term memory, auto-regressive policy heads, pointer networks, and centralized value baseline. AlphaStar was initially trained with supervised learning: it watched replays of many human games in order to learn basic strategies. It then trained against different versions of itself and was improved through reinforcement learning.
The final version was hugely successful, but was only trained to play on a specific map in a Protoss mirror matchup. === Dota 2 === Dota 2 is a multiplayer online battle arena (MOBA) game. As with other complex games, traditional AI agents have not been able to compete at the same level as professional human players. The only widely published information on AI agents attempted on Dota 2 concerns OpenAI's deep learning agent, OpenAI Five. ==== OpenAI Five ==== OpenAI Five utilized separate long short-term memory networks to learn each hero. It trained using a reinforcement learning technique known as Proximal Policy Optimization (PPO), running on a system containing 256 GPUs and 128,000 CPU cores. Five trained for months, accumulating 180 years of game experience each day, before facing off with professional players. It was eventually able to beat the 2018 Dota 2 esports champion team in a 2019 series of games. === Planetary Annihilation === Planetary Annihilation is a real-time strategy game which focuses on massive-scale war. The developers use ANNs in their default AI agent. === Supreme Commander 2 === Supreme Commander 2 is a real-time strategy (RTS) video game. The game uses Multilayer Perceptrons (MLPs) to control a platoon’s reaction to encountered enemy units. A total of four MLPs are used, one for each platoon type: land, naval

  • Latent semantic analysis

    Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents. An information retrieval technique using latent semantic structure was patented in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI). == Overview == === Occurrence matrix === LSA can use a document-term matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms and whose columns correspond to documents. A typical example of the weighting of the elements of the matrix is tf-idf (term frequency–inverse document frequency): the weight of an element of the matrix is proportional to the number of times the terms appear in each document, where rare terms are upweighted to reflect their relative importance. This matrix is also common to standard semantic models, though it is not necessarily explicitly expressed as a matrix, since the mathematical properties of matrices are not always used. 
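The occurrence matrix and the cosine comparison described above can be sketched in pure Python. Assumptions: documents are pre-tokenized word lists, idf is taken as log(N / df) (one common tf-idf variant among several), the function names are hypothetical, and the cosine is computed directly on the tf-idf columns, without the SVD reduction that LSA applies before comparing documents.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_matrix(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Term-document matrix: rows are terms, columns are documents, and each
    entry is the raw count times log(N / document frequency)."""
    terms = sorted({t for doc in docs for t in doc})
    counts = [Counter(doc) for doc in docs]
    n = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in terms}
    # rare terms (small df) are upweighted; a term present in every document gets weight 0
    return terms, [[c[t] * math.log(n / df[t]) for c in counts] for t in terms]


def column_cosine(x: List[List[float]], j: int, q: int) -> float:
    """Cosine similarity between document columns j and q of x."""
    dj = [row[j] for row in x]
    dq = [row[q] for row in x]
    dot = sum(a * b for a, b in zip(dj, dq))
    norm = math.sqrt(sum(a * a for a in dj)) * math.sqrt(sum(b * b for b in dq))
    return dot / norm if norm else 0.0
```

Because tf-idf weights are non-negative, these cosines fall between 0 (no shared weighted terms) and 1 (identical direction).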
=== Rank lowering === After the construction of the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. There could be various reasons for these approximations: The original term-document matrix is presumed too large for the computing resources; in this case, the approximated low-rank matrix is interpreted as an approximation (a "least and necessary evil"). The original term-document matrix is presumed noisy: for example, anecdotal instances of terms are to be eliminated. From this point of view, the approximated matrix is interpreted as a de-noised matrix (a better matrix than the original). The original term-document matrix is presumed overly sparse relative to the "true" term-document matrix. That is, the original matrix lists only the words actually in each document, whereas we might be interested in all words related to each document, generally a much larger set due to synonymy. The consequence of the rank lowering is that some dimensions are combined and depend on more than one term: {(car), (truck), (flower)} → {(1.3452 car + 0.2828 truck), (flower)}. This mitigates the problem of identifying synonymy, as the rank lowering is expected to merge the dimensions associated with terms that have similar meanings. It also partially mitigates the problem with polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Conversely, components that point in other directions tend either to simply cancel out or, at worst, to be smaller than components in the directions corresponding to the intended sense. === Derivation === Let $X$ be a matrix where element $(i,j)$ describes the occurrence of term $i$ in document $j$ (this can be, for example, the frequency).
$X$ will look like this, with each row a term vector $\mathbf{t}_i^T$ and each column a document vector $\mathbf{d}_j$:
$$X = \begin{bmatrix} x_{1,1} & \dots & x_{1,j} & \dots & x_{1,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \dots & x_{i,j} & \dots & x_{i,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{m,1} & \dots & x_{m,j} & \dots & x_{m,n} \end{bmatrix}$$
Now a row in this matrix will be a vector corresponding to a term, giving its relation to each document:
$$\mathbf{t}_i^T = \begin{bmatrix} x_{i,1} & \dots & x_{i,j} & \dots & x_{i,n} \end{bmatrix}$$
Likewise, a column in this matrix will be a vector corresponding to a document, giving its relation to each term:
$$\mathbf{d}_j = \begin{bmatrix} x_{1,j} \\ \vdots \\ x_{i,j} \\ \vdots \\ x_{m,j} \end{bmatrix}$$
Now the dot product $\mathbf{t}_i^T \mathbf{t}_p$ between two term vectors gives the correlation between the terms over the set of documents. The matrix product $XX^T$ contains all these dot products. Element $(i,p)$ (which is equal to element $(p,i)$) contains the dot product $\mathbf{t}_i^T \mathbf{t}_p$ ($= \mathbf{t}_p^T \mathbf{t}_i$). Likewise, the matrix $X^TX$ contains the dot products between all the document vectors, giving their correlation over the terms: $\mathbf{d}_j^T \mathbf{d}_q = \mathbf{d}_q^T \mathbf{d}_j$.
Now, from the theory of linear algebra, there exists a decomposition of $X$ such that $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix. This is called a singular value decomposition (SVD):
$$X = U \Sigma V^T$$
The matrix products giving us the term and document correlations then become (using $V^T V = I$ and $U^T U = I$, since $U$ and $V$ are orthogonal)
$$\begin{aligned} XX^T &= (U\Sigma V^T)(U\Sigma V^T)^T = (U\Sigma V^T)((V^T)^T \Sigma^T U^T) = U\Sigma V^T V \Sigma^T U^T = U \Sigma \Sigma^T U^T \\ X^TX &= (U\Sigma V^T)^T (U\Sigma V^T) = ((V^T)^T \Sigma^T U^T)(U\Sigma V^T) = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma V^T \end{aligned}$$
Since $\Sigma\Sigma^T$ and $\Sigma^T\Sigma$ are diagonal, we see that $U$ must contain the eigenvectors of $XX^T$, while $V$ must contain the eigenvectors of $X^TX$. Both products have the same non-zero eigenvalues, given by the non-zero entries of $\Sigma\Sigma^T$, or equally, by the non-zero entries of $\Sigma^T\Sigma$.
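The derivation can be checked numerically with NumPy's `numpy.linalg.svd`, and keeping only the leading singular values gives the rank lowering described in the Rank lowering section (by the Eckart–Young theorem, the truncated SVD is the best rank-k approximation in the least-squares sense). The helper names and the small car/truck/flower matrix below are hypothetical illustrations, not data from the original work.

```python
import numpy as np


def svd_relations_hold(x: np.ndarray) -> bool:
    """Numerically check X = U S V^T, (X X^T) U = U S S^T and (X^T X) V = V S^T S."""
    m, n = x.shape
    u, s, vt = np.linalg.svd(x)              # full SVD: U is m x m, V^T is n x n
    sigma = np.zeros((m, n))
    sigma[: len(s), : len(s)] = np.diag(s)   # rectangular diagonal Sigma
    v = vt.T
    return (
        np.allclose(x, u @ sigma @ vt)
        and np.allclose(x @ x.T @ u, u @ sigma @ sigma.T)   # U holds eigenvectors of X X^T
        and np.allclose(x.T @ x @ v, v @ sigma.T @ sigma)   # V holds eigenvectors of X^T X
    )


def rank_k_approximation(x: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest singular values and their singular vectors."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


# Rows: car, truck, flower; columns: four hypothetical documents.
x = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 1.0]])
print(svd_relations_hold(x))
print(rank_k_approximation(x, 2))
```

For this matrix the singular values are $\sqrt{10}$, 3 and 1. Dropping the smallest one leaves the flower row unchanged and maps both the car and the truck rows to approximately [1.5, 1.5, 0, 0], the kind of merging of related term dimensions that rank lowering is expected to produce.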
Now the decomposition looks like this:
$$\underbrace{\begin{bmatrix} x_{1,1} & \dots & x_{1,j} & \dots & x_{1,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \dots & x_{i,j} & \dots & x_{i,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{m,1} & \dots & x_{m,j} & \dots & x_{m,n} \end{bmatrix}}_{X} = \underbrace{\begin{bmatrix} \mathbf{u}_1 & \dots & \mathbf{u}_l \end{bmatrix}}_{U} \cdot \underbrace{\begin{bmatrix} \sigma_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \sigma_l \end{bmatrix}}_{\Sigma} \cdot \underbrace{\begin{bmatrix} \mathbf{v}_1^T \\ \vdots \\ \mathbf{v}_l^T \end{bmatrix}}_{V^T}$$
The values $\sigma_1, \dots, \sigma_l$ are called the singular values, and $\mathbf{u}_1, \dots, \mathbf{u}_l$ and $\mathbf{v}_1, \dots, \mathbf{v}_l$ the left and right singular vectors. Notice the only part of $U$ that contributes to $\mathbf{t}_i$ is the $i$-th row. Let this row vector be called $\hat{\mathbf{t}}_i^T$. Likewise, the only part of $V^T$ that contributes to $\mathbf{d}_j$ is the $j$-th column, $\hat{\mathbf{d}}_j$. These are not the eigenvectors, but depend on all the eigenvectors. I

  • Ontology learning

    Ontology learning (ontology extraction, ontology augmentation generation, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process. Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical or symbolic techniques are used to extract relation signatures, often based on pattern-based or definition-based hypernym extraction techniques. == Procedure == Ontology learning (OL) is used to (semi-)automatically extract whole ontologies from natural language text. The process is usually split into the following eight tasks, which are not all necessarily applied in every ontology learning system. === Domain terminology extraction === During the domain terminology extraction step, domain-specific terms are extracted, which are used in the following step (concept discovery) to derive concepts. Relevant terms can be determined, e.g., by calculating TF/IDF values or by applying the C-value / NC-value method. The resulting list of terms has to be filtered by a domain expert. In the subsequent step, similarly to coreference resolution in information extraction, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept. The most common methods for this are clustering and the application of statistical similarity measures.
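The C-value method named in the terminology extraction step can be sketched as follows, using the formulation commonly given for it: for a candidate term $a$ of $|a|$ words with corpus frequency $f(a)$, the score is $\log_2|a| \cdot f(a)$ if $a$ is not nested inside a longer candidate, and $\log_2|a| \cdot \left(f(a) - \frac{1}{|T_a|}\sum_{b \in T_a} f(b)\right)$ otherwise, where $T_a$ is the set of longer candidates that contain $a$. The method targets multi-word terms (single words score 0 here). The function names and frequencies are hypothetical, and candidate extraction with linguistic filters and the NC-value extension are outside this sketch.

```python
import math
from typing import Dict, Tuple

Term = Tuple[str, ...]  # a candidate multi-word term, e.g. ("floating", "point")


def is_nested(a: Term, b: Term) -> bool:
    """True if a occurs as a contiguous word sequence inside the longer term b."""
    k = len(a)
    return len(b) > k and any(b[i:i + k] == a for i in range(len(b) - k + 1))


def c_value(freq: Dict[Term, int]) -> Dict[Term, float]:
    """C-value of every candidate term, given its corpus frequency."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if is_nested(a, b)]  # T_a
        if containers:
            # discount occurrences that are really part of longer candidate terms
            f_a = f_a - sum(freq[b] for b in containers) / len(containers)
        scores[a] = math.log2(len(a)) * f_a
    return scores
```

For example, if "floating point" occurs 10 times and is nested in "floating point arithmetic" (3 times) and "floating point number" (5 times), its score drops to $1 \cdot (10 - 8/2) = 6$, reflecting that part of its frequency comes from the longer terms.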
=== Concept discovery === In the concept discovery step, terms are grouped into meaning-bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are the domain-specific terms and their synonyms identified in the domain terminology extraction step. === Concept hierarchy derivation === In the concept hierarchy derivation step, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved with unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g., user evaluation, is added. A further method for deriving a concept hierarchy uses patterns that indicate a sub- or supersumption relationship. Patterns like “X, that is a Y” or “X is a Y” indicate that X is a subclass of Y. Such patterns can be analyzed efficiently, but they often occur too infrequently to extract enough sub- or supersumption relationships. To address this, bootstrapping methods have been developed, which learn these patterns automatically and therefore ensure broader coverage. === Learning of non-taxonomic relations === In the learning of non-taxonomic relations step, relationships are extracted that do not express any sub- or supersumption. Examples of such relationships are works-for and located-in. There are two common approaches to this subtask. The first is based on the extraction of anonymous associations, which are named appropriately in a second step. The second approach extracts verbs, which indicate a relationship between entities, represented by the surrounding words. The results of both approaches need to be evaluated by an ontologist to ensure accuracy. === Rule discovery === During rule discovery, axioms (formal descriptions of concepts) are generated for the extracted concepts.
This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and applying transformation rules to the resulting dependency tree. The result of this process is a list of axioms, which is afterwards combined into a concept description. This output is then evaluated by an ontologist. === Ontology population === At this step, the ontology is augmented with instances of concepts and properties. For the augmentation with instances of concepts, methods based on the matching of lexico-syntactic patterns are used. Instances of properties are added through the application of bootstrapping methods, which collect relation tuples. === Concept hierarchy extension === In this step, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be performed in a supervised manner with a trained classifier or in an unsupervised manner via the application of similarity measures. === Frame and Event detection === During frame/event detection, the OL system tries to extract complex relationships from text, e.g., who departed from where to what place and when. Approaches range from applying SVM with kernel methods to semantic role labeling (SRL) to deep semantic parsing techniques. == Tools == Dog4Dag (Dresden Ontology Generator for Directed Acyclic Graphs) is an ontology generation plugin for Protégé 4.1 and OBO-Edit 2.1. It allows for term generation, sibling generation, definition generation, and relationship induction. Integrated into Protégé 4.1 and OBO-Edit 2.1, Dog4Dag allows ontology extension for all common ontology formats (e.g., OWL and OBO). It is limited largely to EBI and BioPortal lookup service extensions.
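The lexico-syntactic patterns mentioned under Concept hierarchy derivation ("X, that is a Y" and "X is a Y") can be sketched with regular expressions. This is a deliberately naive illustration: the pattern list, the stop-word filter and the function name are hypothetical, matching runs on raw text rather than on part-of-speech-tagged noun phrases, only single-word X and Y are captured, and no bootstrapping is performed.

```python
import re
from typing import List, Tuple

# Hypothetical hand-written patterns; each captures (X, Y), read as "X is a subclass of Y".
PATTERNS = [
    re.compile(r"\b(\w+), that is an? (\w+)\b", re.IGNORECASE),
    re.compile(r"\b(\w+) is an? (\w+)\b", re.IGNORECASE),
]
# Words that can precede "is a" without naming a concept (e.g. "that is a tree").
STOPWORDS = {"that", "this", "it", "which", "what", "there"}


def extract_subclass_pairs(text: str) -> List[Tuple[str, str]]:
    """Return unique (subclass, superclass) pairs in the order the patterns find them."""
    pairs: List[Tuple[str, str]] = []
    for pattern in PATTERNS:
        for x, y in pattern.findall(text):
            pair = (x.lower(), y.lower())
            if pair[0] not in STOPWORDS and pair not in pairs:
                pairs.append(pair)
    return pairs
```

On "A sparrow is a bird. Oak, that is a tree, grows slowly." this yields the pairs (oak, tree) and (sparrow, bird). It also illustrates the sparsity problem noted above: any phrasing outside the fixed patterns is simply missed.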
