AI App Inventor

AI App Inventor — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Quantum machine learning

    Quantum machine learning

    Quantum machine learning (QML) is the study of quantum algorithms for machine learning. It often refers to quantum algorithms for machine learning tasks which analyze classical data, sometimes called quantum-enhanced machine learning. QML algorithms use qubits and quantum operations to try to improve the space and time complexity of classical machine learning algorithms. Hybrid QML methods involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device. These routines can be more complex in nature and executed faster on a quantum computer. Furthermore, quantum algorithms can be used to analyze quantum states instead of classical data. The term "quantum machine learning" is sometimes used to refer classical machine learning methods applied to data generated from quantum experiments (i.e. machine learning of quantum systems), such as learning the phase transitions of a quantum system or creating new quantum experiments. QML also extends to a branch of research that explores methodological and structural similarities between certain physical systems and learning systems, in particular neural networks. For example, some mathematical and numerical techniques from quantum physics are applicable to classical deep learning and vice versa. Furthermore, researchers investigate more abstract notions of learning theory with respect to quantum information, sometimes referred to as "quantum learning theory". == Machine learning with quantum computers == Quantum-enhanced machine learning refers to quantum algorithms that solve tasks in machine learning, thereby improving and often expediting classical machine learning techniques. Such algorithms typically require one to encode the given classical data set into a quantum computer to make it accessible for quantum information processing. Subsequently, quantum information processing routines are applied and the result of the quantum computation is read out by measuring the quantum system. For example, the outcome of the measurement of a qubit reveals the result of a binary classification task. While many proposals of QML algorithms are still purely theoretical and require a full-scale universal quantum computer to be tested, others have been implemented on small-scale or special purpose quantum devices. === Quantum associative memories and quantum pattern recognition === Early work on quantum associative memories has been done by Dan Ventura and Tony Martinez and by Carlo A. Trugenberger in the late 1990s and early 2000s. Associative (or content-addressable) memories are able to recognize stored content on the basis of a similarity measure, while random access memories are accessed by the address of stored information and not its content. As such they must be able to retrieve both incomplete and corrupted patterns, the essential machine learning task of pattern recognition. Typical classical associative memories store p patterns in the O ( n 2 ) {\displaystyle O(n^{2})} interactions (synapses) of a real, symmetric energy matrix over a network of n artificial neurons. The encoding is such that the desired patterns are local minima of the energy functional and retrieval is done by minimizing the total energy, starting from an initial configuration. Unfortunately, classical associative memories are severely limited by the phenomenon of cross-talk. When too many patterns are stored, spurious memories appear which quickly proliferate, so that the energy landscape becomes disordered and no retrieval is anymore possible. The number of storable patterns is typically limited by a linear function of the number of neurons, p ≤ O ( n ) {\displaystyle p\leq O(n)} . Quantum associative memories (in their simplest realization) store patterns in a unitary matrix U acting on the Hilbert space of n qubits. Retrieval is realized by the unitary evolution of a fixed initial state to a quantum superposition of the desired patterns with probability distribution peaked on the most similar pattern to an input. By its very quantum nature, the retrieval process is thus probabilistic. Because quantum associative memories are free from cross-talk, however, spurious memories are never generated. Correspondingly, they have a superior capacity than classical ones. The number of parameters in the unitary matrix U is O ( p n ) {\displaystyle O(pn)} . One can thus have efficient, spurious-memory-free quantum associative memories for any polynomial number of patterns. If the matrix U is encoded as a unique operator (as opposed as to a sequence of gates as in the circuit model), e.g. by an optical interferometer, the retrieval becomes efficient even for an exponential number of patterns. === Linear algebra simulation with quantum amplitudes === A number of quantum algorithms for machine learning are based on the idea of amplitude encoding, that is, to associate the amplitudes of a quantum state with the inputs and outputs of computations. Since a state of n {\displaystyle n} qubits is described by 2 n {\displaystyle 2^{n}} complex amplitudes, this information encoding can allow for an exponentially compact representation. Intuitively, this corresponds to associating a discrete probability distribution over binary random variables with a classical vector. The goal of algorithms based on amplitude encoding is to formulate quantum algorithms whose resources grow polynomially in the number of qubits n {\displaystyle n} , which amounts to a logarithmic time complexity in the number of amplitudes and thereby the dimension of the input. Many QML algorithms in this category are based on variations of the quantum algorithm for linear systems of equations (colloquially called HHL, after the paper's authors) which, under specific conditions, performs a matrix inversion using an amount of physical resources growing only logarithmically in the dimensions of the matrix. One of these conditions is that a Hamiltonian which entry-wise corresponds to the matrix can be simulated efficiently, which is known to be possible if the matrix is sparse or low rank. For reference, any known classical algorithm for matrix inversion requires a number of operations that grows more than quadratically in the dimension of the matrix (e.g. O ( n 2.373 ) {\displaystyle O{\mathord {\left(n^{2.373}\right)}}} ), but they are not restricted to sparse matrices. Quantum matrix inversion can be applied to machine learning methods in which the training reduces to solving a linear system of equations, for example in least-squares linear regression, the least-squares version of support vector machines, and Gaussian processes. A crucial bottleneck of methods that simulate linear algebra computations with the amplitudes of quantum states is state preparation, which often requires one to initialise a quantum system in a state whose amplitudes reflect the features of the entire dataset. Although efficient methods for state preparation are known for specific cases, this step easily hides the complexity of the task. === Variational quantum algorithms (VQAs) === In a variational quantum algorithm, a classical computer optimizes the parameters used to prepare a quantum state, while a quantum computer is used to do the actual state preparation and measurement. VQAs are considered promising candidates for noisy intermediate-scale quantum computers. Variational quantum circuits (or parameterized quantum circuits) are a popular class of VQAs where the parameters are those used in a fixed quantum circuit. Researchers have studied VQCs to solve optimization problems and find the ground state energy of complex quantum systems, which were difficult to solve using a classical computer. === Quantum binary classifier === Pattern reorganization is one of the important tasks of machine learning, binary classification is one of the tools or algorithms to find patterns. Binary classification is used in supervised learning and in unsupervised learning. In QML, classical bits are converted to qubits and they are mapped to Hilbert space; complex value data are used in a quantum binary classifier to use the advantage of Hilbert space. By exploiting the quantum mechanic properties such as superposition, entanglement, interference the quantum binary classifier produces the accurate result in short period of time. === Quantum machine learning algorithms based on Grover search === Another approach to improving classical machine learning with quantum information processing uses amplitude amplification methods based on Grover's search algorithm, which has been shown to solve unstructured search problems with a quadratic speedup compared to classical algorithms. These quantum routines can be employed for learning algorithms that translate into an unstructured search task, as can be done, for instance, in the case of the k-medians and the k-nearest neighbors algorithms. Other applications include quadratic speedups in the training of perceptrons. An e

    Read more →
  • Plinian Core

    Plinian Core

    Plinian Core is a set of vocabulary terms that can be used to describe different aspects of biological species information. Under "biological species Information" all kinds of properties or traits related to taxa—biological and non-biological—are included. Thus, for instance, terms pertaining descriptions, legal aspects, conservation, management, demographics, nomenclature, or related resources are incorporated. == Description == The Plinian Core is aimed to facilitate the exchange of information about the species and upper taxa. What is in scope? Species level catalogs of any kind of biological objects or data. Terminology associated with biological collection data. Striving for compatibility with other biodiversity-related standards. Facilitating the addition of components and attributes of biological data. What is not in scope? Data interchange protocols. Non-biodiversity-related data. Occurrence level data. This standard is named after Pliny the Elder, a very influential figure in the study of the biological species. Plinian Core design requirements includes: ease of use, to be self-contained, able to support data integration from multiple databases, and ability to handle different levels of granularity. Core terms can be grouped in its current version as follows: Metadata Base Elements Record Metadata Nomenclature and Classification Taxonomic description Natural history Invasive species Habitat and Distribution Demography and Threats Uses, Management and Conservation associatedParty, MeasurementOrFact, References, AncillaryData == Background == Plinian Core started as a collaborative project between Instituto Nacional de Biodiversidad and GBIF Spain in 2005. A series of iterations in which elements were defined and implanted in different projects resulted in a "Plinian Core Flat" [deprecated]. As a result, a new development was impulse to overcome them in 2012. New formal requirements, additional input and a will to better support the standard and its documentation, as well as to align it with the processes of TDWG, the world reference body for biodiversity information standards. A new version, Plinian Core v3.x.x was defined. This provides more flexibility to fully represent the information of a species in a variety of scenarios. New elements to deal with aspects such as IPR, related resources, referenced, etc. were introduced, and elements already included were better-defined and documented. Partner for the development of Plinian Core in this new phase incorporated the University of Granada (UG, Spain), the Alexander von Humboldt Institute (IAvH, Colombia), the National Commission for the Knowledge and Use of Biodiversity (Conabio, Mexico) and the University of São Paulo (USP, Brazil). A "Plinian Core Task Group" within TDWG "Interest Group on species Information" was constituted and currently working on its development. == Levels of the standard == Plinian Core is presented in to levels: the abstract model and the application profiles. The abstract model (AM), comprising the abstract model schema(xsd) and the terms' URIs, is the normative part. It is all comprehensive, and allows for different levels of granularity in describing species properties. The AM should be taken as a "menu" from which to choose terms and level of detail needed in any specific project. The subsets of the abstract model intended to be implemented in specific projects are the "application profiles" (APs). Besides containing part of the elements of the AM, APs can impose additional specifications on the included elements, such as controlled vocabularies. Some examples of APs in use follow: Application profile CONABIO Application profile INBIO Application profile GBIF.ES Application profile Banco de Datos de la Naturaleza.Spain Application profile SIB-COLOMBIA == Relation to other standards == Plinian incorporates a number of elements already defined by other standards. The following table summarizes these standards and the elements used in Plinian Core:

    Read more →
  • Historical Thesaurus of English

    Historical Thesaurus of English

    The Historical Thesaurus of English (HTE) is the largest thesaurus in the world. It is called a historical thesaurus as it arranges the whole vocabulary of English, from the earliest written records in Old English to the present, according to the first documented occurrence of a word in the entire history of the English language. The HTE was conceived and begun in 1965 by the English Language & Linguistics department of the University of Glasgow, who have ever since continued to compile the thesaurus. From the 1980s onwards the project was moved from paper-based records to a computer database. Today, the HTE is available to the public online, but a print version, the Historical Thesaurus of the Oxford English Dictionary (HTOED), was published in 2009. == Main project: The Historical Thesaurus of English (HTE) == The Historical Thesaurus of English (HTE) is a complete database of all the words in the Oxford English Dictionary and other dictionaries (including Old English), arranged by semantic field and date. In this way, the HTE arranges the whole vocabulary of English, from the earliest written records in Old English to the present, alongside dates of use. It is the first historical thesaurus to be compiled for any of the world's languages and contains 800,000 meanings for 600,000 words, within 230,000 categories. As the HTE website states, "in addition to providing hitherto unavailable information for linguistic and textual scholars, the Historical Thesaurus online is a rich resource for students of social and cultural history, showing how concepts developed through the words that refer to them." === Structure === The work is divided into three main sections: the External World, the Mind, and Society. These are broken down into successively narrower domains. The text eventually discriminates more than 236,000 categories. The second order categories are: === History === The ambitious project was announced at a 1965 meeting of the Philological Society by its originator, Michael Samuels. Work on the HTE started in the same year. In 2017, the University of Glasgow was awarded the Queen's Anniversary Prize for Higher Education for the HTE. A second edition of the online HTE is currently in progress and is expected to be launched in late 2020. Work is released on the freely-available HTE website when available. == Print edition: Historical Thesaurus of the Oxford English Dictionary (HTOED) == On 22 October 2009, after 44 years of work, version 1.0 of the HTE was published by Oxford University Press in a two-volume slipcased set as the Historical Thesaurus of the Oxford English Dictionary (HTOED). The two hardcover volumes together total nearly 4,500 pages.

    Read more →
  • AI-assisted virtualization software

    AI-assisted virtualization software

    AI-assisted virtualization software is a type of technology that combines the principles of virtualization with advanced artificial intelligence (AI) algorithms. This software is designed to improve efficiency and management of virtual environments and resources. This technology has been used in cloud computing and for various industries. == History == Virtualization originated in mainframe computers in the 1960s in order to divide system resources between different applications. The term has since broadened. The use of AI in virtualization significantly increased in the early 2020s. == Uses == AI-assisted virtualization software uses AI-related technology such as machine learning, deep learning, and neural networks to attempt to make more accurate predictions and decisions regarding the management of virtual environments. Features include intelligent automation, predictive analytics, and dynamic resource allocation. Intelligent Automation: Automating tasks such as resource provisioning and routine maintenance. The AI learns from ongoing operations and can predict and perform necessary tasks autonomously. Predictive Analytics: Utilizing AI to analyze data patterns and trends, predicting future issues or resource requirements. It aids in proactive management and mitigation of potential problems. Dynamic Resource Allocation: Through the analysis of real-time and historical data, the AI system dynamically assigns resources based on demand and need, optimizing overall system performance and reducing wastage. AI-assisted virtualization software has been used in cloud computing to optimize the use of resources and reduce costs. In healthcare, these technologies have been used to create virtual patient profiles. They are also used in data centers to improve performance and energy efficiency. It has also been used in network function virtualization (NFV) to improve virtual network infrastructure. Implementing this type of software requires a high degree of technological sophistication and can incur significant costs. There are also concerns about the risks associated with AI, such as algorithmic bias and security vulnerabilities. Additionally, there are issues related to governance, the ethics of artificial intelligence, and regulations of AI technologies.

    Read more →
  • Schema-agnostic databases

    Schema-agnostic databases

    Schema-agnostic databases or vocabulary-independent databases aim at supporting users to be abstracted from the representation of the data, supporting the automatic semantic matching between queries and databases. Schema-agnosticism is the property of a database of mapping a query issued with the user terminology and structure, automatically mapping it to the dataset vocabulary. The increase in the size and in the semantic heterogeneity of database schemas bring new requirements for users querying and searching structured data. At this scale it can become unfeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more central as the scale and complexity of the data grows. == Description == The evolution of data environments towards the consumption of data from multiple data sources and the growth in the schema size, complexity, dynamicity and decentralisation (SCoDD) of schemas increases the complexity of contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications have a demand for more complete data, produced by independent data sources, under different semantic assumptions and contexts of use, which is the typical scenario for Semantic Web Data applications. The evolution of databases in the direction of heterogeneous data environments strongly impacts the usability, semiotics and semantic assumptions behind existing data accessibility methods such as structured queries, keyword-based search and visual query systems. With schema-less databases containing potentially millions of dynamically changing attributes, it becomes unfeasible for some users to become aware of the 'schema' or vocabulary in order to query the database. At this scale, the effort in understanding the schema in order to build a structured query can become prohibitive. == Schema-agnostic queries == Schema-agnostic queries can be defined as query approaches over structured databases which allow users satisfying complex information needs without the understanding of the representation (schema) of the database. Similarly, Tran et al. defines it as "search approaches, which do not require users to know the schema underlying the data". Approaches such as keyword-based search over databases allow users to query databases without employing structured queries. However, as discussed by Tran et al.: "From these points, users however have to do further navigation and exploration to address complex information needs. Unlike keyword search used on the Web, which focuses on simple needs, the keyword search elaborated here is used to obtain more complex results. Instead of a single set of resources, the goal is to compute complex sets of resources and their relations." The development of approaches to support natural language interfaces (NLI) over databases have aimed towards the goal of schema-agnostic queries. Complementarily, some approaches based on keyword search have targeted keyword-based queries which express more complex information needs. Other approaches have explored the construction of structured queries over databases where schema constraints can be relaxed. All these approaches (natural language, keyword-based search and structured queries) have targeted different degrees of sophistication in addressing the problem of supporting a flexible semantic matching between queries and data, which vary from the completely absence of the semantic concern to more principled semantic models. While the demand for schema-agnosticism has been an implicit requirement across semantic search and natural language query systems over structured data, it is not sufficiently individuated as a concept and as a necessary requirement for contemporary database management systems. Recent works have started to define and model the semantic aspects involved on schema-agnostic queries. === Schema-agnostic structured queries === Consist of schema-agnostic queries following the syntax of a structured standard (for example SQL, SPARQL). The syntax and semantics of operators are maintained, while different terminologies are used. ==== Example 1 ==== SELECT ?y { BillClinton hasDaughter ?x . ?x marriedTo ?y . } which maps to the following SPARQL query in the dataset vocabulary: ==== Example 2 ==== which maps to the following SPARQL query in the dataset vocabulary: === Schema-agnostic keyword queries === Consist of schema-agnostic queries using keyword queries. In this case the syntax and semantics of operators are different from the structured query syntax. ==== Example ==== "Bill Clinton daughter married to" "Books by William Goldman with more than 300 pages" == Semantic complexity == As of 2016 the concept of schema-agnostic queries has been developed primarily in academia. Most of schema-agnostic query systems have been investigated in the context of Natural Language Interfaces over databases or over the Semantic Web. These works explore the application of semantic parsing techniques over large, heterogeneous and schema-less databases. More recently, the individuation of the concept of schema-agnostic query systems and databases have appeared more explicitly within the literature. Freitas et al. provide a probabilistic model on the semantic complexity of mapping schema-agnostic queries.

    Read more →
  • Turing test

    Turing test

    The Turing test, originally called the imitation game by Alan Turing in 1949, is a test of a machine's ability to exhibit intelligent behaviour equivalent to that of a human. In the test, a human evaluator judges a text transcript of a natural-language conversation between a human and a machine. The evaluator tries to identify the machine, and the machine passes if the evaluator cannot reliably tell them apart. The results would not depend on the machine's ability to answer questions correctly, only on how closely its answers resembled those of a human. Since the Turing test is a test of indistinguishability in performance capacity, the verbal version generalizes naturally to all of human performance capacity, verbal as well as nonverbal (robotic). The test was introduced by Turing in his 1950 paper "Computing Machinery and Intelligence" while working at the University of Manchester. It opens with the words: "I propose to consider the question, 'Can machines think?'." Because "thinking" is difficult to define, Turing chooses to "replace the question by another, which is closely related to it and is expressed in relatively unambiguous words". Turing describes the new form of the problem in terms of a three-person party game called the "imitation game", in which an interrogator asks questions of a man and a woman in another room in order to determine the correct sex of the two players. Turing's new question is: "Are there imaginable digital computers which would do well in the imitation game?" This question, Turing believed, was one that could actually be answered. In the remainder of the paper, he argued against the major objections to the proposition that "machines can think". Since Turing introduced his test, it has been highly influential in the philosophy of artificial intelligence, resulting in substantial discussion and controversy, as well as criticism from philosophers like John Searle, who argue against the test's ability to detect consciousness. == History == === Philosophical background === The question of whether it is possible for machines to think has a long history, which is firmly entrenched in the distinction between dualist and materialist views of the mind. René Descartes prefigures aspects of the Turing test in his 1637 Discourse on the Method when he writes: [H]ow many different automata or moving machines could be made by the industry of man ... For we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on. But it never happens that it arranges its speech in various ways, in order to reply appropriately to everything that may be said in its presence, as even the lowest type of man can do. Here Descartes notes that automata are capable of responding to human interactions but argues that such automata cannot respond appropriately to things said in their presence in the way that any human can. Descartes therefore prefigures the Turing test by defining the insufficiency of appropriate linguistic response as that which separates the human from the automaton. Descartes fails to consider the possibility that future automata might be able to overcome such insufficiency, and so does not propose the Turing test as such, even if he prefigures its conceptual framework and criterion. Denis Diderot formulates in his 1746 book Pensées philosophiques a Turing-test criterion, though with the important implicit limiting assumption maintained, of the participants being natural living beings, rather than considering created artifacts: If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation. This does not mean he agrees with this, but that it was already a common argument of materialists at that time. According to dualism, the mind is non-physical (or, at the very least, has non-physical properties) and, therefore, cannot be explained in purely physical terms. According to materialism, the mind can be explained physically, which leaves open the possibility of minds that are produced artificially. In 1936, philosopher Alfred Ayer considered the standard philosophical question of other minds: how do we know that other people have the same conscious experiences that we do? In his book, Language, Truth and Logic, Ayer suggested a protocol to distinguish between a conscious man and an unconscious machine: "The only ground I can have for asserting that an object which appears to be conscious is not really a conscious being, but only a dummy or a machine, is that it fails to satisfy one of the empirical tests by which the presence or absence of consciousness is determined". (This suggestion is very similar to the Turing test, but it is not certain that Ayer's popular philosophical classic was familiar to Turing.) In other words, a thing is not conscious if it fails the consciousness test. === Cultural background === A rudimentary idea of the Turing test appears in the 1726 novel Gulliver's Travels by Jonathan Swift. When Gulliver is brought before the king of Brobdingnag, the king thinks at first that Gulliver might be a "a piece of clock-work (which is in that country arrived to a very great perfection) contrived by some ingenious artist". Even when he hears Gulliver speaking, the king still doubts whether Gulliver was taught "a set of words" to make him "sell at a better price". Gulliver tells that only after "he put several other questions to me, and still received rational answers" the king became satisfied that Gulliver was not a machine. Tests where a human judges whether a computer or an alien is intelligent were an established convention in science fiction by the 1940s, and it is likely that Turing would have been aware of these. Stanley G. Weinbaum's "A Martian Odyssey" (1934) provides an example of how nuanced such tests could be. Earlier examples of machines or automatons attempting to pass as human include the Ancient Greek myth of Pygmalion who creates a sculpture of a woman that is animated by Aphrodite, Carlo Collodi's novel The Adventures of Pinocchio, about a puppet who wants to become a real boy, and E. T. A. Hoffmann's 1816 story "The Sandman," where the protagonist falls in love with an automaton. In all these examples, people are fooled by artificial beings that—up to a point—pass as human. === Alan Turing and the imitation game === Researchers in the United Kingdom had been exploring "machine intelligence" for up to ten years prior to the founding of the field of artificial intelligence (AI) research in 1956. It was a common topic among the members of the Ratio Club, an informal group of British cybernetics and electronics researchers that included Alan Turing. Turing, in particular, had been running the notion of machine intelligence since at least 1941 and one of the earliest-known mentions of "computer intelligence" was made by him in 1947. In Turing's report, "Intelligent Machinery," he investigated "the question of whether or not it is possible for machinery to show intelligent behaviour" and, as part of that investigation, proposed what may be considered the forerunner to his later tests: It is not difficult to devise a paper machine which will play a not very bad game of chess. Now get three men A, B and C as subjects for the experiment. A and C are to be rather poor chess players, B is the operator who works the paper machine. ... Two rooms are used with some arrangement for communicating moves, and a game is played between C and either A or the paper machine. C may find it quite difficult to tell which he is playing. "Computing Machinery and Intelligence" (1950) was the first published paper by Turing to focus exclusively on machine intelligence. Turing begins the 1950 paper with the claim, "I propose to consider the question 'Can machines think?'" As he highlights, the traditional approach to such a question is to start with definitions, defining both the terms "machine" and "think". Turing chooses not to do so; instead, he replaces the question with a new one, "which is closely related to it and is expressed in relatively unambiguous words". In essence he proposes to change the question from "Can machines think?" to "Can machines do what we (as thinking entities) can do?" The advantage of the new question, Turing argues, is that it draws "a fairly sharp line between the physical and intellectual capacities of a man". To demonstrate this approach Turing proposes a test inspired by a party game, known as the "imitation game", in which a man and a woman go into separate rooms and guests try to tell them apart by writing a series of questions and reading the typewritten answers sent back. In this game, both the man and the woman aim to convince the guests that they ar

    Read more →
  • Embodied cognition

    Embodied cognition

    Embodied cognition represents a diverse group of theories which investigate how cognition is shaped by the bodily state and capacities of the organism. These embodied factors include the motor system, the perceptual system, bodily interactions with the environment (situatedness), and the assumptions about the world that shape the functional structure of the brain and body of the organism. Embodied cognition suggests that these elements are essential to a wide spectrum of cognitive functions, such as perception biases, memory recall, comprehension and high-level mental constructs (such as meaning attribution and categories) and performance on various cognitive tasks (reasoning or judgment). The embodied mind thesis challenges other theories, such as cognitivism, computationalism, and Cartesian dualism. It is closely related to the extended mind thesis, situated cognition, and enactivism. The modern version depends on understandings drawn from up-to-date research in psychology, linguistics, cognitive science, dynamical systems, artificial intelligence, robotics, animal cognition, plant cognition, and neurobiology. == Theory == Proponents of the embodied cognition thesis emphasize the active and significant role the body plays in the shaping of cognition and in the understanding of an agent's mind and cognitive capacities. In philosophy, embodied cognition holds that an agent's cognition, rather than being the product of mere (innate) abstract representations of the world, is strongly influenced by aspects of an agent's body beyond the brain itself. An embodied model of cognition opposes the disembodied Cartesian model, according to which all mental phenomena are non-physical and, therefore, not influenced by the body. With this opposition the embodiment thesis intends to reintroduce an agent's bodily experiences into any account of cognition. It is a rather broad thesis and encompasses both weak and strong variants of embodiment. In an attempt to reconcile cognitive science with human experience, the enactive approach to cognition defines "embodiment" as follows: By using the term embodied we mean to highlight two points: first that cognition depends upon the kinds of experience that come from having a body with various sensorimotor capacities, and second, that these individual sensorimotor capacities are themselves embedded in a more encompassing biological, psychological and cultural context. This double sense attributed to the embodiment thesis emphasizes the many aspects of cognition that researchers in different fields—such as philosophy, cognitive science, artificial intelligence, psychology, and neuroscience—are involved with. This general characterization of embodiment faces some difficulties: a consequence of this emphasis on the body, experience, culture, context, and the cognitive mechanisms of an agent in the world is that often distinct views and approaches to embodied cognition overlap. The theses of extended cognition and situated cognition, for example, are usually intertwined and not always carefully separated. And since each of the aspects of the embodiment thesis is endorsed to different degrees, embodied cognition should be better seen "as a research program rather than a well-defined unified theory". Some authors explain the embodiment thesis by arguing that cognition depends on an agent's body and its interactions with a determined environment. From this perspective, cognition in real biological systems is not an end in itself; it is constrained by the system's goals and capacities. Such constraints do not mean cognition is set by adaptive behavior (or autopoiesis) alone, but instead that cognition requires "some kind of information processing... the transformation or communication of incoming information". The acquiring of such information involves the agent's "exploration and modification of the environment". It would be a mistake, however, to suppose that cognition consists simply of building maximally accurate representations of input information...the gaining of knowledge is a stepping stone to achieving the more immediate goal of guiding behavior in response to the system's changing surroundings. Another approach to understanding embodied cognition comes from a narrower characterization of the embodiment thesis. The following narrower view of embodiment avoids any compromises to external sources other than the body and allows differentiating between embodied cognition, extended cognition, and situated cognition. Thus, the embodiment thesis can be specified as follows: Many features of cognition are embodied in that they are deeply dependent upon characteristics of the physical body of an agent, such that the agent's beyond-the-brain body plays a significant causal role, or a physically constitutive role, in that agent's cognitive processing. This thesis points out the core idea that an agent's body plays a significant role in shaping different features of cognition, such as perception, attention, memory, reasoning—among others. Likewise, these features of cognition depend on the kind of body an agent has. The thesis omits direct mention of some aspects of the "more encompassing biological, psychological and cultural context" included in the enactive definition, making it possible to separate embodied cognition, extended cognition, and situated cognition. In contrast to the embodiment thesis, the extended mind thesis limits cognitive processing neither to the brain nor even to the body, it extends it outward into the agent's world. Situated cognition emphasizes that this extension is not just a matter of including resources outside the head but stressing the role of probing and changing interactions with the agent's world. Cognition is situated in that it is inherently dependent upon the cultural and social contexts within which it takes place. This conceptual reframing of cognition as an activity influenced by the body has had significant implications. For instance, the view of cognition inherited by most contemporary cognitive neuroscience is internalist in nature. An agent's behavior along with its capacity to maintain (accurate) representations of the surrounding environment were considered as the product of "powerful brains that can maintain the world models and devise plans". From this perspective, cognizing was conceived as something that an isolated brain did. In contrast, accepting the role the body plays during cognitive processes allows us to account for a more encompassing view of cognition. This shift in perspective within neuroscience suggests that successful behavior in real-world scenarios demands the integration of several sensorimotor and cognitive (as well as affective) capacities of an agent. Thus, cognition emerges in the relationship between an agent and the affordances provided by the environment rather than in the brain alone. In 2002, a collection of positive characterizations summarizing what the embodiment thesis entails for cognition were offered. Professor of Cognitive Psychology Margaret Wilson argues that the general outlook of embodied cognition "displays an interesting co-variation of multiple observations and houses a number of different claims: (1) cognition is situated; (2) cognition is time-pressured; (3) we off-load cognitive work onto the environment; (4) the environment is part of the cognitive system; (5) cognition is for action; (6) offline cognition is bodily-based". According to Wilson, the first three and the fifth claim appear to be at least partially true, while the fourth claim is deeply problematic in that all things that have an impact on the elements of a system are not necessarily considered part of the system. The sixth claim has received the least attention in the literature on embodied cognition, yet it might be the most significant of the six claims as it shows how certain human cognitive capabilities, that previously were thought to be highly abstract, now appear to be leaning towards an embodied approach for their explanation. Wilson also describes at least five main (abstract) categories that combine both sensory and motor skills (or sensorimotor functions). The first three are working memory, episodic memory, and implicit memory; the fourth is mental imagery, and finally, the fifth concerns reasoning and problem solving. == History == The theory of embodied cognition, along with the multiple aspects it comprises, can be regarded as the imminent result of an intellectual skepticism towards the flourishment of the disembodied theory of mind put forth by René Descartes in the 17th century. According to Cartesian dualism, the mind is entirely distinct from the body and can be successfully explained and understood without reference to the body or to its processes. Research has been done to identify the set of ideas that would establish what could be considered as the early stages of embodied cognition around inquiries regarding the mind-body-soul rel

    Read more →
  • ELVIS Act

    ELVIS Act

    The ELVIS Act or Ensuring Likeness Voice and Image Security Act, signed into law by Tennessee Governor Bill Lee on March 21, 2024, marked a significant milestone in the area of regulation of artificial intelligence and public sector policies for artists in the era of artificial intelligence (AI) and AI alignment. It was noted as the first enacted legislation in the United States specifically designed to protect musicians from the unauthorized use of their voices through artificial intelligence technologies and against audio deepfakes and voice cloning. This legislation distinguishes itself by adding penalties for copying a performer's voice. == Origin and advocacy == The inception of the ELVIS Act has been attributed to Gebre Waddell, founder of Sound Credit, who initially conceptualized a framework in 2023 that later evolved into the legislation. Representative Justin J. Pearson acknowledged Waddell's pivotal role during the March 4 House Floor Session on the bill. Leading Tennessee musicians supported the ELVIS Act. Tennessee Governor Bill Lee endorsed it as a Governor's Bill, and it was introduced in the Tennessee Legislature as House Bill 2091 by William Lamberth (R-44) and Senate Bill 2096 by Jack Johnson (R-27). The ELVIS Act is an amendment to a 1984 law that was the result of the Elvis Presley estate litigation for controlling how his likeness could be used after death. == Lobbying from the recording industry == The legislative journey of the ELVIS Act included a broad coalition of music industry stakeholders, including: These organizations, led by the Recording Academy and the RIAA, played roles in drafting the legislation, advocating for passage, and rallying support among the industry and legislators. The act gained momentum through discussions that bridged industry concerns with legislative action. This collaborative process led to a proposal that specifically targets the use of AI to create unauthorized reproductions of artists' voices and images. == Opposition == The ELVIS Act saw industry opposition from the Motion Picture Association, including testimony in the House Banking & Consumer Affairs Subcommittee, including remarks that the law risks "interference with our members’ ability to portray real people and events." TechNet, representing companies such as OpenAI, Google and Amazon, expressed their opposition in the hearing to the bill as drafted, asserting that the language was too broadly written and could have unintended consequences. Other concerns included its potential application to cover bands, but lawmakers assured people that this was not the intention. The bill passed the Tennessee House and Senate with a unanimous, bi-partisan vote including 93 ayes and 0 Noes in the House, and 30 ayes and 0 noes in the Senate. == Passage == By explicitly addressing AI impersonation, the ELVIS Act originated a legal approach to safeguarding personal rights, in the context of digital and technological advancements. It extends protections to an artist's voice and likeness, areas vulnerable to exploitation with the proliferation of AI technologies that occurred in 2023. The legislation received widespread support from the music industry, signaling a significant step forward in the ongoing effort to balance innovation with the protection of individual rights and creative integrity. It was reported as underscoring Tennessee's commitment to its musical heritage and showed the state as a leader in adapting copyright and privacy protections to the modern technological landscape. Artists including Chris Janson and Luke Bryan appeared at the signing ceremony hosted at Robert's Western World to support the new law and commemorate its passing. == Legal precedent == The ELVIS Act was reported as representing a development in the discourse surrounding AI, intellectual property, and personal rights. It was hoped by proponents to set a precedent for future legislative efforts both within and beyond Tennessee, offering a model for how states and potentially the federal government could address similar challenges. As AI technology continues to evolve, the act represents a foundational framework for protecting the authenticity and rights of artists, ensuring contributions remain protected. The act prohibits usage of AI to clone the voice of an artist without consent and can be criminally enforced as a Class A misdemeanor. This legislation's success was hoped by its supporters to inspire similar actions in other states, contributing to a unified approach to copyright and privacy in the digital age. Such a national response would reinforce the importance of safeguarding artists' rights against unauthorized use of their voices and likenesses.

    Read more →
  • Text-to-image personalization

    Text-to-image personalization

    Text-to-Image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data (usually a foundation model), is adapted such that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects (such as the user's pet) or more abstract categories (new artistic style or object relations). Text-to-Image personalization methods typically bind the novel (personal) concept to new words in the vocabulary of the model. These words can then be used in future prompts to invoke the concept for subject-driven generation, inpainting, style transfer and even to correct biases in the model. To do so, models either optimize word-embeddings, fine-tune the generative model itself, or employ a mixture of both approaches. == Technology == Text-to-Image personalization was first proposed during August 2022 by two concurrent works, Textual Inversion and DreamBooth. In both cases, a user provides a few images (typically 3–5) of a concept, like their own dog, together with a coarse descriptor of the concept class (like the word "dog"). The model then learns to represent the subject through a reconstruction based objective, where prompts referring to the subject are expected to reconstruct images from the training set. In Textual Inversion, the personalized concepts are introduced into the text-to-image model by adding new words to the vocabulary of the model. Typical text-to-image models represent words (and sometimes parts-of-words) as tokens, or indices in a predefined dictionary. During generation, an input prompt is converted into such tokens, each of which is converted into a ‘word-embedding’: a continuous vector representation which is learned for each token as part of the model's training. Textual Inversion proposes to optimize a new word-embedding vector for representing the novel concept. This new embedding vector can then be assigned to a user-chosen string, and invoked whenever the user's prompt contains this string. In DreamBooth, rather than optimizing a new word vector, the full generative model itself is fine-tuned. The user first selects an existing token, typically one which rarely appears in prompts. The subject itself is then represented by a string containing this token, followed by a coarse descriptor of the subject's class. A prompt describing the subject will then take the form: "A photo of " (e.g. "a photo of sks cat" when learning to represent a specific cat). The text-to-image model is then tuned so that prompts of this form will generate images of the subject. == Textual Inversion == The key idea in Textual Inversion is to add a new term to the vocabulary of the diffusion model that corresponds to the new (personalized) concept. Textual Inversion operates by inverting the concepts into new pseudo-words within the textual embedding space of a pre-trained text-to-image model. These pseudo-words can be injected into new scenes using simple natural language descriptions, allowing for simple and intuitive modifications. The method allows a user to leverage multi-modal information — using a text-driven interface for ease of editing, but providing visual cues when approaching the limits of natural language. The resulting model is extremely light-weight per concept: only 1K long, but succeeds to encode detailed visual properties of the concept. == Extensions == Several approaches were proposed to refine and improve over the original methods. These include the following. Low-rank Adaptation (LoRA) - an adapter-based technique for efficient finetuning of models. In the case of text-to-image models, LoRA is typically used to modify the cross-attention layers of a diffusion model. Perfusion - a low rank update method that also locks the activations of the key matrix in the diffusion model's cross attention layers to the concept's coarse class. Extended Textual Inversion - a technique that learns an individual word embedding for each layer in the diffusion model's denoising network. Encoder-based methods that use another neural network to quickly personalize a model == Challenges and limitations == Text-to-image personalization methods must contend with several challenges. At their core is the goal of achieving high-fidelity to the personal concept while maintaining high alignment between novel prompts containing the subject, and the generated images (typically referred to as ‘editability’). Another challenge that personalization methods must contend with is memory requirements. Initial implementations of personalization methods required more than 20 Gigabytes of GPU memory, and more recent approaches have reported requirements of more than 40 Gigabytes. However, optimizations such as Flash Attention have since reduced this requirement considerably. Approaches that tune the entire generative model may also create checkpoints that are several gigabytes in size, making it difficult to share or store many models. Embedding based approaches require only a few kilobytes, but typically struggle to preserve identity while maintaining editability. More recent approaches have proposed hybrid tuning goals which optimize both an embedding and a subset of network weights. These can reduce storage requirements to as little as 100 Kilobytes while achieving quality comparable to full tuning methods. Finally, optimization processes can be lengthy, requiring several minutes of tuning for each novel concept. Encoder and quick-tuning methods aim to reduce this to seconds or less.

    Read more →
  • Tim Houlne

    Tim Houlne

    Tim Houlne is an American business executive, entrepreneur, and author known for his work in outsourcing and homeshoring, remote working, and artificial intelligence (AI) in customer service. He is the founder and CEO of Humach, a company that uses human agents and AI in customer experience solutions. Previously, he was co-founder and CEO of Working Solutions, a virtual contact center company in the United States. == Early life and education == Houlne graduated from Missouri Western State University (MWSU) in 1986 with a bachelor's degree in business administration and from the University of Texas in Dallas with an MBA. In 2024, MWSU and North Central Missouri College renamed the Convergent Technology Alliance Center to the Houlne Center for Convergent Technology. The 20,000 square-foot learning laboratory provides training and applied education experiences in industries such as AI, cybersecurity, manufacturing and construction, and service technologies. == Career == In 1998, Houlne co-founded Working Solutions, a Plano, Texas-based U.S. outsourcing company that provides customer service using remote, home-based agents. As CEO, he oversaw the development of a virtual workforce model that routes service calls to either domestic or offshore agents, according to client needs and service requirements. In 2015, Houlne founded Humach, a customer experience outsourcing provider that uses human service agents with AI-based digital agents. The company derives its name from the combination of services provided by humans and machines. Its clients include Amazon, Carfax and McDonald's. The company acquired InfiniteAI in 2020, and Markets EQ in 2025. In 2013, Houlne was named a finalist for the Ernst & Young Entrepreneur of the Year Award (Southwest Region).He is the co-author of several books focused on the evolution of work, the gig economy, and the influence of AI in customer-facing roles. == Works == The New World of Work: From the Cube to the Cloud (2013) ISBN 0982562276 OCLC 813933360 The New World of Work, Second Edition: The Cube, the Cloud and What's Next (2023) ISBN 9781642258318 OCLC 1389815847 The Intelligent Workforce: How Humans & Machines Will Co-Create a Better Future (2024) ISBN 9798887501604 OCLC 1439598569

    Read more →
  • Pattern language

    Pattern language

    A pattern language is an organized and coherent set of patterns, each of which describes a problem and the core of a solution that can be used in many ways within a specific field of expertise. The term was coined by architect Christopher Alexander and popularized by his 1977 book A Pattern Language. A pattern language can also be an attempt to express the deeper wisdom of what brings aliveness within a particular field of human endeavor, through a set of interconnected patterns. Aliveness is one placeholder term for "the quality that has no name": a sense of wholeness, spirit, or grace, that while of varying form, is precise and empirically verifiable. Alexander claims that ordinary people can use this design approach to successfully solve very large, complex design problems. == What is a pattern? == When a designer designs something – whether a house, computer program, or lamp – they must make many decisions about how to solve problems. A single problem is documented with its typical place (the syntax), and use (the grammar) with the most common and recognized good solution seen in the wild, like the examples seen in dictionaries. Each such entry is a single design pattern. Each pattern has a name, a descriptive entry, and some cross-references, much like a dictionary entry. A documented pattern should explain why that solution is good in the pattern's contexts. Elemental or universal patterns such as "door" or "partnership" are versatile ideals of design, either as found in experience or for use as components in practice, explicitly described as holistic resolutions of the forces in recurrent contexts and circumstances, whether in architecture, medicine, software development or governance, etc. Patterns might be invented or found and studied, such as the naturally occurring patterns of design that characterize human environments. Like all languages, a pattern language has vocabulary, syntax, and grammar – but a pattern language applies to some complex activity other than communication. In pattern languages for design, the parts break down in this way: The language description – the vocabulary – is a collection of named, described solutions to problems in a field of interest. These are called design patterns. So, for example, the language for architecture describes items like: settlements, buildings, rooms, windows, latches, etc. Each solution includes syntax, a description that shows where the solution fits in a larger, more comprehensive or more abstract design. This automatically links the solution into a web of other needed solutions. For example, rooms have ways to get light, and ways to get people in and out. The solution includes grammar that describes how the solution solves a problem or produces a benefit. So, if the benefit is unneeded, the solution is not used. Perhaps that part of the design can be left empty to save money or other resources; if people do not need to wait to enter a room, a simple doorway can replace a waiting room. In the language description, grammar and syntax cross index (often with a literal alphabetic index of pattern names) to other named solutions, so the designer can quickly think from one solution to related, needed solutions, and document them in a logical way. In Christopher Alexander's book A Pattern Language, the patterns are in decreasing order by size, with a separate alphabetic index. The web of relationships in the index of the language provides many paths through the design process. This simplifies the design work because designers can start the process from any part of the problem they understand and work toward the unknown parts. At the same time, if the pattern language has worked well for many projects, there is reason to believe that even a designer who does not completely understand the design problem at first will complete the design process, and the result will be usable. For example, skiers coming inside must shed snow and store equipment. The messy snow and boot cleaners should stay outside. The equipment needs care, so the racks should be inside. == Many patterns form a language == Just as words must have grammatical and semantic relationships to each other in order to make a spoken language useful, design patterns must be related to each other in position and utility order to form a pattern language. Christopher Alexander's work describes a process of decomposition, in which the designer has a problem (perhaps a commercial assignment), selects a solution, then discovers new, smaller problems resulting from the larger solution. Occasionally, the smaller problems have no solution, and a different larger solution must be selected. Eventually all of the remaining design problems are small enough or routine enough to be solved by improvisation by the builders, and the "design" is done. The actual organizational structure (hierarchical, iterative, etc.) is left to the discretion of the designer, depending on the problem. This explicitly lets a designer explore a design, starting from some small part. When this happens, it's common for a designer to realize that the problem is actually part of a larger solution. At this point, the design almost always becomes a better design. In the language, therefore, each pattern has to indicate its relationships to other patterns and to the language as a whole. This gives the designer using the language a great deal of guidance about the related problems that must be solved. The most difficult part of having an outside expert apply a pattern language is in fact to get a reliable, complete list of the problems to be solved. Of course, the people most familiar with the problems are the people that need a design. So, Alexander famously advocated on-site improvisation by concerned, empowered users, as a powerful way to form very workable large-scale initial solutions, maximizing the utility of a design, and minimizing the design rework. The desire to empower users of architecture was, in fact, what led Alexander to undertake a pattern language project for architecture in the first place. == Design problems in a context == An important aspect of design patterns is to identify and document the key ideas that make a good system different from a poor system (that may be a house, a computer program or an object of daily use), and to assist in the design of future systems. The idea expressed in a pattern should be general enough to be applied in very different systems within its context, but still specific enough to give constructive guidance. The range of situations in which the problems and solutions addressed in a pattern apply is called its context. An important part in each pattern is to describe this context. Examples can further illustrate how the pattern applies to very different situation. For instance, Alexander's pattern "A PLACE TO WAIT" addresses bus stops in the same way as waiting rooms in a surgery, while still proposing helpful and constructive solutions. The "Gang-of-Four" book Design Patterns by Gamma et al. proposes solutions that are independent of the programming language, and the program's application domain. Still, the problems and solutions described in a pattern can vary in their level of abstraction and generality on the one side, and specificity on the other side. In the end this depends on the author's preferences. However, even a very abstract pattern will usually contain examples that are, by nature, absolutely concrete and specific. Patterns can also vary in how far they are proven in the real world. Alexander gives each pattern a rating by zero, one or two stars, indicating how well they are proven in real-world examples. It is generally claimed that all patterns need at least some existing real-world examples. It is, however, conceivable to document yet unimplemented ideas in a pattern-like format. The patterns in Alexander's book also vary in their level of scale – some describing how to build a town or neighbourhood, others dealing with individual buildings and the interior of rooms. Alexander sees the low-scale artifacts as constructive elements of the large-scale world, so they can be connected to a hierarchic network. === Balancing of forces === A pattern must characterize the problems that it is meant to solve, the context or situation where these problems arise, and the conditions under which the proposed solutions can be recommended. Often these problems arise from a conflict of different interests or "forces". A pattern emerges as a dialogue that will then help to balance the forces and finally make a decision. For instance, there could be a pattern suggesting a wireless telephone. The forces would be the need to communicate, and the need to get other things done at the same time (cooking, inspecting the bookshelf). A very specific pattern would be just "WIRELESS TELEPHONE". More general patterns would be "WIRELESS DEVICE" or "SECONDARY ACTIVITY", suggesting that a secondary activity (such as talking on t

    Read more →
  • Capsule neural network

    Capsule neural network

    A capsule neural network (CapsNet) is a machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization. The idea is to add structures called "capsules" to a convolutional neural network (CNN), and to reuse output from several of those capsules to form more stable (with respect to various perturbations) representations for higher capsules. The output is a vector consisting of the probability of an observation, and a pose for that observation. This vector is similar to what is done for example when doing classification with localization in CNNs. Among other benefits, capsnets address the "Picasso problem" in image recognition: images that have all the right parts but that are not in the correct spatial relationship (e.g., in a "face", the positions of the mouth and one eye are switched). For image recognition, capsnets exploit the fact that while viewpoint changes have nonlinear effects at the pixel level, they have linear effects at the part/object level. This can be compared to inverting the rendering of an object of multiple parts. == History == In 2000, Geoffrey Hinton et al. described an imaging system that combined segmentation and recognition into a single inference process using parse trees. So-called credibility networks described the joint distribution over the latent variables and over the possible parse trees. That system proved useful on the MNIST handwritten digit database. A dynamic routing mechanism for capsule networks was introduced by Hinton and his team in 2017. The approach was claimed to reduce error rates on MNIST and to reduce training set sizes. Results were claimed to be considerably better than a CNN on highly overlapped digits. In Hinton's original idea one minicolumn would represent and detect one multidimensional entity. == Transformations == An invariant is an object property that does not change as a result of some transformation. For example, the area of a circle does not change if the circle is shifted to the left. Informally, an equivariant is a property that changes predictably under transformation. For example, the center of a circle moves by the same amount as the circle when shifted. A nonequivariant is a property whose value does not change predictably under a transformation. For example, transforming a circle into an ellipse means that its perimeter can no longer be computed as π times the diameter. In computer vision, the class of an object is expected to be an invariant over many transformations. I.e., a cat is still a cat if it is shifted, turned upside down or shrunken in size. However, many other properties are instead equivariant. The volume of a cat changes when it is scaled. Equivariant properties such as a spatial relationship are captured in a pose, data that describes an object's translation, rotation, scale and reflection. Translation is a change in location in one or more dimensions. Rotation is a change in orientation. Scale is a change in size. Reflection is a mirror image. Unsupervised capsnets learn a global linear manifold between an object and its pose as a matrix of weights. In other words, capsnets can identify an object independent of its pose, rather than having to learn to recognize the object while including its spatial relationships as part of the object. In capsnets, the pose can incorporate properties other than spatial relationships, e.g., color (cats can be of various colors). Multiplying the object by the manifold poses the object (for an object, in space). == Pooling == Capsnets reject the pooling layer strategy of conventional CNNs that reduces the amount of detail to be processed at the next higher layer. Pooling allows a degree of translational invariance (it can recognize the same object in a somewhat different location) and allows a larger number of feature types to be represented. Capsnet proponents argue that pooling: violates biological shape perception in that it has no intrinsic coordinate frame; provides invariance (discarding positional information) instead of equivariance (disentangling that information); ignores the linear manifold that underlies many variations among images; routes statically instead of communicating a potential "find" to the feature that can appreciate it; damages nearby feature detectors, by deleting the information they rely upon. == Capsules == A capsule is a set of neurons that individually activate for various properties of a type of object, such as position, size and hue. Formally, a capsule is a set of neurons that collectively produce an activity vector with one element for each neuron to hold that neuron's instantiation value (e.g., hue). Graphics programs use instantiation value to draw an object. Capsnets attempt to derive these from their input. The probability of the entity's presence in a specific input is the vector's length, while the vector's orientation quantifies the capsule's properties. Artificial neurons traditionally output a scalar, real-valued activation that loosely represents the probability of an observation. Capsnets replace scalar-output feature detectors with vector-output capsules and max-pooling with routing-by-agreement. Because capsules are independent, when multiple capsules agree, the probability of correct detection is much higher. A minimal cluster of two capsules considering a six-dimensional entity would agree within 10% by chance only once in a million trials. As the number of dimensions increase, the likelihood of a chance agreement across a larger cluster with higher dimensions decreases exponentially. Capsules in higher layers take outputs from capsules at lower layers, and accept those whose outputs cluster. A cluster causes the higher capsule to output a high probability of observation that an entity is present and also output a high-dimensional (20-50+) pose. Higher-level capsules ignore outliers, concentrating on clusters. This is similar to the Hough transform, the RHT and RANSAC from classic digital image processing. == Routing by agreement == The outputs from one capsule (child) are routed to capsules in the next layer (parent) according to the child's ability to predict the parents' outputs. Over the course of a few iterations, each parents' outputs may converge with the predictions of some children and diverge from those of others, meaning that that parent is present or absent from the scene. For each possible parent, each child computes a prediction vector by multiplying its output by a weight matrix (trained by backpropagation). Next the output of the parent is computed as the scalar product of a prediction with a coefficient representing the probability that this child belongs to that parent. A child whose predictions are relatively close to the resulting output successively increases the coefficient between that parent and child and decreases it for parents that it matches less well. This increases the contribution that that child makes to that parent, thus increasing the scalar product of the capsule's prediction with the parent's output. After a few iterations, the coefficients strongly connect a parent to its most likely children, indicating that the presence of the children imply the presence of the parent in the scene. The more children whose predictions are close to a parent's output, the more quickly the coefficients grow, driving convergence. The pose of the parent (reflected in its output) progressively becomes compatible with that of its children. The coefficients' initial logits are the log prior probabilities that a child belongs to a parent. The priors can be trained discriminatively along with the weights. The priors depend on the location and type of the child and parent capsules, but not on the current input. At each iteration, the coefficients are adjusted via a "routing" softmax so that they continue to sum to 1 (to express the probability that a given capsule is the parent of a given child.) Softmax amplifies larger values and diminishes smaller values beyond their proportion of the total. Similarly, the probability that a feature is present in the input is exaggerated by a nonlinear "squashing" function that reduces values (smaller ones drastically and larger ones such that they are less than 1). This dynamic routing mechanism provides the necessary deprecation of alternatives ("explaining away") that is needed for segmenting overlapped objects. This learned routing of signals has no clear biological equivalent. Some operations can be found in cortical layers, but they do not seem to relate this technique. === Math/code === The pose vector u i {\textstyle \mathbf {u} _{i}} is rotated and translated by a matrix W i j {\textstyle \mathbf {W} _{ij}} into a vector u ^ j | i {\textstyle \mathbf {\hat {u}} _{j|i}} that predicts the output of the parent capsule. u ^ j | i = W i j u i {\displaystyle \mathbf {

    Read more →
  • Linguatec

    Linguatec

    The Linguatec Sprachtechnologien GmbH is a language technology provider, specialized in the field of machine translation, speech synthesis and speech recognition. Linguatec was founded in Munich in 1996 and its headquarters are in Pasing. Linguatec has won the European Information Society Technologies Prize three times. On their website, they are now using the online service Voice Reader Web, so that the information can be read out in every language by means of a text-to-speech function. == Core areas == Machine translation The different versions of Personal Translator (seven language pairs) can be used "for home use" or for professional business use in the company network. In addition to this, specialist dictionaries are offered to broaden standard vocabulary. Speech synthesis The Voice Reader text-to-speech program reads in twelve languages: German, British English, American English, French, Quebec French, Spanish, Mexican Spanish, Italian, Dutch, Portuguese, Czech, Chinese. Speech recognition Voice Pro is based on ViaVoice technology from IBM. There are special software programs for doctors and lawyers. == Patents == 2005 pending patent application for a newly developed hybrid technology that uses the intelligence of neural networks for machine translation. == Awards == 2004 European IT Prize for Beyond Babel 2004 test winner Stiftung Warentest – best voice recognition 1998 European IT Prize – applied voice recognition 1996 European IT Prize – automated translation == Studies == 2005 University of Regensburg: Voice Reader user test 2002 Fraunhofer Institute for Industrial Engineering and Organization IAO: user study on the efficiency of machine translation

    Read more →
  • Greg Brockman

    Greg Brockman

    Gregory Brockman (born November 29, 1987) is an American entrepreneur and software engineer. He is co-founder and president of OpenAI. He began his career at Stripe in 2010, upon leaving MIT, and became CTO in 2013. He left Stripe in 2015 to co-found OpenAI, where he also served as CTO. == Early life == Brockman was born in Thompson, North Dakota, and attended Red River High School, where he excelled in mathematics, chemistry, and computer science. He won a silver medal in the 2006 International Chemistry Olympiad and became the first finalist from North Dakota to participate in the Intel science talent search since 1973. In 2003, 2005, and 2007, he attended Canada/USA Mathcamp, a summer program for mathematically talented high-school students. In 2008, Brockman enrolled at Harvard University but left a year later, briefly enrolling at the Massachusetts Institute of Technology. == Career == In 2010, he dropped out of MIT to join Stripe, a company founded by Patrick Collison, his MIT classmate, and John Collison. In 2013, he became Stripe's first CTO, while the company grew from 5 to 205 employees. Brockman left Stripe in May 2015. === OpenAI === Brockman met with Sam Altman and Elon Musk, and led the recruiting of the OpenAI founding team. Many of its members, including Ilya Sutskever, were top AI research talent that left high paying jobs for the opportunity at OpenAI. He co-founded OpenAI in December 2015 alongside Altman, Sutskever and others. The company initially operated from Brockman's living room. He led various projects at OpenAI, including OpenAI Gym and OpenAI Five, a Dota 2 bot. On February 14, 2019, OpenAI announced that they had developed a new large language model called GPT-2, but kept it private due to their concern for its potential misuse. They released the model to a limited group of beta testers in May 2019. On March 14, 2023, in a live video demo, Brockman unveiled GPT-4, the fourth iteration in the GPT series. On November 17, 2023, alongside the firing of Sam Altman, Brockman was told he had been removed from the board. Sutskever supplied the board with a document of alleged bullying by Brockman. Mira Murati said Brockman's relationship with Altman made it impossible for her to do her job, and Altman had already "fielded many requests from OpenAI employees to rein in Brockman". Brockman was to report to Murati, but on November 17, he announced that he had quit the company. On November 20, 2023, Microsoft CEO Satya Nadella announced that Brockman and Altman would join Microsoft to lead a new advanced AI research team. The following day, after a deal was reached to reinstate Altman as CEO, Brockman returned to OpenAI. Brockman took a sabbatical from August to November 2024. === Elon Musk lawsuit === Jury selection for OpenAI cofounder Elon Musk's lawsuit against OpenAI and its current executives, including Brockman, began on April 27, 2026. On April 28, 2026, trial testimony was by now underway, with Elon Musk beginning his testimony against Altman and OpenAI. On April 30, 2026 Musk would enter his third day of testimony. == Personal life == In November 2019 after a year of dating, Brockman married Anna at OpenAI's offices on a workday. Ilya Sutskever officiated. == Political activities == Brockman and his wife were the biggest donors to Donald Trump's Super PAC, MAGA Inc., in 2025 with each of them donating US$12.5 million. Brockman and his wife also donated $50 million to Leading the Future, a super PAC dedicated to AI deregulation that he helped found with Andreessen Horowitz co-founders Marc Andreessen and Ben Horowitz. OpenAI publicly expressed openness to increased regulatory oversight and has a policy against donating to such Super PACs.

    Read more →
  • Simple Knowledge Organization System

    Simple Knowledge Organization System

    Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data. == History == === DESIRE II project (1997–2000) === The most direct ancestor to SKOS was the RDF Thesaurus work undertaken in the second phase of the EU DESIRE project . Motivated by the need to improve the user interface and usability of multi-service browsing and searching, a basic RDF vocabulary for Thesauri was produced. As noted later in the SWAD-Europe workplan, the DESIRE work was adopted and further developed in the SOSIG and LIMBER projects. A version of the DESIRE/SOSIG implementation was described in W3C's QL'98 workshop, motivating early work on RDF rule and query languages: A Query and Inference Service for RDF. === LIMBER (1999–2001) === SKOS built upon the output of the Language Independent Metadata Browsing of European Resources (LIMBER) project funded by the European Community, and part of the Information Society Technologies programme. In the LIMBER project CCLRC further developed an RDF thesaurus interchange format which was demonstrated on the European Language Social Science Thesaurus (ELSST) at the UK Data Archive as a multilingual version of the English language Humanities and Social Science Electronic Thesaurus (HASSET) which was planned to be used by the Council of European Social Science Data Archives CESSDA. === SWAD-Europe (2002–2004) === SKOS as a distinct initiative began in the SWAD-Europe project, bringing together partners from both DESIRE, SOSIG (ILRT) and LIMBER (CCLRC) who had worked with earlier versions of the schema. It was developed in the Thesaurus Activity Work Package, in the Semantic Web Advanced Development for Europe (SWAD-Europe) project. SWAD-Europe was funded by the European Community, and part of the Information Society Technologies programme. The project was designed to support W3C's Semantic Web Activity through research, demonstrators and outreach efforts conducted by the five project partners, ERCIM, the ILRT at Bristol University, HP Labs, CCLRC and Stilo. The first release of SKOS Core and SKOS Mapping were published at the end of 2003, along with other deliverables on RDF encoding of multilingual thesauri and thesaurus mapping. === Semantic web activity (2004–2005) === Following the termination of SWAD-Europe, SKOS effort was supported by the W3C Semantic Web Activity in the framework of the Best Practice and Deployment Working Group. During this period, focus was put both on consolidation of SKOS Core, and development of practical guidelines for porting and publishing thesauri for the Semantic Web. === Development as W3C Recommendation (2006–2009) === The SKOS main published documents — the SKOS Core Guide, the SKOS Core Vocabulary Specification, and the Quick Guide to Publishing a Thesaurus on the Semantic Web — were developed through the W3C Working Draft process. Principal editors of SKOS were Alistair Miles, initially Dan Brickley, and Sean Bechhofer. The Semantic Web Deployment Working Group, chartered for two years (May 2006 – April 2008), put in its charter to push SKOS forward on the W3C Recommendation track. The roadmap projected SKOS as a Candidate Recommendation by the end of 2007, and as a Proposed Recommendation in the first quarter of 2008. The main issues to solve were determining its precise scope of use, and its articulation with other RDF languages and standards used in libraries (such as Dublin Core). === Formal release (2009) === On August 18, 2009, W3C released the new standard that builds a bridge between the world of knowledge organization systems – including thesauri, classifications, subject headings, taxonomies, and folksonomies – and the linked data community, bringing benefits to both. Libraries, museums, newspapers, government portals, enterprises, social networking applications, and other communities that manage large collections of books, historical artifacts, news reports, business glossaries, blog entries, and other items can now use SKOS to leverage the power of linked data. === Historical view of components === SKOS was originally designed as a modular and extensible family of languages, organized as SKOS Core, SKOS Mapping, and SKOS Extensions, and a Metamodel. The entire specification is now complete within the namespace http://www.w3.org/2004/02/skos/core#. == Overview == In addition to the reference itself, the SKOS Primer (a W3C Working Group Note) summarizes the Simple Knowledge Organization System. The SKOS defines the classes and properties sufficient to represent the common features found in a standard thesaurus. It is based on a concept-centric view of the vocabulary, where primitive objects are not terms, but abstract notions represented by terms. Each SKOS concept is defined as an RDF resource. Each concept can have RDF properties attached, including: one or more preferred index terms (at most one in each natural language) alternative terms or synonyms definitions and notes, with specification of their language Concepts can be organized in hierarchies using broader-narrower relationships, or linked by non-hierarchical (associative) relationships. Concepts can be gathered in concept schemes, to provide consistent and structured sets of concepts, representing whole or part of a controlled vocabulary. === Element categories === The principal element categories of SKOS are concepts, labels, notations, documentation, semantic relations, mapping properties, and collections. The associated elements are listed in the table below. === Concepts === The SKOS vocabulary is based on concepts. Concepts are the units of thought—ideas, meanings, or objects and events (instances or categories)—which underlie many knowledge organization systems. As such, concepts exist in the mind as abstract entities which are independent of the terms used to label them. In SKOS, a Concept (based on the OWL Class) is used to represent items in a knowledge organization system (terms, ideas, meanings, etc.) or such a system's conceptual or organizational structure. A ConceptScheme is analogous to a vocabulary, thesaurus, or other way of organizing concepts. SKOS does not constrain a concept to be within a particular scheme, nor does it provide any way to declare a complete scheme—there is no way to say the scheme consists only of certain members. A topConcept is (one of) the upper concept(s) in a hierarchical scheme. === Labels and notations === Each SKOS label is a string of Unicode characters, optionally with language tags, that are associated with a concept. The prefLabel is the preferred human-readable string (maximum one per language tag), while altLabel can be used for alternative strings, and hiddenLabel can be used for strings that are useful to associate, but not meant for humans to read. A SKOS notation is similar to a label, but this literal string has a datatype, like integer, float, or date; the datatype can even be made up (see 6.5.1 Notations, Typed Literals and Datatypes in the SKOS Reference). The notation is useful for classification codes and other strings not recognizable as words. === Documentation === The Documentation or Note properties provide basic information about SKOS concepts. All the properties are considered a type of skos:note; they just provide more specific kinds of information. The property definition, for example, should contain a full description of the subject resource. More specific note types can be defined in a SKOS extension, if desired. A query for skos:note ? will obtain all the notes about , including definitions, examples, and scope, history and change, and editorial documentation. Any of these SKOS Documentation properties can refer to several object types: a literal (e.g., a string); a resource node that has its own properties; or a reference to another document, for example using a URI. This enables the documentation to have its own metadata, like creator and creation date. Specific guidance on SKOS documentation properties can be found in the SKOS Primer Documentary Notes. === Semantic relations === SKOS semantic relations are intended to provide ways to declare relationships between concepts within a concept scheme. While there are no restrictions precluding their use with two concepts from separate schemes, this is discouraged because it is likely to overstate what can be known about the two schemes, and perhaps link them inappropriately. The property related simply makes an association relationship between two concepts; no hierarchy or generality relation is implied. The properties broader and narrower are used to assert a direct hierarchical link between two concepts. The meaning may be unexpected; the relat

    Read more →