AI Data Warehouse

AI Data Warehouse — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Human image synthesis

    Human image synthesis

    Human image synthesis is technology that can be applied to make believable and even photorealistic renditions of human likenesses, moving or still. It has effectively existed since the early 2000s. Many films using computer-generated imagery have featured synthetic images of human-like characters digitally composited onto real or other simulated film material. Towards the end of the 2010s, deep learning artificial intelligence was applied to synthesize images and video that look like humans without need for human assistance once the training phase has been completed, whereas the old-school 7D-route required massive amounts of human work. == Timeline of human image synthesis == In 1971 Henri Gouraud made the first CG geometry capture and representation of a human face. The model was his wife, Sylvie Gouraud. The 3D model was a simple wire-frame model, and he applied the Gouraud shader he is best known for to produce the first known representation of a human likeness on a computer. The 1972 short film A Computer Animated Hand by Edwin Catmull and Fred Parke was the first time that computer-generated imagery was used in film to simulate moving human appearance. The film featured a computer-simulated hand and face. The 1976 film Futureworld reused parts of A Computer Animated Hand on the big screen. The 1983 music video for the song Musique Non-Stop by the German band Kraftwerk aired in 1986. Created by the artist Rebecca Allen, it features non-realistic but clearly recognizable computer simulations of the band members. The 1994 film The Crow was the first film production to make use of digital compositing of a computer-simulated representation of a face onto scenes filmed using a body double. Necessity was the muse, as Brandon Lee, the actor portraying the protagonist, was accidentally killed on set. In 1999 Paul Debevec et al. of USC captured the reflectance field of a human face with their first version of a light stage. 
They presented their method at SIGGRAPH 2000. In 2003 photorealistic human likenesses made their audience debut in two films: The Matrix Reloaded, in the burly brawl sequence where up to 100 Agent Smiths fight Neo, and The Matrix Revolutions, where at the start of the final showdown Agent Smith's cheekbone gets punched in by Neo, leaving the digital look-alike unnaturally unhurt. The Matrix Revolutions bonus DVD documents and depicts the process and the techniques used in some detail, including facial motion capture, limbal motion capture, and projection onto models. Also in 2003, The Animatrix: Final Flight of the Osiris, made by Square Pictures, featured state-of-the-art would-be human likenesses that did not quite fool the viewer. In 2003 a digital likeness of Tobey Maguire was made for the movies Spider-man 2 and Spider-man 3 by Sony Pictures Imageworks. In 2005 the Face of the Future project was established by the University of St Andrews and Perception Lab, funded by the EPSRC. The website contains a "Face Transformer", which enables users to transform their face into any ethnicity and age, or into a painting (in the style of either Sandro Botticelli or Amedeo Modigliani). This is achieved by combining the user's photograph with an average face. In 2009 Debevec et al. presented new digital likenesses, made by Image Metrics, this time of actress Emily O'Brien, whose reflectance was captured with the USC light stage 5. The motion looks fairly convincing compared with the clunky run in The Animatrix: Final Flight of the Osiris, which was state-of-the-art in 2003 if photorealism was the intention of the animators. In 2009 a digital look-alike of a younger Arnold Schwarzenegger was made for the movie Terminator Salvation, though the end result was criticized as unconvincing. Facial geometry was acquired from a 1984 mold of Schwarzenegger. 
In 2010 Walt Disney Pictures released a sci-fi sequel entitled Tron: Legacy with a digitally rejuvenated look-alike of actor Jeff Bridges playing the antagonist CLU. At SIGGRAPH 2013 Activision and USC presented "Digital Ira", a real-time digital face look-alike of Ari Shapiro, an ICT USC research scientist, utilizing the USC light stage X by Ghosh et al. for both reflectance field and motion capture. The end result, shown both precomputed and rendered in real time on a then state-of-the-art game GPU, looks fairly realistic. In 2014 The Presidential Portrait by the USC Institute for Creative Technologies, in conjunction with the Smithsonian Institution, was made using the latest USC mobile light stage, with which President Barack Obama had his geometry, textures and reflectance captured. In 2014 Ian Goodfellow et al. presented the principles of a generative adversarial network. GANs made the headlines in early 2018 with the deepfakes controversies. For the 2015 film Furious 7 a digital look-alike of actor Paul Walker, who died in an accident during the filming, was made by Weta Digital to enable the completion of the film. In 2016 techniques that allow near real-time counterfeiting of facial expressions in existing 2D video were believably demonstrated. In 2016 a digital look-alike of Peter Cushing was made for the film Rogue One, made to appear the same age as the actor was during the filming of the original 1977 Star Wars film. At SIGGRAPH 2017 an audio-driven digital look-alike of the upper torso of Barack Obama was presented by researchers from the University of Washington. Once a training phase on 2D videos with audio had been completed to acquire lip sync and wider facial information, the animation was driven only by a voice track as source data. 
Late 2017 and early 2018 saw the surfacing of the deepfakes controversy, where porn videos were doctored using deep machine learning so that the face of the actress was replaced by the software's opinion of what another person's face would look like in the same pose and lighting. At the 2018 Game Developers Conference Epic Games and Tencent Games demonstrated "Siren", a digital look-alike of the actress Bingjie Jiang. It was made possible by the following technologies: CubicMotion's computer vision system, 3Lateral's facial rigging system and Vicon's motion capture system. The demonstration ran in near real time at 60 frames per second in Unreal Engine 4. In 2018 at the World Internet Conference in Wuzhen the Xinhua News Agency presented two digital look-alikes made to resemble its real news anchors Qiu Hao (Chinese language) and Zhang Zhao (English language). The digital look-alikes were made in conjunction with Sogou. Neither the speech synthesis used nor the gesturing of the digital look-alike anchors was good enough to deceive the viewer into mistaking them for real humans imaged with a TV camera. In September 2018 Google added "involuntary synthetic pornographic imagery" to its ban list, allowing anyone to request that the search engine block results that falsely depict them as "nude or in a sexually explicit situation." In February 2019 Nvidia open-sourced StyleGAN, a novel generative adversarial network. Right after this Phillip Wang made the website ThisPersonDoesNotExist.com with StyleGAN to demonstrate that unlimited numbers of often photorealistic facial portraits of no one can be made automatically using a GAN. Nvidia's StyleGAN was presented in a not yet peer-reviewed paper in late 2018. At the June 2019 CVPR, MIT CSAIL presented a system titled "Speech2Face: Learning the Face Behind a Voice" that synthesizes likely faces based on just a recording of a voice. It was trained with massive amounts of video of people speaking. 
Since 1 July 2019 Virginia has criminalized the sale and dissemination of unauthorized synthetic pornography, but not its manufacture, as § 18.2–386.2, titled 'Unlawful dissemination or sale of images of another; penalty.', became part of the Code of Virginia. The law text states: "Any person who, with the intent to coerce, harass, or intimidate, maliciously disseminates or sells any videographic or still image created by any means whatsoever that depicts another person who is totally nude, or in a state of undress so as to expose the genitals, pubic area, buttocks, or female breast, where such person knows or has reason to know that he is not licensed or authorized to disseminate or sell such videographic or still image is guilty of a Class 1 misdemeanor." The identical bills were House Bill 2678, presented by Delegate Marcus Simon to the Virginia House of Delegates on 14 January 2019, and Senate Bill 1736, introduced to the Senate of Virginia three days later by Senator Adam Ebbin. On 1 September 2019 the amendments to the election code in Texas Senate bill SB 751 came into effect, giving candidates in elections a 30-day protection period before the elections during which making and distributing digital look-alikes or synthetic fakes of the candidates is an offense. Th

    Read more →
  • Google matrix

    Google matrix

    A Google matrix is a particular stochastic matrix that is used by Google's PageRank algorithm. The matrix represents a graph with edges representing links between pages. The PageRank of each page can then be generated iteratively from the Google matrix using the power method. However, in order for the power method to converge, the matrix must be stochastic, irreducible and aperiodic. == Adjacency matrix A and Markov matrix S == In order to generate the Google matrix G, we must first generate an adjacency matrix A which represents the relations between pages or nodes. Assuming there are N pages, we can fill out A by doing the following: A matrix element $A_{i,j}$ is filled with 1 if node $j$ has a link to node $i$, and 0 otherwise; this is the adjacency matrix of links. A related matrix S corresponding to the transitions in a Markov chain of the given network is constructed from A by dividing the elements of column $j$ by $k_{j}=\sum_{i=1}^{N}A_{i,j}$, where $k_{j}$ is the total number of outgoing links from node $j$ to all other nodes. The columns having zero matrix elements, corresponding to dangling nodes, are replaced by a constant value $1/N$. Such a procedure adds a link from every sink, dangling state $a$ to every other node. Now by construction the sum of all elements in any column of matrix S is equal to unity. In this way the matrix S is mathematically well defined and it belongs to the class of Markov chains and the class of Perron-Frobenius operators. That makes S suitable for the PageRank algorithm. 
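The construction of S described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `markov_matrix` and the three-page network are assumptions made for the example, not part of the original article, and a dense list-of-lists is used for readability rather than efficiency.

```python
def markov_matrix(adjacency):
    """Build S from A, where adjacency[i][j] == 1 if node j links to node i."""
    n = len(adjacency)
    # k[j] is the number of outgoing links of node j, i.e. the sum of column j.
    k = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    s = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if k[j] == 0:
                # Dangling node: the all-zero column is replaced by 1/N.
                s[i][j] = 1.0 / n
            else:
                s[i][j] = adjacency[i][j] / k[j]
    return s


# Hypothetical network: page 0 links to pages 1 and 2, page 1 links to page 2,
# and page 2 has no outgoing links (a dangling node).
A = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]
S = markov_matrix(A)
column_sums = [sum(S[i][j] for i in range(3)) for j in range(3)]
```

Every entry of `column_sums` should come out as 1 up to floating-point rounding, which is the column-stochastic property the text relies on.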
== Construction of Google matrix G == Then the final Google matrix G can be expressed via S as: $$G_{ij}=\alpha S_{ij}+(1-\alpha)\frac{1}{N} \qquad (1)$$ By construction the sum of all non-negative elements inside each matrix column is equal to unity. The numerical coefficient $\alpha$ is known as the damping factor. Usually S is a sparse matrix, and for modern directed networks it has only about ten nonzero elements in a row or column, thus only about $10N$ multiplications are needed to multiply a vector by the matrix G. == Examples of Google matrix == An example of the construction of the matrix $S$ via Eq.(1) within a simple network is given in the article CheiRank. For the actual matrix, Google uses a damping factor $\alpha$ around 0.85. The term $(1-\alpha)$ gives the probability that a surfer jumps randomly to any page. The matrix $G$ belongs to the class of Perron-Frobenius operators of Markov chains. Examples of Google matrix structure are shown in Fig.1 for the Wikipedia articles hyperlink network in 2009 at small scale and in Fig.2 for the University of Cambridge network in 2006 at large scale. == Spectrum and eigenstates of G matrix == For $0<\alpha<1$ there is only one maximal eigenvalue $\lambda=1$, whose right eigenvector has non-negative elements $P_{i}$ that can be viewed as a stationary probability distribution. These probabilities, ordered by decreasing value, give the PageRank vector $P_{i}$ with the PageRank index $K_{i}$ used by Google search to rank webpages. Usually one has for the World Wide Web that $P\propto 1/K^{\beta}$ with $\beta\approx 0.9$. 
The number of nodes with a given PageRank value scales as $N_{P}\propto 1/P^{\nu}$ with the exponent $\nu=1+1/\beta\approx 2.1$. The left eigenvector at $\lambda=1$ has constant matrix elements. With $0<\alpha$ all eigenvalues move as $\lambda_{i}\rightarrow\alpha\lambda_{i}$ except the maximal eigenvalue $\lambda=1$, which remains unchanged. The PageRank vector varies with $\alpha$, but the other eigenvectors with $\lambda_{i}<1$ remain unchanged due to their orthogonality to the constant left vector at $\lambda=1$. The gap of $1-\alpha\approx 0.15$ between $\lambda=1$ and the other eigenvalues gives rapid convergence of a random initial vector to the PageRank after approximately 50 multiplications by the matrix $G$. At $\alpha=1$ the matrix $G$ generally has many degenerate eigenvalues $\lambda=1$ (see e.g. [6]). Examples of the eigenvalue spectrum of the Google matrix of various directed networks are shown in Fig.3 and Fig.4. The Google matrix can also be constructed for the Ulam networks generated by the Ulam method [8] for dynamical maps. The spectral properties of such matrices are discussed in [9,10,11,12,13,15]. In a number of cases the spectrum is described by the fractal Weyl law [10,12]. The Google matrix can also be constructed for other directed networks, e.g. for the procedure call network of the Linux Kernel software introduced in [15]. In this case the spectrum of $\lambda$ is described by the fractal Weyl law with the fractal dimension $d\approx 1.3$ (see Fig.5). 
Numerical analysis shows that the eigenstates of the matrix $G$ are localized (see Fig.6). The Arnoldi iteration method makes it possible to compute many eigenvalues and eigenvectors for matrices of rather large size [13]. Other examples of the $G$ matrix include the Google matrix of the brain [17] and of business process management [18]. Applications of Google matrix analysis to DNA sequences are described in [20]. Such a Google matrix approach also makes it possible to analyze the entanglement of cultures via the ranking of multilingual Wikipedia articles about persons [21]. == Historical notes == The Google matrix with damping factor was described by Sergey Brin and Larry Page in 1998 [22]; see also articles on PageRank history [23], [24].
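To make Eq.(1) and the power method concrete, here is a minimal sketch in plain Python. It assumes a small column-stochastic matrix S for a hypothetical three-page network (page 0 links to pages 1 and 2, page 1 links to page 2, page 2 is dangling), the damping factor 0.85 and the 50 iterations quoted in the text. The function names `google_matrix` and `pagerank` are illustrative only, and the dense representation ignores the sparsity that makes the real computation cost about 10N multiplications per step.

```python
def google_matrix(s, alpha=0.85):
    """Eq.(1): G_ij = alpha * S_ij + (1 - alpha) / N."""
    n = len(s)
    return [[alpha * s[i][j] + (1.0 - alpha) / n for j in range(n)]
            for i in range(n)]


def pagerank(g, iterations=50):
    """Power method: repeatedly multiply a probability vector by G."""
    n = len(g)
    p = [1.0 / n] * n  # uniform initial vector
    for _ in range(iterations):
        p = [sum(g[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


# Assumed column-stochastic S for the hypothetical three-page network.
S = [
    [0.0, 0.0, 1.0 / 3.0],
    [0.5, 0.0, 1.0 / 3.0],
    [0.5, 1.0, 1.0 / 3.0],
]
G = google_matrix(S)
P = pagerank(G)
# Pages ordered by decreasing PageRank probability.
ranking = sorted(range(len(P)), key=lambda i: P[i], reverse=True)
```

Because every column of G sums to one, each multiplication preserves the total probability, so `P` stays normalized without an explicit renormalization step. In this toy network page 2, which receives links from both other pages, ends up ranked first.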

    Read more →
  • Indic OCR

    Indic OCR

    Indic OCR refers to the process of converting images of text written in Indic scripts into e-text using optical character recognition (OCR) techniques. Broadly, it can also refer to OCR systems for the Brahmic scripts of languages of South Asia and Southeast Asia, not just the scripts of the Indian subcontinent, all of which are abugida-based writing systems. OCR for Latin characters is still not 100% accurate, but a relatively high degree of accuracy in conversion has been achieved. Such accuracy has not yet been achieved for Indic scripts. This is due in part to the writing systems of Indic languages as well as a lack of standard representation, encoding, and support among operating systems and keyboards. The Centre for Development of Advanced Computing (C-DAC) and Technology Development for Indian Languages, the premier R&D organisation of the Ministry of Electronics and Information Technology (also known as MeitY) of India, have carried out many projects relating to OCR. Their projects include OCR for Malayalam, Odia, Punjabi, Telugu and Devanagari script. == Properties of Indian writing systems == There are 22 officially recognised languages in India. Of these, Hindi, Bengali and Punjabi are the most widely spoken Indo-Aryan languages and are also the fourth, seventh and tenth most widely spoken languages in the world respectively. Two or more languages can be written with the same script. For example, Devanagari is used to write Hindi, Marathi, Rajasthani, Sanskrit, Bhojpuri and others, while Eastern Nagari is used to write Bengali, Assamese, Manipuri and others. Apart from basic characters such as consonants and vowels, most Indic languages combine two or more basic characters to form compound characters. The shape of a compound character is more complex than the shapes of its constituent basic characters. 
Some Indo-Aryan languages (including Hindi and Punjabi) have a horizontal line over the characters, while other languages (including Gujarati) and the Dravidian languages (Malayalam, Kannada, Tamil, and Telugu) do not. These are some of the main challenges in creating a single OCR for all Indic languages. Indic OCR also generally includes support for recently invented scripts in India like Ol Chiki, Warang Citi, Mundari Bani, etc., which were mainly created for writing Munda languages of the Austroasiatic family. The concept of upper/lower case is absent in Indic scripts. Apart from Urdu, Sindhi, Kashmiri and Thaana, all other Indic languages are written from left to right. == Examples == SanskritOCR: OCR software for Sanskrit, Hindi and other Indo-Aryan languages based on the Devanagari script. It is developed by a Sanskrit scholar from Germany, Dr. Oliver Hellwig of the Department for Languages and Cultures of Southern Asia, Freie Universität Berlin. The official website is in German. The interface of earlier versions of the software was also in German, but later versions have an English interface too. E-aksharayan: an optical character recognition engine for Indian languages. Chitrankan: this technology was developed by ISI, Kolkata, and transferred to C-DAC. It processes printed Hindi text from a scanner or from an image. Indic OCR models for Tesseract. == OCR in use == OCR has been used for Wikisource and other projects.
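The compound characters mentioned above can be illustrated with the way Unicode encodes a Devanagari conjunct: one visual shape corresponds to a sequence of several code points, so an OCR engine has to emit that whole sequence for a single recognized glyph. The short Python sketch below uses only the standard `unicodedata` module; the helper name `describe` is an assumption made for the example, and the snippet is not taken from any of the OCR systems listed.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, joins consonants into a conjunct


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# KA + VIRAMA + SSA is typically rendered as the single conjunct shape "क्ष".
conjunct = "\u0915" + VIRAMA + "\u0937"
code_points = describe(conjunct)

for code_point, name in code_points:
    print(code_point, name)
```

Although the conjunct is usually drawn as one shape, `len(conjunct)` is 3, which is one reason segmenting a page image into characters is harder for these scripts than for Latin text.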

    Read more →
  • Corpus linguistics

    Corpus linguistics

    Corpus linguistics is an empirical method for the study of language by text corpus (plural corpora). Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference. Large collections of text, though corpora may also be small in terms of running words, allow linguists to run quantitative analyses on linguistic concepts that may be difficult to test in a qualitative manner. The text-corpus method uses the body of texts in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have not only been used for linguistics research, they have been increasingly used to compile dictionaries (starting with The American Heritage Dictionary of the English Language in 1969) and reference grammars, with A Comprehensive Grammar of the English Language, published in 1985, as a first. Experts in the field have differing views about the annotation of a corpus. These views range from John McHardy Sinclair, who advocates minimal annotation so texts speak for themselves, to the Survey of English Usage team (University College, London), who advocate annotation as allowing greater linguistic understanding through rigorous recording. == History == Some of the earliest efforts at grammatical description were based at least in part on corpora of particular religious or cultural significance. 
For example, Prātiśākhya literature described the sound patterns of Sanskrit as found in the Vedas, and Pāṇini's grammar of classical Sanskrit was based at least in part on analysis of that same corpus. Similarly, the early Arabic grammarians paid particular attention to the language of the Quran. In the Western European tradition, scholars prepared concordances to allow detailed study of the language of the Bible and other canonical texts. === English corpora === A landmark in modern corpus linguistics was the publication of Computational Analysis of Present-Day American English in 1967. Written by Henry Kučera and W. Nelson Francis, the work was based on an analysis of the Brown Corpus, which is a structured and balanced corpus of one million words of American English from the year 1961. The corpus comprises 2000 text samples, from a variety of genres. The Brown Corpus was the first computerized corpus designed for linguistic research. Kučera and Francis subjected the Brown Corpus to a variety of computational analyses and then combined elements of linguistics, language teaching, psychology, statistics, and sociology to create a rich and variegated opus. A further key publication was Randolph Quirk's "Towards a description of English Usage" in 1960 in which he introduced the Survey of English Usage. Quirk's corpus was the first modern corpus to be built with the purpose of representing the whole language. Shortly thereafter, Boston publisher Houghton-Mifflin approached Kučera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used). Other publishers followed suit. 
The British publisher Collins' COBUILD monolingual learner's dictionary, designed for users learning English as a foreign language, was compiled using the Bank of English. The Survey of English Usage Corpus was used in the development of one of the most important corpus-based grammars, which was written by Quirk et al. and published in 1985 as A Comprehensive Grammar of the English Language. The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the International Corpus of English and the British National Corpus, a 100-million-word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. For contemporary American English, work has stalled on the American National Corpus, but the 400+ million word Corpus of Contemporary American English (1990–present) is now available through a web interface. The first computerized corpus of transcribed spoken language was constructed in 1971 by the Montreal French Project, containing one million words, which inspired Shana Poplack's much larger corpus of spoken French in the Ottawa-Hull area. === Multilingual corpora === In the 1990s, many of the notable early successes of statistical methods in natural language processing (NLP) occurred in the field of machine translation, due especially to work at IBM Research. 
These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. There are corpora in non-European languages as well. For example, the National Institute for Japanese Language and Linguistics in Japan has built a number of corpora of spoken and written Japanese. Sign language corpora have also been created using video data. === Ancient languages corpora === Besides these corpora of living languages, computerized corpora have also been made of collections of texts in ancient languages. An example is the Andersen-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax, and every segment is tagged with seven fields of information. The Quranic Arabic Corpus is an annotated corpus for the Classical Arabic language of the Quran. This is a recent project with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The Digital Corpus of Sanskrit (DCS) is a "Sandhi-split corpus of Sanskrit texts with full morphological and lexical analysis... designed for text-historical research in Sanskrit linguistics and philology." === Corpora from specific fields === Besides pure linguistic inquiry, researchers have begun to apply corpus linguistics to other academic and professional fields, such as the emerging sub-discipline of Law and Corpus Linguistics, which seeks to understand legal texts using corpus data and tools. The DBLP Discovery Dataset concentrates on computer science, containing relevant computer science publications with salient metadata such as author affiliations, citations, or study fields. 
A more focused dataset was introduced by NLP Scholar, a combination of papers of the ACL Anthology and Google Scholar metadata. Corpora can also aid in translation efforts or in teaching foreign languages. == Methods == Corpus linguistics has generated a number of research methods, which attempt to trace a path from data to theory. Wallis and Nelson (2001) first introduced what they called the 3A perspective: Annotation, Abstraction and Analysis. Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations. Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically includes linguist-directed search but may include e.g., rule-learning for parsers. Analysis consists of statistically probing, manipulating and generalising from the dataset. Analysis might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods. Most lexical corpora today are part-of-speech-tagged (POS-tagged). However even corpus linguists who work with 'unannotated plain text' inevitably apply some method to isolate salient terms. In such situations annotation and abstraction are combined in a lexical search. The advantage of publishing an annotated corpus is that other users can then perform experiments on the corpus (through corpus managers). Linguists with other interests and differing perspectives than the originators' can exploit this work. By sharing data
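The 3A perspective described under Methods (Annotation, Abstraction, Analysis) can be sketched on a toy example. Everything in the Python snippet below is an assumption made for illustration: the three hand-tagged sentences, the tag names and the function names are invented, and a real study would use an existing tagged corpus and a corpus manager rather than hand-written lists.

```python
from collections import Counter

# Annotation: a toy corpus with hand-assigned part-of-speech tags (hypothetical data).
TAGGED_CORPUS = [
    [("they", "PRON"), ("record", "VERB"), ("every", "DET"), ("call", "NOUN")],
    [("the", "DET"), ("record", "NOUN"), ("was", "VERB"), ("broken", "VERB")],
    [("we", "PRON"), ("record", "VERB"), ("the", "DET"), ("results", "NOUN")],
]


def search(corpus, word):
    """Abstraction: map each occurrence of word to a (tag, left, right) case."""
    cases = []
    for sentence in corpus:
        for index, (token, tag) in enumerate(sentence):
            if token == word:
                left = " ".join(t for t, _ in sentence[:index])
                right = " ".join(t for t, _ in sentence[index + 1:])
                cases.append((tag, left, right))
    return cases


def tag_distribution(cases):
    """Analysis: relative frequency of each tag among the retrieved cases."""
    counts = Counter(tag for tag, _, _ in cases)
    total = sum(counts.values())
    return {tag: count / total for tag, count in counts.items()}


cases = search(TAGGED_CORPUS, "record")
distribution = tag_distribution(cases)
```

Here the lexical search plays the role the text assigns to abstraction, turning annotated text into a small dataset of cases, and the tag distribution is a deliberately simple stand-in for the statistical analysis step.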

    Read more →
  • AI agent

    AI agent

    In the context of generative artificial intelligence, AI agents (also referred to as compound AI systems or agentic AI) are a class of intelligent agents that can pursue goals, use tools, and take actions with varying degrees of autonomy. In practice, they usually operate within human-defined objectives, constraints, and available tools. == Overview == AI agents possess several key attributes, including goal-directed behavior, natural language interfaces, the capacity to use external tools, and the ability to perform multi-step tasks. Their control flow is frequently driven by large language models (LLMs). Agent systems may also include memory components, planning logic, tool interfaces, and orchestration software for coordinating agent components. AI agents do not have a standard definition. NIST describes agentic AI as an emerging area requiring standards for secure operation, interoperability, and reliable interaction with external systems. A common application of AI agents is task automation: for example, booking travel plans based on a user's prompted request. Companies such as Google, Microsoft and Amazon Web Services have offered platforms for deploying pre-built AI agents. Several protocols have been proposed for standardizing inter-agent communication, with examples including the Model Context Protocol, Gibberlink, and many others. Some of these protocols are also used for connecting agents to external applications. In December 2025, Linux Foundation announced the formation of the Agentic AI Foundation (AAIF), with the goal of ensuring agentic AI evolves transparently and collaboratively. == History == AI agents have been traced back to research from the 1990s, with Harvard professor Milind Tambe noting that the definition of an AI agent was not clear at the time. Researcher Andrew Ng has been credited with spreading the term "agentic" to a wider audience in 2024. 
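The attributes listed in the overview (a goal, external tools, multi-step execution and human-defined constraints) can be sketched as a small control loop. This is a hedged illustration, not the design of any particular product: the `Agent` class, the two travel tools, the flight code and the `scripted_policy` function are all hypothetical, and the scripted policy stands in for the large language model that would normally choose each action.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    goal: str
    tools: Dict[str, Callable[[str], str]]
    policy: Callable[[str, List[str]], Tuple[str, str]]
    max_steps: int = 5  # human-defined constraint on autonomy
    memory: List[str] = field(default_factory=list)

    def run(self) -> str:
        for _ in range(self.max_steps):
            action, argument = self.policy(self.goal, self.memory)
            if action == "finish":
                return argument
            if action not in self.tools:
                self.memory.append(f"error: unknown tool {action}")
                continue
            observation = self.tools[action](argument)
            self.memory.append(f"{action}({argument}) -> {observation}")
        return "stopped: step limit reached"


# Hypothetical tools backed by made-up data.
FLIGHTS = {"Lisbon": "flight XY100"}


def search_flights(city: str) -> str:
    return FLIGHTS.get(city, "no flight found")


def book(flight: str) -> str:
    return f"booked {flight}"


def scripted_policy(goal: str, memory: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call: picks the next action from the memory so far."""
    if not memory:
        return ("search_flights", "Lisbon")
    last_observation = memory[-1].split(" -> ")[1]
    if len(memory) == 1:
        return ("book", last_observation)
    return ("finish", last_observation)


agent = Agent(
    goal="Book a flight to Lisbon",
    tools={"search_flights": search_flights, "book": book},
    policy=scripted_policy,
)
result = agent.run()
```

The step limit and the fixed tool dictionary are the only safeguards in this sketch; they illustrate how an agent's autonomy is bounded by the objectives, constraints and tools that people define for it.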
== Training and testing == Researchers have attempted to build world models and reinforcement learning environments to train or evaluate AI agents. For example, video games such as Minecraft and No Man's Sky, as well as replicas of company websites, have been used for training such agents. == Autonomous capabilities == The Financial Times compared the autonomy of AI agents to the SAE classification of self-driving cars, likening most applications to level 2 or level 3, with some achieving level 4 in highly specialized circumstances, and level 5 being theoretical. == Cognitive architecture == The following are some internal design options for reasoning within an agent: Retrieval-augmented generation. The ReAct (Reason + Act) pattern, an iterative process in which an AI agent alternates between reasoning and taking actions, receives observations from the environment or external tools, and integrates these observations into subsequent reasoning steps. Reflexion, which uses an LLM to create feedback on the agent's plan of action and stores that feedback in a memory cache. A tool/agent registry, for organizing software functions or other agents that the agent can use. One-shot model querying, which queries the model once to create the plan of action. === Reference architecture === Ken Huang proposed an AI agent reference architecture, which consists of seven interconnected layers, with each layer building on the functionality of the layers beneath it: Layer 1: Foundation models - provide the core AI engines to power agent capabilities. Layer 2: Data operations - manage the complex data infrastructure required for AI agent operations, including vector databases, data loaders and RAG. Layer 3: Agent frameworks - sophisticated software and tools that simplify the development and management of AI agents. Layer 4: Deployment and infrastructure - provide the robust technical foundation for running AI agents. 
Layer 5: Evaluation and observability - focus on assessing the safety and performance of AI agents. Layer 6: Security and compliance - a crucial protective framework ensuring AI agents operate safely, securely, and conform to regulatory boundaries. At this layer security and compliance features embedded into all the AI agent stack layers are integrated together. Layer 7: Agent ecosystem - represents the AI agents' interface with real-world applications and users. == Orchestration patterns == To execute complex tasks, autonomous agents are often integrated with other agents or specialized tools. These configurations, known as orchestration patterns or workflows, include the following: Prompt chaining: A sequence where the output of one step serves as the input for the next. Routing: The classification of an input to direct it to a specialized downstream task or tool. Parallelization: The simultaneous execution of multiple tasks. Sequential processing: A fixed, linear progression of tasks through a predefined pipeline. Planner-critic: An iterative pattern where one agent generates a proposal and another evaluates it to provide feedback for refinement. == Multimodal AI agents == In addition to large language models (LLMs), vision-language models (VLMs) and multimodal foundation models can be used as the basis for agents. In September 2024, Allen Institute for AI released an open-source vision-language model. Nvidia released a framework for developers to use VLMs, LLMs and retrieval-augmented generation for building AI agents that can analyze images and videos, including video search and video summarization. Microsoft released a multimodal agent model – trained on images, video, software user interface interactions, and robotics data – that the company claimed can manipulate software and robots. == Applications == As of April 2025, per the Associated Press, there are few real-world applications of AI agents. 
As of June 2025, per Fortune, many companies are primarily experimenting with AI agents. The Information divided AI agents into seven archetypes: business-task agents, for acting within enterprise software; conversational agents, which act as chatbots for customer support; research agents, for querying and analyzing information (such as OpenAI Deep Research); analytics agents, for analyzing data to create reports; software developer or coding agents (such as Cursor); domain-specific agents, which include specific subject matter knowledge; and web browser agents (such as OpenAI Operator). By mid-2025, AI agents have been used in video game development, gambling (including sports betting), cryptocurrency wallets (including cryptocurrency trading and meme coins) and social media. In August 2025, New York Magazine described software development as the most definitive use case of AI agents. Likewise, by October 2025, noting a decline in expectations, The Information noted AI coding agents and customer support as the primary use cases by businesses. In November 2025, The Wall Street Journal reported that few companies that deployed AI agents have received a return on investment. === Applications in government === Several government bodies in the United States and United Kingdom have deployed or announced the deployment of agents, at the local and national level. The city of Kyle, Texas deployed an AI agent from Salesforce in March 2025 for 311 customer service. In November 2025, the Internal Revenue Service stated that it would use Agentforce, AI agents from Salesforce, for the Office of Chief Counsel, Taxpayer Advocate Services and the Office of Appeals. That same month, Staffordshire Police announced that they would trial Agentforce agents for handling non-emergency 101 calls in the United Kingdom starting in 2026. 
In December 2025, the Department of Neighborhoods in Detroit, Michigan, in partnership with a local business, deployed a pilot project in two Detroit districts for an AI agent to be used for customer service calls. In February 2025, Thomas Shedd, the director of the Technology Transformation Services, proposed using AI coding agents across the United States federal government. A recruiter for the Department of Government Efficiency proposed in April 2025 to use AI agents to automate the work of about 70,000 United States federal government employees, as part of a startup with funding from OpenAI and a partnership agreement with Palantir. This proposal was criticized by experts for its impracticality, if not impossibility, and the lack of corresponding widespread adoption by businesses. In December 2025, the Food and Drug Administration announced that it would offer "agentic AI capabilities" to its staff for "meeting management, pre-market reviews, review validation, post-market surveillance, inspections and compliance and administrative functions." That same month, the United States Department of Defense launched GenAI.mil, an internal platform for American military personnel to use generative AI-based applications based on Google Gemini, including "intelligent agentic workflows". Defense Secretary Pete Hegseth listed applications such as "[conducting] deep r

    Read more →
  • EDLUT

    EDLUT

    EDLUT (Event-Driven LookUp Table) is a computer application for simulating networks of spiking neurons. It was developed at the University of Granada, and its source code was released under the GNU GPL version 3. EDLUT uses an event-driven simulation scheme and lookup tables to efficiently simulate medium or large spiking neural networks. This allows the application to simulate detailed biological neuron models and to interface with experimental setups (such as a robotic arm) in real time.
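    The combination of an event queue and precomputed tables described above can be illustrated with a small sketch. This is not EDLUT's code or API: the leaky integrate-and-fire neuron, the constants, and the names `DECAY_TABLE`, `decay` and `simulate` are all assumptions chosen for illustration. Neuron state is updated only when a spike arrives, and the exponential decay since the last update is read from a table instead of being recomputed.

```python
import heapq
import math

TAU = 10.0          # membrane time constant in ms (assumed value)
THRESHOLD = 1.0     # firing threshold (assumed value)
DT = 0.1            # resolution of the lookup table in ms (assumed value)
TABLE_SIZE = 1000   # the table covers elapsed times from 0 to about 100 ms

# Precomputed decay factors exp(-elapsed / TAU), one entry per DT step.
DECAY_TABLE = [math.exp(-i * DT / TAU) for i in range(TABLE_SIZE)]


def decay(elapsed):
    """Look up the decay factor for an elapsed time, clamping at the table end."""
    index = min(int(round(elapsed / DT)), TABLE_SIZE - 1)
    return DECAY_TABLE[index]


def simulate(synapses, input_spikes, t_end):
    """Run an event-driven simulation and return the list of (time, neuron) spikes.

    synapses:     {pre_neuron: [(post_neuron, weight, delay), ...]}
    input_spikes: [(time, neuron, weight), ...]
    """
    potential = {}    # membrane potential of each neuron at its last update
    last_update = {}  # time of that last update
    events = list(input_spikes)
    heapq.heapify(events)  # earliest event is always popped first
    fired = []
    while events:
        t, neuron, weight = heapq.heappop(events)
        if t > t_end:
            break
        # Touch the neuron only now: decay its old potential, then add the input.
        elapsed = t - last_update.get(neuron, t)
        v = potential.get(neuron, 0.0) * decay(elapsed) + weight
        last_update[neuron] = t
        if v >= THRESHOLD:
            fired.append((t, neuron))
            v = 0.0  # reset after the spike
            for post, w, delay in synapses.get(neuron, []):
                heapq.heappush(events, (t + delay, post, w))
        potential[neuron] = v
    return fired


spikes = simulate({"a": [("b", 0.6, 1.0)]},
                  [(0.0, "a", 1.2), (0.5, "b", 0.6)],
                  t_end=10.0)
```

    In this toy network neuron "a" fires on its input, and the delayed spike it sends pushes neuron "b" over threshold. No work is done between events, which is the property that makes the event-driven scheme attractive for sparse spiking activity.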

    Read more →
  • Maximum-entropy Markov model

    Maximum-entropy Markov model

    In statistics, a maximum-entropy Markov model (MEMM), or conditional Markov model (CMM), is a graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. An MEMM is a discriminative model that extends a standard maximum entropy classifier by assuming that the unknown values to be learnt are connected in a Markov chain rather than being conditionally independent of each other. MEMMs find applications in natural language processing, specifically in part-of-speech tagging and information extraction. == Model == Suppose we have a sequence of observations $O_1, \dots, O_n$ that we seek to tag with the labels $S_1, \dots, S_n$ that maximize the conditional probability $P(S_1, \dots, S_n \mid O_1, \dots, O_n)$. In a MEMM, this probability is factored into Markov transition probabilities, where the probability of transitioning to a particular label depends only on the observation at that position and the previous position's label: $$P(S_1, \dots, S_n \mid O_1, \dots, O_n) = \prod_{t=1}^{n} P(S_t \mid S_{t-1}, O_t).$$ Each of these transition probabilities comes from the same general distribution $P(s \mid s', o)$. For each possible label value of the previous label $s'$, the probability of a certain label $s$ is modeled in the same way as a maximum entropy classifier: $$P(s \mid s', o) = P_{s'}(s \mid o) = \frac{1}{Z(o, s')} \exp\left(\sum_{a} \lambda_a f_a(o, s)\right).$$
Here, the $f_a(o, s)$ are real-valued or categorical feature-functions, and $Z(o, s')$ is a normalization term ensuring that the distribution sums to one. This form for the distribution corresponds to the maximum entropy probability distribution satisfying the constraint that the empirical expectation for the feature is equal to the expectation given the model: $$\operatorname{E}_e\left[f_a(o, s)\right] = \operatorname{E}_p\left[f_a(o, s)\right] \quad \text{for all } a.$$ The parameters $\lambda_a$ can be estimated using generalized iterative scaling. Furthermore, a variant of the Baum–Welch algorithm, which is used for training HMMs, can be used to estimate parameters when training data has incomplete or missing labels. The optimal state sequence $S_1, \dots, S_n$ can be found using a very similar Viterbi algorithm to the one used for HMMs. The dynamic program uses the forward probability: $$\alpha_{t+1}(s) = \sum_{s' \in S} \alpha_t(s') P_{s'}(s \mid o_{t+1}).$$ == Strengths and weaknesses == An advantage of MEMMs rather than HMMs for sequence tagging is that they offer increased freedom in choosing features to represent observations. In sequence tagging situations, it is useful to use domain knowledge to design special-purpose features.
In the original paper introducing MEMMs, the authors write that "when trying to extract previously unseen company names from a newswire article, the identity of a word alone is not very predictive; however, knowing that the word is capitalized, that is a noun, that it is used in an appositive, and that it appears near the top of the article would all be quite predictive (in conjunction with the context provided by the state-transition structure)." Useful sequence tagging features, such as these, are often non-independent. Maximum entropy models do not assume independence between features, but generative observation models used in HMMs do. Therefore, MEMMs allow the user to specify many correlated, but informative features. Another advantage of MEMMs versus HMMs and conditional random fields (CRFs) is that training can be considerably more efficient. In HMMs and CRFs, one needs to use some version of the forward–backward algorithm as an inner loop in training. However, in MEMMs, estimating the parameters of the maximum-entropy distributions used for the transition probabilities can be done for each transition distribution in isolation. A drawback of MEMMs is that they potentially suffer from the "label bias problem," where states with low-entropy transition distributions "effectively ignore their observations." Conditional random fields were designed to overcome this weakness, which had already been recognised in the context of neural network-based Markov models in the early 1990s. Another source of label bias is that training is always done with respect to known previous tags, so the model struggles at test time when there is uncertainty in the previous tag.
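    The transition distribution and the Viterbi-style decoding from the Model section can be sketched in a few lines. This is an illustrative toy, not code from the original paper: the two feature functions, the hand-picked weights, the start label "<s>", and the choice to keep one weight vector per previous label are all assumptions.

```python
import math


def transition_probs(prev_label, obs, labels, features, weights):
    """Return {s: P(s | prev_label, obs)} as a normalised exponential of weighted features."""
    scores = {
        s: math.exp(sum(w * f(obs, s) for f, w in zip(features, weights[prev_label])))
        for s in labels
    }
    z = sum(scores.values())  # the normalisation term Z(o, s')
    return {s: v / z for s, v in scores.items()}


def viterbi(observations, labels, start_label, features, weights):
    """Return the most probable label sequence under the MEMM factorisation."""
    # best[s] = (probability of the best path ending in label s, that path)
    best = {start_label: (1.0, [])}
    for obs in observations:
        new_best = {}
        for prev, (p_prev, path) in best.items():
            probs = transition_probs(prev, obs, labels, features, weights)
            for s, p in probs.items():
                candidate = p_prev * p
                if s not in new_best or candidate > new_best[s][0]:
                    new_best[s] = (candidate, path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Hypothetical toy setup: two labels and two overlapping binary features.
LABELS = ["N", "V"]
FEATURES = [
    lambda o, s: 1.0 if o[0].isupper() and s == "N" else 0.0,  # capitalised word tagged N
    lambda o, s: 1.0 if o.endswith("s") and s == "V" else 0.0,  # word ending in "s" tagged V
]
# One weight vector per previous label (hand-picked, not trained).
WEIGHTS = {"<s>": [2.0, 1.0], "N": [0.5, 2.0], "V": [2.0, 0.5]}

tags = viterbi(["Acme", "hires"], LABELS, "<s>", FEATURES, WEIGHTS)
```

    With these weights the decoder tags "Acme" as N and "hires" as V. In practice the weights would be estimated from labelled data, for example with generalized iterative scaling as mentioned above, rather than set by hand.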

    Read more →
  • Laws of Form

    Laws of Form

    Laws of Form (hereinafter LoF) is a book by G. Spencer-Brown, written by August 1967 and published in 1969. The book straddles the boundary between mathematics and philosophy. LoF describes three distinct logical systems: The primary arithmetic (described in Chapter 4 of LoF), whose models include Boolean arithmetic; The primary algebra (Chapter 6 of LoF), whose models include the two-element Boolean algebra (hereinafter abbreviated 2), Boolean logic, and the classical propositional calculus; Equations of the second degree (Chapter 11), whose interpretations include finite automata and Alonzo Church's Restricted Recursive Arithmetic (RRA). "Boundary algebra" is a Meguire (2011) term for the union of the primary algebra and the primary arithmetic. Laws of Form sometimes loosely refers to the "primary algebra" as well as to LoF. == Contents == The preface states that the work was first explored in 1959, and Spencer-Brown cites Bertrand Russell as being supportive of his endeavour. He also thanks J. C. P. Miller of University College London for helping with the proofreading and offering other guidance. In 1963 Spencer-Brown was invited by Harry Frost, staff lecturer in the physical sciences at the department of Extra-Mural Studies of the University of London, to deliver a course on the mathematics of logic. LoF emerged from work in electronic engineering its author did around 1960. Key ideas of LoF were first outlined in his 1961 manuscript Design with the Nor, which remained unpublished until 2021, and were further refined during subsequent lectures on mathematical logic he gave under the auspices of the University of London's extension program. LoF has appeared in several editions. The second series of editions appeared in 1972 with the "Preface to the First American Edition", which emphasised the use of self-referential paradoxes; the most recent is a 1997 German translation. LoF has never gone out of print.
LoF's mystical and declamatory prose and its love of paradox make it a challenging read for all. Spencer-Brown was influenced by Ludwig Wittgenstein and R. D. Laing. LoF also echoes a number of themes from the writings of Charles Sanders Peirce, Bertrand Russell, and Alfred North Whitehead. The work has had curious effects on some classes of its readership; for example, on obscure grounds, it has been claimed that the entire book is written in an operational way, giving instructions to the reader instead of telling them what "is", and that in accordance with G. Spencer-Brown's interest in paradoxes, the only sentence that makes a statement that something is, is the statement which says no such statements are used in this book. Furthermore, the claim asserts that except for this one sentence the book can be seen as an example of E-Prime. What prompted such a claim, is obscure, either in terms of incentive, logical merit, or as a matter of fact, because the book routinely and naturally uses the verb to be throughout, and in all its grammatical forms, as may be seen both in the original and in quotes shown below. == Reception == Ostensibly a work of formal mathematics and philosophy, LoF became something of a cult classic: it was praised by Heinz von Foerster when he reviewed it for the Whole Earth Catalog. Those who agree point to LoF as embodying an enigmatic "mathematics of consciousness", its algebraic symbolism capturing an (perhaps even "the") implicit root of cognition: the ability to "distinguish". LoF argues that primary algebra reveals striking connections among logic, Boolean algebra, and arithmetic, and the philosophy of language and mind. Stafford Beer wrote in a review for Nature in 1969, "When one thinks of all that Russell went through sixty years ago, to write the Principia, and all we his readers underwent in wrestling with those three vast volumes, it is almost sad". 
Banaschewski (1977) argues that the primary algebra is nothing but new notation for Boolean algebra. Indeed, the two-element Boolean algebra 2 can be seen as the intended interpretation of the primary algebra. Yet the notation of the primary algebra: Fully exploits the duality characterizing not just Boolean algebras but all lattices; Highlights how syntactically distinct statements in logic and 2 can have identical semantics; Dramatically simplifies Boolean algebra calculations, and proofs in sentential and syllogistic logic. Moreover, the syntax of the primary algebra can be extended to formal systems other than 2 and sentential logic, resulting in boundary mathematics. LoF has influenced, among others, Heinz von Foerster, Louis Kauffman, Niklas Luhmann, Humberto Maturana, Francisco Varela and William Bricken. Some of these authors have modified the primary algebra in a variety of interesting ways. LoF claimed that certain well-known mathematical conjectures of very long standing, such as the four color theorem, Fermat's Last Theorem, and the Goldbach conjecture, are provable using extensions of the primary algebra. Spencer-Brown eventually circulated a purported proof of the four color theorem, but it was met with skepticism. == The form (Chapter 1) == The symbol: Also called the "mark" or "cross", is the essential feature of the Laws of Form. In Spencer-Brown's inimitable and enigmatic fashion, the Mark symbolizes the root of cognition, i.e., the dualistic Mark indicates the capability of differentiating a "this" from "everything else but this". In LoF, a Cross denotes the drawing of a "distinction", and can be thought of as signifying the following, all at once: The act of drawing a boundary around something, thus separating it from everything else; That which becomes distinct from everything by drawing the boundary; Crossing from one side of the boundary to the other. 
All three ways imply an action on the part of the cognitive entity (e.g., person) making the distinction. As LoF puts it: "The first command: Draw a distinction can well be expressed in such ways as: Let there be a distinction, Find a distinction, See a distinction, Describe a distinction, Define a distinction, Or: Let a distinction be drawn". (LoF, Notes to chapter 2) The counterpoint to the Marked state is the Unmarked state, which is simply nothing, the void, or the un-expressable infinite represented by a blank space. It is simply the absence of a Cross. No distinction has been made and nothing has been crossed. The Marked state and the void are the two primitive values of the Laws of Form. The Cross can be seen as denoting the distinction between two states, one "considered as a symbol" and another not so considered. From this fact arises a curious resonance with some theories of consciousness and language. Paradoxically, the Form is at once Observer and Observed, and is also the creative act of making an observation. LoF (excluding back matter) closes with the words: ...the first distinction, the Mark and the observer are not only interchangeable, but, in the form, identical. C. S. Peirce came to a related insight in the 1890s; see § Related work. == The primary arithmetic (Chapter 4) == The syntax of the primary arithmetic goes as follows. There are just two atomic expressions: The empty Cross ; All or part of the blank page (the "void"). There are two inductive rules: A Cross may be written over any expression; Any two expressions may be concatenated. The semantics of the primary arithmetic are perhaps nothing more than the sole explicit definition in LoF: "Distinction is perfect continence". Let the "unmarked state" be a synonym for the void. Let an empty Cross denote the "marked state". To cross is to move from one value, the unmarked or marked state, to the other. 
We can now state the "arithmetical" axioms A1 and A2, which ground the primary arithmetic (and hence all of the Laws of Form): "A1. The law of Calling". Calling twice from a state is indistinguishable from calling once. To make a distinction twice has the same effect as making it once. For example, saying "Let there be light" and then saying "Let there be light" again, is the same as saying it once. Formally, writing an empty Cross as a pair of parentheses: ( ) ( ) = ( ). "A2. The law of Crossing". After crossing from the unmarked to the marked state, crossing again ("recrossing") starting from the marked state returns one to the unmarked state. Hence recrossing annuls crossing. Formally, in the same parenthesis rendering: (( )) = , where the right-hand side is left blank to denote the void. In both A1 and A2, the expression to the right of '=' has fewer symbols than the expression to the left of '='. This suggests that every primary arithmetic expression can, by repeated application of A1 and A2, be simplified to one of two states: the marked or the unmarked state. This is indeed the case, and the result is the expression's "simplification". The two fundamental metatheorems of the primary arithmetic state that: Every finite expression has a unique simplification. (T3 in LoF); Starting from an initial marked or unmarked state, "complicating" an expression by a finite number of repeated application of A1 and A2 cannot yield

    Read more →
  • BeeSafe

    BeeSafe

    BeeSafe is a personal safety mobile app launched in 2015 as a Slovak startup. It is a location-based security service that notifies family members and friends if the user of the app is in danger. The app has received numerous awards. The app has more than 700 downloads and 250 active logins from more than 60 countries worldwide. == History == BeeSafe was founded on March 20, 2015 by Peter Stražovec and Michal Kačerík. The project was a winner of Žilina’s Startup Weekend 2013 and a StartupAwards.SK 2015 finalist. Later on, the app was released on the Android and iOS marketplaces. The BeeSafe project spent three months in The Spot booster and incubator in Bratislava. BeeSafe entered into an agreement with the city of Piešťany in November 2015 to increase the security of its citizens by connecting the mobile app with the police platform. Piešťany was the first city to start using the BeeSafe platform. The application also aims to help people in other Slovak cities. The cities can see the users only if they are in danger. == Awards == The BeeSafe app received the Via Bona award, was a winner of a Slovak startup competition, and has received other nominations.

    Read more →
  • Theano (software)

    Theano (software)

    Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures. == History == Theano is an open source project primarily developed by the Montreal Institute for Learning Algorithms (MILA) at the Université de Montréal. The name of the software references the ancient philosopher Theano, long associated with the development of the golden mean. On 28 September 2017, Pascal Lamblin posted a message from Yoshua Bengio, Head of MILA: major development would cease after the 1.0 release due to competing offerings by strong industrial players. Theano 1.0.0 was then released on 15 November 2017. On 17 May 2018, Chris Fonnesbeck wrote on behalf of the PyMC development team that the PyMC developers would officially assume control of Theano maintenance once the MILA development team stepped down. On 29 January 2021, they started using the name Aesara for their fork of Theano. On 29 November 2022, the PyMC development team announced that the PyMC developers would fork the Aesara project under the name PyTensor. == Sample code == The following code is Theano's original example. It defines a computational graph with two scalars a and b of type double and an operation between them (addition) and then creates a Python function f that does the actual computation. == Examples == === Matrix Multiplication (Dot Product) === The following code demonstrates how to perform matrix multiplication using Theano, which is essential for linear algebra operations in many machine learning tasks. === Gradient Calculation === The following code uses Theano to compute the gradient of a simple operation (like a neuron) with respect to its input. This is useful in training machine learning models (backpropagation).
=== Building a Simple Neural Network === The following code shows how to start building a simple neural network. This is a very basic neural network with one hidden layer. === Broadcasting in Theano === The following code demonstrates how broadcasting works in Theano. Broadcasting allows operations between arrays of different shapes without needing to explicitly reshape them.
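    The workflow described in the sections above, in which a symbolic graph is built first, turned into a callable function, and differentiated symbolically, can be illustrated with a small pure-Python stand-in. This sketch deliberately does not use Theano's API: the classes `Var`, `Const`, `Add`, `Mul` and the helper `function` are hypothetical names, and only scalar addition and multiplication are supported.

```python
class Expr:
    """Base class for nodes of a tiny symbolic expression graph."""

    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)


class Var(Expr):
    """A named symbolic scalar; it has no value until the graph is evaluated."""

    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self]

    def grad(self, wrt):
        return Const(1.0 if self is wrt else 0.0)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, env):
        return self.value

    def grad(self, wrt):
        return Const(0.0)


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, env):
        return self.left.eval(env) + self.right.eval(env)

    def grad(self, wrt):
        # Sum rule: d(u + v) = du + dv
        return self.left.grad(wrt) + self.right.grad(wrt)


class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, env):
        return self.left.eval(env) * self.right.eval(env)

    def grad(self, wrt):
        # Product rule: d(u * v) = du * v + u * dv
        return self.left.grad(wrt) * self.right + self.left * self.right.grad(wrt)


def function(inputs, output):
    """Turn a graph into a plain Python callable taking values for `inputs`."""
    def compiled(*values):
        return output.eval(dict(zip(inputs, values)))
    return compiled


a, b = Var("a"), Var("b")
f = function([a, b], a + b)                # graph for a + b
df_da = function([a, b], (a * b).grad(a))  # symbolic derivative of a * b w.r.t. a

result = f(1.5, 2.5)
slope = df_da(3.0, 4.0)
```

    Here `f(1.5, 2.5)` evaluates the addition graph and `df_da(3.0, 4.0)` evaluates the derivative graph, which equals the value of `b`. A real compiler such as Theano additionally optimizes the graph and generates native CPU or GPU code, which this sketch makes no attempt to do.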

    Read more →
  • The Best Free AI Background Remover for Beginners

    The Best Free AI Background Remover for Beginners

    In search of the best AI background remover? An AI background remover is software that uses machine learning to separate the subject of an image from its background, usually within seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI background remover slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Myhill–Nerode theorem

    Myhill–Nerode theorem

    In the theory of formal languages, the Myhill–Nerode theorem provides a necessary and sufficient condition for a language to be regular. The theorem is named for John Myhill and Anil Nerode, who proved it at the University of Chicago in 1957 (Nerode & Sauer 1957, p. ii). == Statement == Given a language $L$, and a pair of strings $x$ and $y$, define a distinguishing extension to be a string $z$ such that exactly one of the two strings $xz$ and $yz$ belongs to $L$. Define a relation $\sim_L$ on strings as $x \sim_L y$ if there is no distinguishing extension for $x$ and $y$. It is easy to show that $\sim_L$ is an equivalence relation on strings, and thus it divides the set of all strings into equivalence classes. The Myhill–Nerode theorem states that a language $L$ is regular if and only if $\sim_L$ has a finite number of equivalence classes, and moreover, that this number is equal to the number of states in the minimal deterministic finite automaton (DFA) accepting $L$. Furthermore, every minimal DFA for the language is isomorphic to the canonical one (Hopcroft & Ullman 1979). Generally, for any language, the constructed automaton is a state automaton acceptor. However, it does not necessarily have finitely many states. The Myhill–Nerode theorem shows that finiteness is necessary and sufficient for language regularity. Some authors refer to the $\sim_L$ relation as Nerode congruence, in honor of Anil Nerode. == Use and consequences == The Myhill–Nerode theorem may be used to show that a language $L$ is regular by proving that the number of equivalence classes of $\sim_L$ is finite.
This may be done by an exhaustive case analysis in which, beginning from the empty string, distinguishing extensions are used to find additional equivalence classes until no more can be found. For example, the language consisting of binary representations of numbers that can be divided by 3 is regular. Given two binary strings $x, y$, extending them by one digit gives $2x+b, 2y+b$, so $2x+b \equiv 2y+b \pmod{3}$ iff $x \equiv y \pmod{3}$. Thus, $00$ (or $11$), $01$, and $10$ are the only distinguishing extensions, resulting in the 3 classes. The minimal automaton accepting our language would have three states corresponding to these three equivalence classes. Another immediate corollary of the theorem is that if for a language $L$ the relation $\sim_L$ has infinitely many equivalence classes, it is not regular. It is this corollary that is frequently used to prove that a language is not regular. == Generalizations == The Myhill–Nerode theorem can be generalized to tree automata.
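    The divisibility-by-3 example can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, the empty string is assumed to represent the number 0, and the search for distinguishing extensions is bounded by `max_len`, so it gives supporting evidence for the three classes rather than a proof.

```python
from itertools import product


def in_language(bits):
    """Three-state DFA: the state is the value of the prefix read so far, modulo 3."""
    state = 0
    for ch in bits:
        state = (2 * state + int(ch)) % 3  # appending digit b maps x to 2x + b
    return state == 0


def binary_strings(max_len):
    """Yield all binary strings of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for digits in product("01", repeat=n):
            yield "".join(digits)


def distinguishing_extension(x, y, max_len=4):
    """Return a string z with exactly one of xz, yz in the language, or None if none is found."""
    for z in binary_strings(max_len):
        if in_language(x + z) != in_language(y + z):
            return z
    return None


def class_representatives(max_len=4):
    """Collect strings that are pairwise distinguishable within the search bound."""
    representatives = []
    for w in binary_strings(max_len):
        if all(distinguishing_extension(w, r) is not None for r in representatives):
            representatives.append(w)
    return representatives


reps = class_representatives()
```

    The search finds exactly three pairwise distinguishable representatives (the empty string, "1" and "10"), one per remainder modulo 3, which matches the three states of the minimal automaton.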

    Read more →
  • Automated machine learning

    Automated machine learning

    Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. It is the combination of automation and ML. AutoML potentially includes every stage from beginning with a raw dataset to building a machine learning model ready for deployment. AutoML was proposed as an artificial intelligence-based solution to the growing challenge of applying machine learning. The high degree of automation in AutoML aims to allow non-experts to make use of machine learning models and techniques without requiring them to become experts in machine learning. Automating the process of applying machine learning end-to-end additionally offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform hand-designed models. Common techniques used in AutoML include hyperparameter optimization, meta-learning and neural architecture search. == Comparison to the standard approach == In a typical machine learning application, practitioners have a set of input data points to be used for training. The raw data may not be in a form that all algorithms can be applied to. To make the data amenable for machine learning, an expert may have to apply appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods. After these steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their model. If deep learning is used, the architecture of the neural network must also be chosen manually by the machine learning expert. Each of these steps may be challenging, resulting in significant hurdles to using machine learning. AutoML aims to simplify these steps for non-experts, and to make it easier for them to use machine learning techniques correctly and effectively. 
AutoML plays an important role within the broader approach of automating data science, which also includes challenging tasks such as data engineering, data exploration and model interpretation and prediction. == Targets of automation == Automated machine learning can target various stages of the machine learning process. Steps to automate are: Data preparation and ingestion (from raw data and miscellaneous formats); Column type detection, e.g., Boolean, discrete numerical, continuous numerical, or text; Column intent detection, e.g., target/label, stratification field, numerical feature, categorical text feature, or free text feature; Task detection, e.g., binary classification, regression, clustering, or ranking; Feature engineering; Feature selection; Feature extraction; Meta-learning and transfer learning; Detection and handling of skewed data and/or missing values; Model selection - choosing which machine learning algorithm to use, often including multiple competing software implementations; Ensembling - a form of consensus where using multiple models often gives better results than any single model; Hyperparameter optimization of the learning algorithm and featurization; Neural architecture search; Pipeline selection under time, memory, and complexity constraints; Selection of evaluation metrics and validation procedures; Problem checking; Leakage detection; Misconfiguration detection; Analysis of obtained results; Creating user interfaces and visualizations. == Challenges and Limitations == There are a number of key challenges being tackled around automated machine learning. A big issue surrounding the field is referred to as "development as a cottage industry". This phrase refers to the issue in machine learning where development relies on manual decisions and biases of experts. This is contrasted to the goal of machine learning which is to create systems that can learn and improve from their own usage and analysis of the data.
In essence, it is a tension between how much experts should be involved in the systems' learning and how much freedom they should give the machines. However, experts and developers must help create and guide these machines to prepare them for their own learning. Creating such a system requires labor-intensive work and knowledge of machine learning algorithms and system design. Additionally, other challenges include meta-learning and computational resource allocation.
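    Hyperparameter optimization, one of the automation targets listed above, can be sketched with a plain random search. This is a minimal illustration under stated assumptions, not any particular AutoML tool: the model is a one-variable linear regression trained by gradient descent, the data are synthetic, and the search ranges and names such as `random_search` are hypothetical.

```python
import random


def train_linear(xs, ys, learning_rate, epochs):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) * 2 / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) * 2 / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(model, xs, ys):
    """Mean squared error of a (w, b) model on the given data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_search(train, valid, n_trials, seed=0):
    """Sample configurations at random and keep the one with the lowest validation error."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform in [1e-4, 1e-1]
            "epochs": rng.randint(10, 200),
        }
        model = train_linear(*train, **config)
        score = mse(model, *valid)
        if best is None or score < best[0]:
            best = (score, config, model)
    return best


# Synthetic data: y = 3x + 1 plus a little Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.random() for _ in range(60)]
ys = [3.0 * x + 1.0 + data_rng.gauss(0, 0.1) for x in xs]
train, valid = (xs[:40], ys[:40]), (xs[40:], ys[40:])

best_score, best_config, best_model = random_search(train, valid, n_trials=20)
```

    Scoring each configuration on held-out validation data, rather than on the training data, is what lets the search compare configurations fairly. Practical AutoML systems typically replace the random sampler with more sample-efficient strategies and search over model families as well as hyperparameters.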

    Read more →
  • Gato (DeepMind)

    Gato (DeepMind)

    Gato is a deep neural network for a range of complex tasks that exhibits multimodality. It can perform tasks such as engaging in a dialogue, playing video games, controlling a robot arm to stack blocks, and more. == Overview == Gato was created by researchers at London-based AI firm DeepMind. It is a transformer, like GPT-3. According to MIT Technology Review, the system "learns multiple different tasks at the same time, which means it can switch between them without having to forget one skill before learning another" whereas "[t]he AI systems of today are called “narrow,” meaning they can only do a specific, restricted set of tasks such as generate text", and according to The Independent, it is a "'generalist agent' that can carry out a huge range of complex tasks, from stacking blocks to writing poetry". It uses supervised learning with 1.2B parameters. The technology has been described as "general purpose" artificial intelligence and a "step toward" artificial general intelligence.

    Read more →
  • The Best Free AI Chatbot for Beginners

    The Best Free AI Chatbot for Beginners

    Trying to pick the best AI chatbot? An AI chatbot is software that uses machine learning to hold a conversation and help you get more done, and it scales from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI chatbot slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →