AI For Business Strategy Mit

AI For Business Strategy Mit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Winner-take-all in action selection

    Winner-take-all in action selection

    Winner-take-all is a computer science concept that has been widely applied in behavior-based robotics as a method of action selection for intelligent agents. Winner-take-all systems work by connecting modules (task-designated areas) in such a way that when one action is performed it stops all other actions from being performed, so only one action is occurring at a time. The name comes from the idea that the "winner" action takes all of the motor system's power. == History == In the 1980s and 1990s, many roboticists and cognitive scientists were attempting to find speedier and more efficient alternatives to the traditional world modeling method of action selection. In 1982, Jerome A. Feldman and D.H. Ballard published the "Connectionist Models and Their Properties", referencing and explaining winner-take-all as a method of action selection. Feldman's architecture functioned on the simple rule that in a network of interconnected action modules, each module will set its own output to zero if it reads a higher input than its own in any other module. In 1986, Rodney Brooks introduced behavior-based artificial intelligence. Winner-take-all architectures for action selection soon became a common feature of behavior-based robots, because selection occurred at the level of the action modules (bottom-up) rather than at a separate cognitive level (top-down), producing a tight coupling of stimulus and reaction. == Types of winner-take-all architectures == === Hierarchy === In the hierarchical architecture, actions or behaviors are programmed in a high-to-low priority list, with inhibitory connections between all the action modules. The agent performs low-priority behaviors until a higher-priority behavior is stimulated, at which point the higher behavior inhibits all other behaviors and takes over the motor system completely. Prioritized behaviors are usually key to the immediate survival of the agent, while behaviors of lower priority are less time-sensitive. For example, "run away from predator" would be ranked above "sleep." While this architecture allows for clear programming of goals, many roboticists have moved away from the hierarchy because of its inflexibility. === Heterarchy and fully distributed === In the heterarchy and fully distributed architecture, each behavior has a set of pre-conditions to be met before it can be performed, and a set of post-conditions that will be true after the action has been performed. These pre- and post-conditions determine the order in which behaviors must be performed and are used to causally connect action modules. This enables each module to receive input from other modules as well as from the sensors, so modules can recruit each other. For example, if the agent's goal were to reduce thirst, the behavior "drink" would require the pre-condition of having water available, so the module would activate the module in charge of "find water". The activations organize the behaviors into a sequence, even though only one action is performed at a time. The distribution of larger behaviors across modules makes this system flexible and robust to noise. Some critics of this model hold that any existing set of division rules for the predecessor and conflictor connections between modules produce sub-par action selection. In addition, the feedback loop used in the model can in some circumstances lead to improper action selection. === Arbiter and centrally coordinated === In the arbiter and centrally coordinated architecture, the action modules are not connected to each other but to a central arbiter. When behaviors are triggered, they begin "voting" by sending signals to the arbiter, and the behavior with the highest number of votes is selected. In these systems, bias is created through the "voting weight", or how often a module is allowed to vote. Some arbiter systems take a different spin on this type of winner-take-all by using a "compromise" feature in the arbiter. Each module is able to vote for or against each smaller action in a set of actions, and the arbiter selects the action with the most votes, meaning that it benefits the most behavior modules. This can be seen as violating the general rule against creating representations of the world in behavior-based AI, established by Brooks. By performing command fusion, the system is creating a larger composite pool of knowledge than is obtained from the sensors alone, forming a composite inner representation of the environment. Defenders of these systems argue that forbidding world-modeling puts unnecessary constraints on behavior-based robotics, and that agents benefits from forming representations and can still remain reactive.

    Read more →
  • Stephen Wolfram

    Stephen Wolfram

    Stephen Wolfram ( WUUL-frəm; born 29 August 1959) is a British-American computer scientist, physicist, and businessman. He is known for his work in computer algebra and theoretical physics. In 2012, he was named a fellow of the American Mathematical Society. As a businessman, Wolfram is the founder and CEO of the software company Wolfram Research, where he works as chief designer of Mathematica and the Wolfram Alpha answer engine. == Early life == === Family === Stephen Wolfram was born in London in 1959 to Hugo and Sybil Wolfram, both German Jewish refugees to the United Kingdom. His maternal grandmother was British psychoanalyst Kate Friedlander. Wolfram's father, Hugo Wolfram, was a textile manufacturer and served as managing director of the Lurex Company—makers of the fabric Lurex. Wolfram's mother, Sybil Wolfram, was a Fellow and Tutor in Philosophy at Lady Margaret Hall at University of Oxford from 1964 to 1993. Wolfram is married to a mathematician. They have four children together. === Education === Wolfram was educated at Eton College, but left prematurely in 1976. As a young child, Wolfram had difficulties learning arithmetic. He entered St. John's College, Oxford, at age 17 and left in 1978 without graduating to attend the California Institute of Technology the following year, where he received a PhD in particle physics in 1980. Wolfram's thesis committee was composed of Richard Feynman, Peter Goldreich, Frank J. Sciulli, and Steven Frautschi, and chaired by Richard D. Field. == Early career == Wolfram, at the age of 15, began research in applied quantum field theory and particle physics and published scientific papers in peer-reviewed scientific journals; by the time he left Oxford, he had published ten such papers. Following his PhD, Wolfram joined the faculty at Caltech and became the youngest recipient of a MacArthur Fellowship in 1981, at age 21. == Later career == === Complex systems and cellular automata === In 1983, Wolfram left for the School of Natural Sciences of the Institute for Advanced Study in Princeton. By that time, he was no longer interested in particle physics. Instead, he began pursuing investigations into cellular automata, mainly with computer simulations. He produced a series of papers investigating the class of elementary cellular automata, conceiving the Wolfram code, a naming system for one-dimensional cellular automata, and a classification scheme for the complexity of their behaviour. He conjectured that the Rule 110 cellular automaton might be Turing complete, which a research assistant to Wolfram, Matthew Cook, later proved correct. Wolfram sued Cook and temporarily blocked publication of the work on Rule 110 for allegedly violating a non-disclosure agreement until Wolfram could publish the work in his controversial book A New Kind of Science. Wolfram's cellular-automata work came to be cited in more than 10,000 papers. In the mid-1980s, Wolfram worked on simulations of physical processes (such as turbulent fluid flow) with cellular automata on the Connection Machine alongside Richard Feynman and helped initiate the field of complex systems. In 1984, he was a participant in the Founding Workshops of the Santa Fe Institute, along with Nobel laureates Murray Gell-Mann, Manfred Eigen, and Philip Warren Anderson, and future laureate Frank Wilczek. In 1986, he founded the Center for Complex Systems Research (CCSR) at the University of Illinois Urbana–Champaign. In 1987, he founded the journal Complex Systems. === Symbolic Manipulation Program === Wolfram led the development of the computer algebra system SMP (Symbolic Manipulation Program) in the Caltech physics department during 1979–1981. A dispute with the administration over the intellectual property rights regarding SMP—patents, copyright, and faculty involvement in commercial ventures—eventually led him to resign from Caltech. SMP was further developed and marketed commercially by Inference Corp. of Los Angeles during 1983–1988. === Mathematica === In 1986, Wolfram left the Institute for Advanced Study for the University of Illinois Urbana–Champaign, where he had founded their Center for Complex Systems Research, and started to develop the computer algebra system Mathematica, which was released on 23 June 1988, when he left academia. In 1987, he founded Wolfram Research, which continues to develop and market the program. === A New Kind of Science === From 1992 to 2002, Wolfram worked on his controversial book A New Kind of Science, which presents an empirical study of simple computational systems. Additionally, it argues that for fundamental reasons these types of systems, rather than traditional mathematics, are needed to model and understand complexity in nature. Wolfram's conclusion is that the universe is discrete in its nature, and runs on fundamental laws that can be described as simple programs. He predicts that a realization of this within scientific communities will have a revolutionary influence on physics, chemistry, biology, and most other scientific areas, hence the book's title. The book was met with skepticism and criticism that Wolfram took credit for the work of others and made conclusions without evidence to support them. === Wolfram Alpha computational knowledge engine === In March 2009, Wolfram announced Wolfram Alpha, an answer engine. Wolfram Alpha launched in May 2009, and a paid-for version with extra features launched in February 2012 that was met with criticism for its high price, which later dropped from $50 to $2. The engine is based on natural language processing and a large library of rules-based algorithms. The application programming interface allows other applications to extend and enhance Wolfram Alpha. === Touchpress === In 2010, Wolfram co-founded Touchpress with Theodore Gray, Max Whitby, and John Cromie. The company specialised in creating in-depth premium apps and games covering a wide range of educational subjects designed for children, parents, students, and educators. Touchpress published more than 100 apps. The company is no longer active. === Wolfram Language === In March 2014, at the annual South by Southwest (SXSW) event, Wolfram officially announced the Wolfram Language as a new general multi-paradigm programming language, though it was previously available through Mathematica and not an entirely new programming language. The documentation for the language was pre-released in October 2013 to coincide with the bundling of Mathematica and the Wolfram Language on every Raspberry Pi computer with some controversy because of the proprietary nature of the Wolfram Language. While the Wolfram Language has existed for over 30 years as the primary programming language used in Mathematica, it was not officially named until 2014, and is not widely used. === Wolfram Physics Project === In April 2020, Wolfram announced the "Wolfram Physics Project" as an effort to reduce and explain all the laws of physics within a paradigm of a hypergraph that is transformed by minimal rewriting rules that obey the Church–Rosser property. The effort is a continuation of the ideas he originally described in A New Kind of Science. Wolfram claims that "From an extremely simple model, we're able to reproduce special relativity, general relativity and the core results of quantum mechanics." Physicists are generally unimpressed with Wolfram's claim, and say his results are non-quantitative and arbitrary. == Personal interests and activities == Wolfram has a log of personal analytics, including emails received and sent, keystrokes made, meetings and events attended, recordings of phone calls, and even physical movement dating back to the 1980s. In the preface of A New Kind of Science, he noted that he recorded over 100 million keystrokes and 100 mouse miles. He has said that personal analytics "can give us a whole new dimension to experiencing our lives." Wolfram was a scientific consultant for the 2016 film Arrival. He and his son Christopher Wolfram wrote some of the code featured on screen, such as the code in graphics depicting an analysis of the alien logograms, for which they used the Wolfram Language.

    Read more →
  • Stefan Schaal

    Stefan Schaal

    Stefan Schaal (born 1961) is a German-American computer scientist specializing in robotics, machine learning, autonomous systems, and computational neuroscience. == Education and career == Schaal was born in Frankfurt am Main in Germany, Schaal grew up in the North Bavarian town of Nürnberg. After graduating from school, he served in the German army in the Ski Patrol Division of Bad Reichenhall, where he honorably discharged with the rank of a Lieutenant. Schaal studied mechanical engineering at the Technical University of Munich, graduating in 1987 with a Diploma degree (summa cum laude). Subsequently, Schaal did his Ph.D. in computer aided design and artificial intelligence at the Technical University of Munich and the Massachusetts Institute of Technology, receiving his Ph.D. in 1991 (Summa Cum Laude) under Klaus Ehrlenspiel. In 1991, Schaal was a Postdoctoral Fellow at the Department and Brain and Cognitive Science and the Artificial Intelligence Lab at the Massachusetts Institute of Technology, funded by the Alexander von Humboldt Foundation and the German Academic Scholarship Foundation. Starting from 1992, he became an invited researcher at the ATR Computational Neuroscience Labs in Japan, where he created a robotics lab focusing on biological principles of motor control and learning. In 1994, Schaal moved to the Georgia Institute of Technology as an adjunct assistant professor, and also held the same rank at the Pennsylvania State University. In 1996, Schaal assumed a group leader position in the ERATO Kawato Dynamic Brain Project in Japan. Schaal joined the University of Southern California (USC) in 1997, where he advanced from the ranks of assistant professor, to associate professor, to full professor. In 2009, Schaal became a founder in defining and creating the Max Planck Institute for Intelligent Systems in Tübingen and Stuttgart, Germany, an institute focusing on principles of perception-action-learning systems in synthetic intelligence. In 2012, Schaal founded the Autonomous Motion Department (AMD) at this institute, while maintaining a partial appointment at USC. Stefan Schaal joined Google X as lead of a robotics research team in late 2018. == Research == Stefan Schaal's interests focus on autonomous perception-action-learning systems, in particular anthropomorphic robotic systems. He works on topics of machine learning for control, control theory, computational neuroscience for neuromotor control, experimental robotics, reinforcement learning, artificial intelligence, and nonlinear dynamical systems. Stefan has co-authored more than 400 publications in top conferences and journals, and served as organizer on various top conferences in machine learning and robotics. He has received numerous best paper awards and honors in his scientific community. Stefan Schaal has been noted as one of the five leaders in robotics in 2011, and among the top robotics experts in the world. == Controversy == In 2018, the German newsjournal Der Spiegel published an article reporting on his double affiliation with USC and the Max-Planck Society, both with full salaries, which was apparently unknown to either party. Schaal rejected the allegations, but was forced to leave his position at the Max Planck Institute.

    Read more →
  • OCR-B

    OCR-B

    OCR-B is a monospace font developed in 1968 by Adrian Frutiger for Monotype by following the European Computer Manufacturer's Association standard. Its function was to facilitate the optical character recognition operations by specific electronic devices, originally for financial and bank-oriented uses. It was accepted as the world standard in 1973. It follows the ISO 1073-2:1976 (E) standard, refined in 1979 ("letterpress" design, size I). It includes all ASCII symbols, and other symbols needed in the bank environment. It is widely used for the human readable digits in UPC/EAN barcodes. It is also used for machine-readable passports. It shares that purpose with OCR-A, but it is easier for the human eye and brain to read and it has a less technical look than OCR-A. == History == In June 1961, the European Computer Manufacturers Association (ECMA) started standardization activities related to Optical Character Recognition (OCR). After evaluating existing OCR designs, it was decided to develop two new fonts: A stylized design with just digits, called “Class A”; and a more conventional type design with broader character coverage, called “Class B”. In February 1965, ECMA proposed a design for the “Class B” font to ISO, who adopted it as international standard ISO 1073-2 in October 1965. The first revision contained three font sizes: I, II and III. The specification included a Letterpress design, intended for high-quality printing equipment; and a rounded-edge Constant Strokewidth design for impact printers with reduced typographic quality. In September 1969, ECMA started work to revise its published standard. To make OCR-B more widely accepted, the shapes of some characters were slightly modified. The new revision removed font size II, which had been rarely used in practice; it deleted five character shapes; and it added a new font size IV. ECMA published the second edition of OCR-B in October 1971. In March 1976, ECMA published a third revision of its ECMA-11 specification. It added the symbols § and ¥ to OCR-B; two types of erasure marks (█) for blackening out mis-printed characters were added; and the length of the Vertical bar was changed to match ISO 1073-2. In 1993, Turkey proposed extending ISO 1073-2 to include the Turkish letters Ğğ, İı, and Şş. The request was generalized to extend OCR-B with a number of Latin and Greek letters used in European languages. A revision of the ISO 1073-2:1976 standard was therefore started, producing three successive draft documents. The final draft would have extended OCR-B with 40 Latin and 10 Greek letters; for six Latin letters, the draft gave new alternate shapes. A request to extend OCR-B with Vietnamese accents was rejected. Other than previous versions of the standard, which specified glyph shapes via reference drawings, the new revision would have included the shapes in machine-readable form. However, industry support for testing the new font could not be secured at the time, so the revision effort was halted in 1997. The working group described their findings in a technical report. In June 1998, the European Committee for Standardization published a report for adding the Euro sign to OCR-B. The report proposed both a single-stroked and a double-stroked variant of the Euro sign, leaving the decision to further testing of OCR performance. Testing was difficult: the theoretical design methods used when the OCR-B glyphs were originally developed could no longer be reproduced, and the technological constraints of the 1960s were also not entirely relevant anymore in the OCR environments of the 1990s. A new test method was devised, using present-time OCR technology. The tests found no difference in OCR performance between the two Euro variants, and recommended the adoption of the double-stroked variant as it matches the conventional glyph shape. The project did not have funds to thoroughly test the glyph extensions of the 1993 proposal; initial results were inconclusive. == Availability == Microsoft Office ships a version of Letterpress OCR-B produced by Monotype. It covers Windows-1252. Many vendors, including Adobe, still sell their versions of OCR-A and OCR-B. The TeX typesetting system has a public domain Constant Strokewidth OCR-B font in METAFONT definition form. It was created by Norbert Swartz in 1995 and updated in 2010. It has a setting for square stroke ends. The definition has also been translated to METATYPE1, so the rounded version is available in TrueType and OpenType too. A version of Constant Strokewidth OCR-B by Matthew Anderson has extended character coverage. It is available under CC-BY 4.0.

    Read more →
  • Hierarchical Risk Parity

    Hierarchical Risk Parity

    Hierarchical Risk Parity (HRP) is an advanced investment portfolio optimization framework developed in 2016 by Marcos López de Prado at Guggenheim Partners and Cornell University. HRP is a probabilistic graph-based alternative to the prevailing mean-variance optimization (MVO) framework developed by Harry Markowitz in 1952, and for which he received the Nobel Prize in economic sciences. HRP algorithms apply discrete mathematics and machine learning techniques to create diversified and robust investment portfolios that outperform MVO methods out-of-sample. HRP aims to address the limitations of traditional portfolio construction methods, particularly when dealing with highly correlated assets. Following its publication, HRP has been implemented in numerous open-source libraries, and received multiple extensions. == Key features == HRP portfolios have been proposed as a robust alternative to traditional quadratic optimization methods, including the Critical Line Algorithm (CLA) of Markowitz. HRP addresses three central issues commonly associated with quadratic optimizers: numerical instability, excessive concentration in a small number of assets, and poor out-of-sample performance. HRP leverages techniques from graph theory and machine learning to construct diversified portfolios using only the information embedded in the covariance matrix. Unlike quadratic programming methods, HRP does not require the covariance matrix to be invertible. Consequently, HRP remains applicable even in cases where the covariance matrix is ill-conditioned or singular—conditions under which standard optimizers fail. Monte Carlo simulations indicate that HRP achieves lower out-of-sample variance than CLA, despite the fact that minimizing variance is the explicit optimization objective of CLA. Furthermore, HRP portfolios exhibit lower realized risk compared to those generated by traditional risk parity methodologies. Empirical backtests have demonstrated that HRP would have historically outperformed conventional portfolio construction techniques. Algorithms within the HRP framework are characterized by the following features: Machine Learning Approach: HRP employs hierarchical clustering, a machine learning technique, to group similar assets based on their correlations. This allows the algorithm to identify the underlying hierarchical structure of the portfolio, and avoid that errors spread through the entire network. Risk-Based Allocation: The algorithm allocates capital based on risk, ensuring that assets only compete with similar assets for representation in the portfolio. This approach leads to better diversification across different risk sources, while avoiding the instability associated with noisy returns estimates. Covariance Matrix Handling: Unlike traditional methods like Mean-Variance Optimization, HRP does not require inverting the covariance matrix. This makes it more stable and applicable to portfolios with a large number of assets, particularly when the covariance matrix's condition number is high. == The problem: Markowitz's Curse == Portfolio construction is perhaps the most recurrent financial problem. On a daily basis, investment managers must build portfolios that incorporate their views and forecasts on risks and returns. Despite the theoretical elegance of Markowitz's mean-variance framework, its practical implementation is hindered by several limitations that undermine the reliability of solutions derived from the Critical Line Algorithm (CLA). A principal concern is the high sensitivity of optimal portfolios to small perturbations in expected returns: even minor forecasting errors can result in significantly different allocations (Michaud, 1998). Given the inherent difficulty of producing accurate return forecasts, numerous researchers have advocated for approaches that forgo expected returns entirely and instead rely solely on the covariance structure of asset returns. This has given rise to risk-based allocation methods, among which risk parity is a widely cited example (Jurczenko, 2015). While eliminating return forecasts mitigates some instability, it does not eliminate it. Quadratic programming techniques employed in portfolio optimization require the inversion of a positive-definite covariance matrix, meaning all eigenvalues must be strictly positive. When the matrix is numerically ill-conditioned—that is, when the ratio of its largest to smallest eigenvalue (its condition number) is large—matrix inversion becomes unreliable and prone to significant numerical errors (Bailey and López de Prado, 2012). The condition number of a covariance, correlation, or any symmetric (and thus diagonalizable) matrix is defined as the absolute value of the ratio between its largest and smallest eigenvalues in modulus. The figure on the right presents the sorted eigenvalues of several correlation matrices; the condition number is represented by the ratio of the first to last eigenvalues in each sequence. A diagonal correlation matrix, which is equal to its own inverse, exhibits the minimum possible condition number. As the number of correlated (or multicollinear) assets in a portfolio increases, the condition number rises. At high levels, this leads to severe numerical instability, whereby slight modifications in any matrix entry may result in drastically different inverses. This phenomenon, often referred to as Markowitz’s curse, encapsulates the paradox wherein increased correlation among assets heightens the theoretical need for diversification, yet simultaneously increases the likelihood of unstable optimization outcomes. Consequently, the potential benefits of diversification are frequently overshadowed by estimation errors. These problems are exacerbated as the dimensionality of the covariance matrix increases. The estimation of each covariance term consumes degrees of freedom, and in general, a minimum of 1 2 N ( N + 1 ) {\displaystyle {\frac {1}{2}}N(N+1)} independent and identically distributed (IID) observations is required to estimate a non-singular covariance matrix of dimension N {\displaystyle N} . For example, constructing an invertible covariance matrix of dimension 50 necessitates at least five years of daily IID observations. However, empirical evidence suggests that the correlation structure of financial assets is highly unstable over such extended periods. These difficulties are highlighted by the observation that even naïve allocation strategies—such as equally weighted portfolios—have frequently outperformed both mean-variance and risk-based optimizations in out-of-sample tests (De Miguel et al., 2009). == The solution: Hierarchical Risk Parity == The HRP algorithm addresses Markowitz's curse in three steps: Hierarchical Clustering: Assets are grouped into clusters based on their correlations, forming a hierarchical tree structure. Quasi-Diagonalization: The correlation matrix is reordered based on the clustering results, revealing a block diagonal structure. Recursive Bisection: Weights are assigned to assets through a top-down approach, splitting the portfolio into smaller sub-portfolios and allocating capital based on inverse variance. === Step 1: Hierarchical clustering === Given a T × N {\displaystyle T\times N} matrix of asset returns X {\displaystyle X} , where each column represents a time series of returns for one of N {\displaystyle N} assets over T {\displaystyle T} time periods, a hierarchical clustering process can be used to construct a tree-based representation of asset relationships. First, we compute the N × N {\displaystyle N\times N} correlation matrix ρ = ρ i , j i , j = 1 . . . N {\displaystyle \rho ={\rho _{i,j}}\;{i,j=1\;...\;N}} , where ρ i , j = c o r r ( X i , X j ) {\displaystyle \rho _{i,j}=\mathrm {corr} (X_{i},X_{j})} . From this, a pairwise distance matrix D = d i , j {\displaystyle D={d_{i,j}}} is defined using the transformation: d i , j = 1 2 ( 1 − ρ i , j ) {\displaystyle d_{i,j}={\sqrt {{\frac {1}{2}}(1-\rho _{i,j})}}} This distance function defines a proper metric space, satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. Next, a secondary distance matrix D ~ = d ~ i , j {\displaystyle {\tilde {D}}={{\tilde {d}}_{i,j}}} is computed, where each entry measures the Euclidean distance between the distance profiles of two assets: d ~ i , j = ∑ n = 1 N ( d n , i − d n , j ) 2 {\displaystyle {\tilde {d}}_{i,j}={\sqrt {\sum _{n=1}^{N}(d_{n,i}-d_{n,j})^{2}}}} While d i , j {\displaystyle d_{i,j}} reflects correlation-based proximity between two assets, d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} quantifies dissimilarity across the entire system, as it depends on all pairwise distances. Hierarchical clustering proceeds by identifying the pair ( i , j ) {\displaystyle (i,j)} with the smallest value of d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} (for i ≠ j {\displaystyle i\neq j} ), and forming a new cluster u [ 1 ] = ( i , j ) {\displaystyle u[1]=(i,j)} .

    Read more →
  • Probabilistic automaton

    Probabilistic automaton

    In mathematics and computer science, the probabilistic automaton (PA) is a generalization of the nondeterministic finite automaton; it includes the probability of a given transition into the transition function, turning it into a transition matrix. Thus, the probabilistic automaton also generalizes the concepts of a Markov chain and of a subshift of finite type. The languages recognized by probabilistic automata are called stochastic languages; these include the regular languages as a subset. The number of stochastic languages is uncountable. The concept was introduced by Michael O. Rabin in 1963; a certain special case is sometimes known as the Rabin automaton (not to be confused with the subclass of ω-automata also referred to as Rabin automata). In recent years, a variant has been formulated in terms of quantum probabilities, the quantum finite automaton. == Informal Description == For a given initial state and input character, a deterministic finite automaton (DFA) has exactly one next state, and a nondeterministic finite automaton (NFA) has a set of next states. A probabilistic automaton (PA) instead has a weighted set (or vector) of next states, where the weights must sum to 1 and therefore can be interpreted as probabilities (making it a stochastic vector). The notions states and acceptance must also be modified to reflect the introduction of these weights. The state of the machine as a given step must now also be represented by a stochastic vector of states, and a state accepted if its total probability of being in an acceptance state exceeds some cut-off. A PA is in some sense a half-way step from deterministic to non-deterministic, as it allows a set of next states but with restrictions on their weights. However, this is somewhat misleading, as the PA utilizes the notion of the real numbers to define the weights, which is absent in the definition of both DFAs and NFAs. This additional freedom enables them to decide languages that are not regular, such as the p-adic languages with irrational parameters. As such, PAs are more powerful than both DFAs and NFAs (which are famously equally powerful). == Formal Definition == The probabilistic automaton may be defined as an extension of a nondeterministic finite automaton ( Q , Σ , δ , q 0 , F ) {\displaystyle (Q,\Sigma ,\delta ,q_{0},F)} , together with two probabilities: the probability P {\displaystyle P} of a particular state transition taking place, and with the initial state q 0 {\displaystyle q_{0}} replaced by a stochastic vector giving the probability of the automaton being in a given initial state. For the ordinary non-deterministic finite automaton, one has a finite set of states Q {\displaystyle Q} a finite set of input symbols Σ {\displaystyle \Sigma } a transition function δ : Q × Σ → ℘ ( Q ) {\displaystyle \delta :Q\times \Sigma \to \wp (Q)} a set of states F {\displaystyle F} distinguished as accepting (or final) states F ⊆ Q {\displaystyle F\subseteq Q} . Here, ℘ ( Q ) {\displaystyle \wp (Q)} denotes the power set of Q {\displaystyle Q} . By use of currying, the transition function δ : Q × Σ → ℘ ( Q ) {\displaystyle \delta :Q\times \Sigma \to \wp (Q)} of a non-deterministic finite automaton can be written as a membership function δ : Q × Σ × Q → { 0 , 1 } {\displaystyle \delta :Q\times \Sigma \times Q\to \{0,1\}} so that δ ( q , a , q ′ ) = 1 {\displaystyle \delta (q,a,q^{\prime })=1} if q ′ ∈ δ ( q , a ) {\displaystyle q^{\prime }\in \delta (q,a)} and 0 {\displaystyle 0} otherwise. The curried transition function can be understood to be a matrix with matrix entries [ θ a ] q q ′ = δ ( q , a , q ′ ) {\displaystyle \left[\theta _{a}\right]_{qq^{\prime }}=\delta (q,a,q^{\prime })} The matrix θ a {\displaystyle \theta _{a}} is then a square matrix, whose entries are zero or one, indicating whether a transition q → a q ′ {\displaystyle q{\stackrel {a}{\rightarrow }}q^{\prime }} is allowed by the NFA. Such a transition matrix is always defined for a non-deterministic finite automaton. The probabilistic automaton replaces these matrices by a family of right stochastic matrices P a {\displaystyle P_{a}} , for each symbol a in the alphabet Σ {\displaystyle \Sigma } so that the probability of a transition is given by [ P a ] q q ′ {\displaystyle \left[P_{a}\right]_{qq^{\prime }}} A state change from some state to any state must occur with probability one, of course, and so one must have ∑ q ′ [ P a ] q q ′ = 1 {\displaystyle \sum _{q^{\prime }}\left[P_{a}\right]_{qq^{\prime }}=1} for all input letters a {\displaystyle a} and internal states q {\displaystyle q} . The initial state of a probabilistic automaton is given by a row vector v {\displaystyle v} , whose components are the probabilities of the individual initial states q {\displaystyle q} , that add to 1: ∑ q [ v ] q = 1 {\displaystyle \sum _{q}\left[v\right]_{q}=1} The transition matrix acts on the right, so that the state of the probabilistic automaton, after consuming the input string a b c {\displaystyle abc} , would be v P a P b P c {\displaystyle vP_{a}P_{b}P_{c}} In particular, the state of a probabilistic automaton is always a stochastic vector, since the product of any two stochastic matrices is a stochastic matrix, and the product of a stochastic vector and a stochastic matrix is again a stochastic vector. This vector is sometimes called the distribution of states, emphasizing that it is a discrete probability distribution. Formally, the definition of a probabilistic automaton does not require the mechanics of the non-deterministic automaton, which may be dispensed with. Formally, a probabilistic automaton PA is defined as the tuple ( Q , Σ , P , v , F ) {\displaystyle (Q,\Sigma ,P,v,F)} . A Rabin automaton is one for which the initial distribution v {\displaystyle v} is a coordinate vector; that is, has zero for all but one entries, and the remaining entry being one. == Stochastic languages == The set of languages recognized by probabilistic automata are called stochastic languages. They include the regular languages as a subset. Let F = Q accept ⊆ Q {\displaystyle F=Q_{\text{accept}}\subseteq Q} be the set of "accepting" or "final" states of the automaton. By abuse of notation, Q accept {\displaystyle Q_{\text{accept}}} can also be understood to be the column vector that is the membership function for Q accept {\displaystyle Q_{\text{accept}}} ; that is, it has a 1 at the places corresponding to elements in Q accept {\displaystyle Q_{\text{accept}}} , and a zero otherwise. This vector may be contracted with the internal state probability, to form a scalar. The language recognized by a specific automaton is then defined as L η = { s ∈ Σ ∗ | v P s Q accept > η } {\displaystyle L_{\eta }=\{s\in \Sigma ^{}\vert vP_{s}Q_{\text{accept}}>\eta \}} where Σ ∗ {\displaystyle \Sigma ^{}} is the set of all strings in the alphabet Σ {\displaystyle \Sigma } (so that is the Kleene star). The language depends on the value of the cut-point η {\displaystyle \eta } , normally taken to be in the range 0 ≤ η < 1 {\displaystyle 0\leq \eta <1} . A language is called η-stochastic if and only if there exists some PA that recognizes the language, for fixed η {\displaystyle \eta } . A language is called stochastic if and only if there is some 0 ≤ η < 1 {\displaystyle 0\leq \eta <1} for which L η {\displaystyle L_{\eta }} is η-stochastic. A cut-point is said to be an isolated cut-point if and only if there exists a δ > 0 {\displaystyle \delta >0} such that | v P ( s ) Q accept − η | ≥ δ {\displaystyle \vert vP(s)Q_{\text{accept}}-\eta \vert \geq \delta } for all s ∈ Σ ∗ {\displaystyle s\in \Sigma ^{}} == Properties == Every regular language is stochastic, and more strongly, every regular language is η-stochastic. A weak converse is that every 0-stochastic language is regular; however, the general converse does not hold: there are stochastic languages that are not regular. Every η-stochastic language is stochastic, for some 0 < η < 1 {\displaystyle 0<\eta <1} . Every stochastic language is representable by a Rabin automaton. If η {\displaystyle \eta } is an isolated cut-point, then L η {\displaystyle L_{\eta }} is a regular language. == p-adic languages == The p-adic languages provide an example of a stochastic language that is not regular, and also show that the number of stochastic languages is uncountable. A p-adic language is defined as the set of strings L η ( p ) = { n 1 n 2 n 3 … | 0 ≤ n k < p and 0. n 1 n 2 n 3 … > η } {\displaystyle L_{\eta }(p)=\{n_{1}n_{2}n_{3}\ldots \vert 0\leq n_{k}\eta \}} in the letters 0 , 1 , 2 , … , ( p − 1 ) {\displaystyle 0,1,2,\ldots ,(p-1)} . That is, a p-adic language is merely the set of real numbers in [0, 1], written in base-p, such that they are greater than η {\displaystyle \eta } . It is straightforward to show that all p-adic languages are stochastic. In particular, this implies that the number of stochastic languages is uncountable. A p-adic

    Read more →
  • PROMT

    PROMT

    ProMT is a lead Russian developer of language translation software for businesses and private users since 1991. The company provides on-premises software based on neural technologies. == History == On March 6, 1998, ProMT launched a free online translation services, which is now known as PROMT.One. In 1997, ProMT and the French company Softissimo developed a line of products for the European company Reverso. == Technology == Historically, ProMT systems used rule-based machine translation (RBMT) technology. In 2011 a hybrid approach which combined rule-based and statistical MT was implemented. In 2019, ProMT introduced its new neural technology and flagship solution - PROMT Neural Translation Server. Since then all MT systems developed by ProMT are based on neural machine translation. The software can run on Microsoft Windows, Linux, MacOS, iOS and Android and works in offline mode providing secure machine translation. As of 2025, it translates 62 languages from and to English, German, and Russian.

    Read more →
  • AI Code-review Tools Reviews: What Actually Works in 2026

    AI Code-review Tools Reviews: What Actually Works in 2026

    Shopping for the best AI code-review tool? An AI code-review tool is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI code-review tool slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Nextcloud

    Nextcloud

    Nextcloud is a modular workspace platform designed to provide teams and businesses with a comprehensive environment for digital collaboration. Beyond central data management, it integrates office suites like Collabora Online and EuroOffice office suites. for seamless, cooperative workflows. The platform features built-in tools for chat, videoconferencing, and a privacy-focused AI assistant capable of running entirely on local LLMs. Supported by a rich ecosystem of apps, it can be hosted in the cloud or on premises and can scale up to millions of users. It has been translated into over 100 languages. == Features == Nextcloud files are stored in conventional directory structures, accessible via WebDAV if necessary. A SQLite, MySQL/MariaDB or PostgreSQL database is required to provide additional functionality like permissions, shares, and comments. Nextcloud can synchronize with local clients running Windows (Windows 8.1 and above), macOS (10.14 or later), Linux and FreeBSD. Nextcloud permits user and group administration locally or via different backends like OpenID or LDAP. Content can be shared inside the system by defining granular read/write permissions between users and groups. Nextcloud users can create public URLs when sharing files. Logging of file-related actions, as well as disallowing access based on file access rules is also available. Security options like brute-force protection and multi-factor authentication using TOTP, WebAuthn, Oauth2, and OpenID Connect are available. Nextcloud has planned new features such as monitoring capabilities, full-text search and Kerberos authentication, as well as audio/video conferencing, expanded federation and smaller user interface improvements. == History == In April 2016 Frank Karlitschek and most core contributors left ownCloud Inc. These included some of ownCloud's staff according to sources near to the ownCloud community. Karlitschek and many of these contributors went on to fork ownCloud, creating Nextcloud. The fork was preceded by a blog post of Karlitschek announcing his departure and raising questions about the management of the ownCloud, its community, and priorities between growth, money, and sustainability. There have been no official statements about the reason for the fork. However, Karlitschek mentioned the fork several times in a talk at the 2018 FOSDEM conference and in two appearances on the FLOSS Weekly podcast, emphasizing cultural mismatch between open source developers and business oriented people not used to the open source community. On June 2, within 12 hours of the announcement of the fork, the American entity "ownCloud Inc." announced that it is shutting down with immediate effect, stating that "[...] main lenders in the US have cancelled our credit. Following American law, we are forced to close the doors of ownCloud, Inc. with immediate effect and terminate the contracts of 8 employees." ownCloud Inc. accused Karlitschek of poaching developers, while Nextcloud developers such as Arthur Schiwon stated that he "decided to quit because not everything in the ownCloud Inc. company world evolved as I imagined". ownCloud GmbH continued operations, secured financing from new investors and took over the business of ownCloud Inc. In April 2018 Informationstechnikzentrum Bund (ITZBund) reported Nextcloud won the tender for "Bundescloud" (Germany government cloud) project. In August 2019 it was announced that the governments of France, Sweden and the Netherlands would use Nextcloud for file transfer. In January 2020 Nextcloud 18 "Nextcloud Hub" was released. The major change was direct integration with an Office suite (OnlyOffice) and Nextcloud announced that their goal was to compete with Office 365 and Google Docs. A partnership with Ionos was revealed – its hosting location in Germany and compliance with GDPR should support the goal of data sovereignty. In spring 2020 remote work and web conferencing usage increased due to the COVID-19 pandemic and Nextcloud released version 19 with chat and videoconferencing Talk app integrated into the application core. Communication with an optional "high performance back-end" allows self-hosting of web conferences with more than 10 participants. Collabora Online was introduced as another integrated office suite. In August 2021 Nextcloud was chosen as a collaboration platform for European cloud software GAIA-X. In a September 2021 European Commission report it was mentioned as "the most widely deployed Open Source content collaboration platform" Following the 2025 United States tariffs against the European Union, fear of overreliance on US cloud providers such as Microsoft 365 and Google Workspace increased, with Nextcloud being one of the foremost contenders to replace them. Some governmental organisations including the European Data Protection Supervisor and the German state of Schleswig-Holstein have since switched from Microsoft's Sharepoint to Nextcloud. According to Nextcloud, during the first 5 months of 2025, customer interest in the software had tripled.

    Read more →
  • Best AI Virtual Assistants in 2026

    Best AI Virtual Assistants in 2026

    Shopping for the best AI virtual assistant? An AI virtual assistant is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI virtual assistant slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Jaime Carbonell

    Jaime Carbonell

    Jaime Guillermo Carbonell (July 29, 1953 – February 28, 2020) was a computer scientist who made seminal contributions to the development of natural language processing tools and technologies. His research in machine translation resulted in the development of several state-of-the-art language translation and artificial intelligence systems. He earned his B.S. degrees in Physics and in Mathematics from MIT in 1975 and did his Ph.D. under Dr. Roger Schank at Yale University in 1979. He joined Carnegie Mellon University as an assistant professor of computer science in 1979 and moved to Pittsburgh. He was affiliated with the Language Technologies Institute, Computer Science Department, Machine Learning Department, and Computational Biology Department at Carnegie Mellon. His interests spanned several areas of artificial intelligence, language technologies and machine learning. In particular, his research focused on areas such as text mining (extraction, categorization, novelty detection) and in new theoretical frameworks such as a unified utility-based theory bridging information retrieval, summarization, free-text question-answering and related tasks. He also worked on machine translation, both high-accuracy knowledge-based MT and machine learning for corpus-based MT (such as generalized example-based MT). == Career == Carbonell was the Allen Newell Professor of Computer Science and head of the Language Technologies Institute at Carnegie Mellon University. He joined Carnegie Mellon in 1979, and became a key faculty member in the artificial intelligence area. He was appointed full professor in 1987, Newell Chair in 1995, and University Professor in 2012. He completed his undergraduate studies at MIT. He received dual degrees in Mathematics and Physics. He received his Ph.D. in computer science from Yale University in 1979. At the time of his appointment, Carbonell was the youngest chaired professor in the School of Computer Science at CMU. His research spanned several areas of computer science, mostly in artificial intelligence, including: machine learning, data and text mining, natural language processing, very-large-scale knowledge bases, translingual information retrieval and automated summarization. He wrote more than 300 technical papers and gave over 500 invited or refereed-paper presentations (colloquia, seminars, panels, conferences, keynotes, etc.). He died following a long illness on February 28, 2020. Mona Talat Diab became the director of CMU's Language Technologies Institute in 2023. == Research == Carbonell created MMR (maximal marginal relevance) technology for text summarization and informational novelty detection in search engines, invention of transformational analogy, a generalized method for case-based reasoning (CBR) to re-use, modify and compose past successful plans for increasingly complex problems and knowledge-based interlingual machine translation. He was instrumental in setting up the Computational Biolinguistics Program, a joint venture between Carnegie Mellon and the University of Pittsburgh, which combines Language Technologies and Machine Learning to model and predict genomic, proteomic and glycomic 3D structures. Carbonell also did work in machine learning. He organized the first four machine learning conferences, starting with CMU in 1981. The Language Technologies Institute (LTI), founded and directed by Carbonell, achieved top honors in multiple areas. These areas include machine translation, search engines (including founding of Lycos by Michael Mauldin, one of Carbonell’s PhD students), speech synthesis, and education. LTI remains the original, largest and best-known institute for language technologies, with over $12M in annual funding and 200 researchers (faculty, staff, PhD students, MS students, visiting scholars etc.). Carbonell made major technical contributions in several fields, including (1) Creation of MMR (maximal marginal relevance) technology for text summarization and informational novelty detection in search engines,(2) Proactive machine learning for multi-source cost-sensitive active learning, (3) Linked conditional random fields for predicting tertiary and quaternary protein folds, (4) Symmetric optimal phrasal alignment method for trainable example-based and statistical machine translation, (5) Series- anomaly modeling for financial fraud detection and syndromic surveillance, (6) Knowledge-based interlingual machine translation, (7) Robust case-frame parsing, (8) Seeded version-space learning and (9) Invention of transformational and derivational analogy, generalized methods for case-based reasoning (CBR) to re-use, modify and compose past successful plans for increasingly complex problems. The teams led by Carbonell achieved top honors in many areas such as first scalable high-accuracy interlingual machine translation (1991), first speech-to-speech machine translation (1992), first large-scale spider and search engine (1994), and first trainable, large-scale protein-structure topology predictor (2005). Modern machine learning, co-founded by Carbonell, Michalski and Mitchell, is a fundamental enabling technology in search engines, data mining and social networking. Starting in 1980, he co-edited the first three books on ML, launched the ML conferences and was a co-founder and editor-in-chief of ML Journal. Carbonell’s innovations have led to several successful start-ups: Carnegie Group (AI expertsystems), Lycos (web search), Wisdom (financial optimization & ML), Carnegie Speech (spoken-language tutoring), Dynamix (data mining and pattern discovery), and Meaningful Machines (context-based machine translation). Carbonell was the founding director of The Language Technology Institute, the preeminent global institution in language studies, unparalleled in size and scope and has since been adopted/imitated in Germany (DFKI), Japan (Tokyo Univ.), and the US (Johns Hopkins). == Awards and honors == Okawa Prize, 2015 Best paper award, “Translingual Search” w/Yang, International Joint Conference on AI, 1997 Allen Newell endowed chair, Carnegie Mellon University, 1995 Elected fellow of AAAI, 1991 Computer Science teaching award, Carnegie Mellon University, 1987 Sperry Fellowship for excellence in AI research, 1986 Herbert Simon teaching award, 1986 "Recognition of Service" award from the ACM for the SIGART presidency, 1983–1985 Provided congressional testimony on machine translation, 1990 == Selected works == === Books === 1983. (with Ryszard S. Michalski & Tom M. Mitchell, Eds.) Machine learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann. 1986. (with Ryszard S. Michalski & Tom Mitchell, Eds.) Machine learning: An artificial intelligence approach. Vol. II. Los Altos, CA: Morgan-Kaufmann. 1986. (with Ryszard S. Michalski & Tom Mitchell, Eds.) Machine Learning: A Guide to Current Research. Kluwer Academic Publishers. == Contributions == “Protein Quaternary Fold Recognition Using Conditional Graphical Models” IJCAI 2007 (w/Liu et al.) “Context-Based Machine Translation” AMTA 2006 (w/Klein et al.) “SCRFs: A New Approach for Protein Fold Recognition,’’ Journal of Computational Biology, 13,2, 2006 (w/Liu et al) “MT for Resource-Poor Languages Using Elicitation-Based Learning” Machine Translation, 2004 ‘‘Learning Approaches for Detecting and Tracking News Events,’’ IEEE Trans I.S., 14, 4, 2000 (w/Yang)

    Read more →
  • Büchi automaton

    Büchi automaton

    In computer science and automata theory, a deterministic Büchi automaton is a theoretical machine which either accepts or rejects infinite inputs. Such a machine has a set of states and a transition function, which determines which state the machine should move to from its current state when it reads the next input character. Some states are accepting states and one state is the start state. The machine accepts an input if and only if it will pass through an accepting state infinitely many times as it reads the input. A non-deterministic Büchi automaton, later referred to just as a Büchi automaton, has a transition function which may have multiple outputs, leading to many possible paths for the same input; it accepts an infinite input if and only if some possible path is accepting. Deterministic and non-deterministic Büchi automata generalize deterministic finite automata and nondeterministic finite automata to infinite inputs. Each are types of ω-automata. Büchi automata recognize the ω-regular languages, the infinite word version of regular languages. They are named after the Swiss mathematician Julius Richard Büchi, who invented them in 1962. Büchi automata are often used in model checking as an automata-theoretic version of a formula in linear temporal logic. == Formal definition == Formally, a deterministic Büchi automaton is a tuple A = ( Q , Σ , δ , q 0 , F ) {\textstyle A=(Q,\Sigma ,\delta ,q_{0},\mathbf {F} )} that consists of the following components: Q {\textstyle Q} is a finite set. The elements of Q {\textstyle Q} are called the states of A {\textstyle A} . Σ {\textstyle \Sigma } is a finite set called the alphabet of A {\textstyle A} . δ : Q × Σ → Q {\textstyle \delta \colon Q\times \Sigma \to Q} is a function, called the transition function of A {\textstyle A} . q 0 {\textstyle q_{0}} is an element of Q {\textstyle Q} , called the initial state of A {\textstyle A} . F ⊆ Q {\textstyle \mathbf {F} \subseteq Q} is the acceptance condition. A run i _ = i 0 i 1 i 2 ⋯ ∈ Σ ω {\displaystyle {\underline {i}}=i_{0}i_{1}i_{2}\cdots \in \Sigma ^{\omega }} is an infinite string of inputs of A {\displaystyle A} . By calling δ {\displaystyle \delta } recursively, we can extend it to a function δ ω : Σ ω → Q ω {\displaystyle \delta ^{\omega }:\Sigma ^{\omega }\to Q^{\omega }} . A state q ∈ Q {\displaystyle q\in Q} is said to occur infinitely often for a run i _ {\displaystyle {\underline {i}}} when the set { n ∈ N ∣ δ ω ( i _ ) n = q } {\displaystyle \{n\in \mathbb {N} \mid \delta ^{\omega }({\underline {i}})_{n}=q\}} is infinite. Let I n f ( i _ ) {\displaystyle \mathrm {Inf} ({\underline {i}})} be the set of states occurring infinitely often for i _ {\displaystyle {\underline {i}}} . The language of A {\displaystyle A} is then the set of runs of A {\displaystyle A} in which at least one of the infinitely-often occurring states is in F {\textstyle \mathbf {F} } ; in symbols: L ( A ) = { i _ ∈ Σ ω ∣ I n f ( i _ ) ∩ F ≠ ∅ } . {\displaystyle L(A)=\{{\underline {i}}\in \Sigma ^{\omega }\mid \mathrm {Inf} ({\underline {i}})\cap \mathbf {F} \neq \varnothing \}.} In a (non-deterministic) Büchi automaton, the transition function δ {\textstyle \delta } is replaced with a transition relation Δ {\textstyle \Delta } that returns a set of states, and the single initial state q 0 {\textstyle q_{0}} is replaced by a set I {\textstyle I} of initial states. Generally, the term Büchi automaton without qualifier refers to non-deterministic Büchi automata. For more comprehensive formalism see also ω-automaton. == Closure properties == The set of Büchi automata is closed under the following operations. Let A = ( Q A , Σ , Δ A , I A , F A ) {\displaystyle A=(Q_{A},\Sigma ,\Delta _{A},I_{A},{F}_{A})} and B = ( Q B , Σ , Δ B , I B , F B ) {\displaystyle B=(Q_{B},\Sigma ,\Delta _{B},I_{B},{F}_{B})} be Büchi automata and C = ( Q C , Σ , Δ C , I C , F C ) {\displaystyle C=(Q_{C},\Sigma ,\Delta _{C},I_{C},{F}_{C})} be a finite automaton. Union: There is a Büchi automaton that recognizes the language L ( A ) ∪ L ( B ) . {\displaystyle L(A)\cup L(B).} Proof: If we assume, w.l.o.g., Q A ∩ Q B {\displaystyle Q_{A}\cap Q_{B}} is empty then L ( A ) ∪ L ( B ) {\displaystyle L(A)\cup L(B)} is recognized by the Büchi automaton ( Q A ∪ Q B , Σ ∪ Σ , Δ A ∪ Δ B , I A ∪ I B , F A ∪ F B ) . {\displaystyle (Q_{A}\cup Q_{B},\Sigma \cup \Sigma ,\Delta _{A}\cup \Delta _{B},I_{A}\cup I_{B},{F}_{A}\cup {F}_{B}).} Intersection: There is a Büchi automaton that recognizes the language L ( A ) ∩ L ( B ) . {\displaystyle L(A)\cap L(B).} Proof: The Büchi automaton A ′ = ( Q ′ , Σ , Δ ′ , I ′ , F ′ ) {\displaystyle A'=(Q',\Sigma ,\Delta ',I',F')} recognizes L ( A ) ∩ L ( B ) , {\displaystyle L(A)\cap L(B),} where Q ′ = Q A × Q B × { 1 , 2 } {\displaystyle Q'=Q_{A}\times Q_{B}\times \{1,2\}} Δ ′ = Δ 1 ∪ Δ 2 {\displaystyle \Delta '=\Delta _{1}\cup \Delta _{2}} Δ 1 = { ( ( q A , q B , 1 ) , a , ( q A ′ , q B ′ , i ) ) | ( q A , a , q A ′ ) ∈ Δ A and ( q B , a , q B ′ ) ∈ Δ B and if q A ∈ F A then i = 2 else i = 1 } {\displaystyle \Delta _{1}=\{((q_{A},q_{B},1),a,(q'_{A},q'_{B},i))|(q_{A},a,q'_{A})\in \Delta _{A}{\text{ and }}(q_{B},a,q'_{B})\in \Delta _{B}{\text{ and if }}q_{A}\in F_{A}{\text{ then }}i=2{\text{ else }}i=1\}} Δ 2 = { ( ( q A , q B , 2 ) , a , ( q A ′ , q B ′ , i ) ) | ( q A , a , q A ′ ) ∈ Δ A and ( q B , a , q B ′ ) ∈ Δ B and if q B ∈ F B then i = 1 else i = 2 } {\displaystyle \Delta _{2}=\{((q_{A},q_{B},2),a,(q'_{A},q'_{B},i))|(q_{A},a,q'_{A})\in \Delta _{A}{\text{ and }}(q_{B},a,q'_{B})\in \Delta _{B}{\text{ and if }}q_{B}\in F_{B}{\text{ then }}i=1{\text{ else }}i=2\}} I ′ = I A × I B × { 1 } {\displaystyle I'=I_{A}\times I_{B}\times \{1\}} F ′ = { ( q A , q B , 2 ) | q B ∈ F B } {\displaystyle F'=\{(q_{A},q_{B},2)|q_{B}\in F_{B}\}} By construction, r ′ = ( q A 0 , q B 0 , i 0 ) , ( q A 1 , q B 1 , i 1 ) , … {\displaystyle r'=(q_{A}^{0},q_{B}^{0},i^{0}),(q_{A}^{1},q_{B}^{1},i^{1}),\dots } is a run of automaton A' on input word w {\textstyle w} if r A = q A 0 , q A 1 , … {\displaystyle r_{A}=q_{A}^{0},q_{A}^{1},\dots } is run of A {\textstyle A} on w {\textstyle w} and r B = q B 0 , q B 1 , … {\displaystyle r_{B}=q_{B}^{0},q_{B}^{1},\dots } is run of B {\textstyle B} on w {\textstyle w} . r A {\textstyle r_{A}} is accepting and r B {\textstyle r_{B}} is accepting if r ′ {\textstyle r'} is concatenation of an infinite series of finite segments of 1-states (states with third component 1) and 2-states (states with third component 2) alternatively. There is such a series of segments of r ′ {\textstyle r'} if r ′ {\textstyle r'} is accepted by A ′ {\textstyle A'} . Concatenation: There is a Büchi automaton that recognizes the language L ( C ) ⋅ L ( A ) . {\displaystyle L(C)\cdot L(A).} Proof: If we assume, w.l.o.g., Q C ∩ Q A {\displaystyle Q_{C}\cap Q_{A}} is empty then the Büchi automaton A ′ = ( Q C ∪ Q A , Σ , Δ ′ , I ′ , F A ) {\displaystyle A'=(Q_{C}\cup Q_{A},\Sigma ,\Delta ',I',F_{A})} recognizes L ( C ) ⋅ L ( A ) {\displaystyle L(C)\cdot L(A)} , where Δ ′ = Δ A ∪ Δ C ∪ { ( q , a , q ′ ) | q ′ ∈ I A and ∃ f ∈ F C . ( q , a , f ) ∈ Δ C } {\displaystyle \Delta '=\Delta _{A}\cup \Delta _{C}\cup \{(q,a,q')|q'\in I_{A}{\text{ and }}\exists f\in F_{C}.(q,a,f)\in \Delta _{C}\}} if I C ∩ F C is empty then I ′ = I C otherwise I ′ = I C ∪ I A {\displaystyle {\text{ if }}I_{C}\cap F_{C}{\text{ is empty then }}I'=I_{C}{\text{ otherwise }}I'=I_{C}\cup I_{A}} ω-closure: If L ( C ) {\displaystyle L(C)} does not contain the empty word then there is a Büchi automaton that recognizes the language L ( C ) ω . {\displaystyle L(C)^{\omega }.} Proof: The Büchi automaton that recognizes L ( C ) ω {\displaystyle L(C)^{\omega }} is constructed in two stages. First, we construct a finite automaton A ′ {\textstyle A'} such that A ′ {\textstyle A'} also recognizes L ( C ) {\displaystyle L(C)} but there are no incoming transitions to initial states of A ′ {\textstyle A'} . So, A ′ = ( Q C ∪ { q new } , Σ , Δ ′ , { q new } , F C ) , {\displaystyle A'=(Q_{C}\cup \{q_{\text{new}}\},\Sigma ,\Delta ',\{q_{\text{new}}\},F_{C}),} where Δ ′ = Δ C ∪ { ( q new , a , q ′ ) | ∃ q ∈ I C . ( q , a , q ′ ) ∈ Δ C } . {\displaystyle \Delta '=\Delta _{C}\cup \{(q_{\text{new}},a,q')|\exists q\in I_{C}.(q,a,q')\in \Delta _{C}\}.} Note that L ( C ) = L ( A ′ ) {\displaystyle L(C)=L(A')} because L ( C ) {\displaystyle L(C)} does not contain the empty string. Second, we will construct the Büchi automaton A ″ {\textstyle A''} that recognize L ( C ) ω {\displaystyle L(C)^{\omega }} by adding a loop back to the initial state of A ′ {\textstyle A'} . So, A ″ = ( Q C ∪ { q new } , Σ , Δ ″ , { q new } , { q new } ) {\displaystyle A''=(Q_{C}\cup \{q_{\text{new}}\},\Sigma ,\Delta '',\{q_{\text{new}}\},\{q_{\text{new}}\})} , where Δ ″ = Δ ′ ∪ { ( q , a , q new ) | ∃ q ′ ∈ F C . ( q , a , q ′ ) ∈ Δ ′ } . {\displaystyle \Delta ''=\Delta '\cup \{(q,a,q_{\text{new}})|\exists q'\in F_{C}.(q,a,q')\in \Delta '\}.} Complementation:

    Read more →
  • Highway network

    Highway network

    In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks. The advantage of the Highway Network over other deep learning architectures is its ability to overcome or partially prevent the vanishing gradient problem, thus improving its optimization. Gating mechanisms are used to facilitate information flow across the many layers ("information highways"). Highway Networks have found use in text sequence labeling and speech recognition tasks. In 2014, the state of the art was training deep neural networks with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train such networks: the Highway Network (published in May), and the residual neural network, or ResNet (December). ResNet behaves like an open-gated Highway Net. == Model == The model has two gates in addition to the H ( W H , x ) {\displaystyle H(W_{H},x)} gate: the transform gate T ( W T , x ) {\displaystyle T(W_{T},x)} and the carry gate C ( W C , x ) {\displaystyle C(W_{C},x)} . The latter two gates are non-linear transfer functions (specifically sigmoid by convention). The function H {\displaystyle H} can be any desired transfer function. The carry gate is defined as: C ( W C , x ) = 1 − T ( W T , x ) {\displaystyle C(W_{C},x)=1-T(W_{T},x)} while the transform gate is just a gate with a sigmoid transfer function. == Structure == The structure of a hidden layer in the Highway Network follows the equation: y = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ C ( x , W C ) = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ ( 1 − T ( x , W T ) ) {\displaystyle {\begin{aligned}y=H(x,W_{H})\cdot T(x,W_{T})+x\cdot C(x,W_{C})\\=H(x,W_{H})\cdot T(x,W_{T})+x\cdot (1-T(x,W_{T}))\end{aligned}}} == Related work == Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and attributed to it the reason why deep learning did not work well. To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute y t + 1 = F ( x t ) + x t {\textstyle y_{t+1}=F(x_{t})+x_{t}} . During backpropagation through time, this becomes the residual formula y = F ( x ) + x {\textstyle y=F(x)+x} for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span t. A later LSTM version published in 2000 modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, thus being opened, addressing the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM. The Highway Network of May 2015 applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers". It is like a 2000 LSTM with forget gates unfolded in time, while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM. If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks. The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961) and Lang & Witbrock (1988) which has the form x ↦ F ( x ) + A x {\displaystyle x\mapsto F(x)+Ax} . Here the randomly initialized weight matrix A does not have to be the identity mapping. Every residual connection is a skip connection, but almost all skip connections are not residual connections. The original Highway Network paper not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20, 50, and 100 layers networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20 layers counterpart (on the MNIST dataset, Figure 1 in ). No improvement on test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1 in ). The ResNet paper, however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3 in ). This is also why the forget gates of the 2000 LSTM were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are dominantly identity mappings.

    Read more →
  • Krohn–Rhodes theory

    Krohn–Rhodes theory

    In mathematics and computer science, the Krohn–Rhodes theory (or algebraic automata theory) is an approach to the study of finite semigroups and automata that seeks to decompose them in terms of elementary components. These components correspond to finite aperiodic semigroups and finite simple groups that are combined in a feedback-free manner (called a "wreath product" or "cascade"). Krohn and Rhodes found a general decomposition for finite automata. The authors discovered and proved an unexpected major result in finite semigroup theory, revealing a deep connection between finite automata and semigroups. Decidability of Krohn-Rhodes complexity long motivated much work in semigroup theory. In June 2024, Stuart Margolis, John Rhodes, and Anne Schilling announced a proof that the complexity is decidable. == Definitions and description of the Krohn–Rhodes theorem == Let T {\displaystyle T} be a semigroup. A semigroup S {\displaystyle S} that is a homomorphic image of a subsemigroup of T {\displaystyle T} is said to be a divisor of T {\displaystyle T} . The Krohn–Rhodes theorem for finite semigroups states that every finite semigroup S {\displaystyle S} is a divisor of a finite alternating wreath product of finite simple groups, each a divisor of S {\displaystyle S} , and finite aperiodic semigroups (which contain no nontrivial subgroups). In the automata formulation, the Krohn–Rhodes theorem for finite automata states that given a finite automaton A {\displaystyle A} with states Q {\displaystyle Q} and input alphabet I {\displaystyle I} , output alphabet U {\displaystyle U} , then one can expand the states to Q ′ {\displaystyle Q'} such that the new automaton A ′ {\displaystyle A'} embeds into a cascade of "simple", irreducible automata: In particular, A {\displaystyle A} is emulated by a feed-forward cascade of (1) automata whose transformation semigroups are finite simple groups and (2) automata that are banks of flip-flops running in parallel. The new automaton A ′ {\displaystyle A'} has the same input and output symbols as A {\displaystyle A} . Here, both the states and inputs of the cascaded automata have a very special hierarchical coordinate form. Moreover, each simple group (prime) or non-group irreducible semigroup (subsemigroup of the flip-flop monoid) that divides the transformation semigroup of A {\displaystyle A} must divide the transformation semigroup of some component of the cascade, and only the primes that must occur as divisors of the components are those that divide A {\displaystyle A} 's transformation semigroup. == Group complexity == The Krohn–Rhodes complexity (also called group complexity or just complexity) of a finite semigroup S is the least number of groups in a wreath product of finite groups and finite aperiodic semigroups of which S is a divisor. All finite aperiodic semigroups have complexity 0, while non-trivial finite groups have complexity 1. In fact, there are semigroups of every non-negative integer complexity. For example, for any n greater than 1, the multiplicative semigroup of all (n+1) × (n+1) upper-triangular matrices over any fixed finite field has complexity n (Kambites, 2007). A major open problem in finite semigroup theory is the decidability of complexity: is there an algorithm that will compute the Krohn–Rhodes complexity of a finite semigroup, given its multiplication table? Upper bounds and ever more precise lower bounds on complexity have been obtained (see, e.g. Rhodes & Steinberg, 2009). Rhodes has conjectured that the problem is decidable. In June 2024, Stuart Margolis, John Rhodes, and Anne Schilling announced a proof in the affirmative of the conjecture, though as of 2025 the result has yet to be confirmed. == History and applications == At a conference in 1962, Kenneth Krohn and John Rhodes announced a method for decomposing a (deterministic) finite automaton into "simple" components that are themselves finite automata. This joint work, which has implications for philosophy, comprised both Krohn's doctoral thesis at Harvard University and Rhodes' doctoral thesis at MIT. Simpler proofs, and generalizations of the theorem to infinite structures, have been published since then (see Chapter 4 of Rhodes and Steinberg's 2009 book The q-Theory of Finite Semigroups for an overview). In the 1965 paper by Krohn and Rhodes, the proof of the theorem on the decomposition of finite automata (or, equivalently sequential machines) made extensive use of the algebraic semigroup structure. Later proofs contained major simplifications using finite wreath products of finite transformation semigroups. The theorem generalizes the Jordan–Hölder decomposition for finite groups (in which the primes are the finite simple groups), to all finite transformation semigroups (for which the primes are again the finite simple groups plus all subsemigroups of the "flip-flop" (see above)). Both the group and more general finite automata decomposition require expanding the state-set of the general, but allow for the same number of input symbols. In the general case, these are embedded in a larger structure with a hierarchical "coordinate system". One must be careful in understanding the notion of "prime" as Krohn and Rhodes explicitly refer to their theorem as a "prime decomposition theorem" for automata. The components in the decomposition, however, are not prime automata (with prime defined in a naïve way); rather, the notion of prime is more sophisticated and algebraic: the semigroups and groups associated to the constituent automata of the decomposition are prime (or irreducible) in a strict and natural algebraic sense with respect to the wreath product (Eilenberg, 1976). Also, unlike earlier decomposition theorems, the Krohn–Rhodes decompositions usually require expansion of the state-set, so that the expanded automaton covers (emulates) the one being decomposed. These facts have made the theorem difficult to understand and challenging to apply in a practical way—until recently, when computational implementations became available (Egri-Nagy & Nehaniv 2005, 2008). H.P. Zeiger (1967) proved an important variant called the holonomy decomposition (Eilenberg 1976). The holonomy method appears to be relatively efficient and has been implemented computationally by A. Egri-Nagy (Egri-Nagy & Nehaniv 2005). Meyer and Thompson (1969) give a version of Krohn–Rhodes decomposition for finite automata that is equivalent to the decomposition previously developed by Hartmanis and Stearns, but for useful decompositions, the notion of expanding the state-set of the original automaton is essential (for the non-permutation automata case). Many proofs and constructions now exist of Krohn–Rhodes decompositions (e.g., [Krohn, Rhodes & Tilson 1968], [Ésik 2000], [Diekert et al. 2012]), with the holonomy method the most popular and efficient in general (although not in all cases). [Zimmermann 2010] gives an elementary proof of the theorem. Owing to the close relation between monoids and categories, a version of the Krohn–Rhodes theorem is applicable to category theory. This observation and a proof of an analogous result were offered by Wells (1980). The Krohn–Rhodes theorem for semigroups/monoids is an analogue of the Jordan–Hölder theorem for finite groups (for semigroups/monoids rather than groups). As such, the theorem is a deep and important result in semigroup/monoid theory. The theorem was also surprising to many mathematicians and computer scientists since it had previously been widely believed that the semigroup/monoid axioms were too weak to admit a structure theorem of any strength, and prior work (Hartmanis & Stearns) was only able to show much more rigid and less general decomposition results for finite automata. Work by Egri-Nagy and Nehaniv (2005, 2008–) continues to further automate the holonomy version of the Krohn–Rhodes decomposition extended with the related decomposition for finite groups (so-called Frobenius–Lagrange coordinates) using the computer algebra system GAP. Applications outside of the semigroup and monoid theories are now computationally feasible. They include computations in biology and biochemical systems (e.g. Egri-Nagy & Nehaniv 2008), artificial intelligence, finite-state physics, psychology, and game theory (see, for example, Rhodes 2009).

    Read more →
  • The Best Free AI Voice Assistant for Beginners

    The Best Free AI Voice Assistant for Beginners

    Looking for the best AI voice assistant? An AI voice assistant is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI voice assistant slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →