AI Face Upgrade

AI Face Upgrade — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • How Data Happened

    How Data Happened: A History from the Age of Reason to the Age of Algorithms is a 2023 non-fiction book written by Columbia University professors Chris Wiggins and Matthew L. Jones. The book explores the history of data and statistics from the end of the 18th century to the present day. == Content == The book starts at the end of the 18th century, when European states began tabulating physical resources, and ends at the present day, when algorithms manipulate our personal information as a commodity. It looks at the rise of data and statistics, and how early statistical methods were used to justify eugenics, quantify supposed racial differences, and develop military and industrial applications. The authors also discuss the impact of the internet and e-commerce on data collection, the rise of data science, and the consequences of government-run surveillance systems collecting vast amounts of personal data for customized, targeted advertising. They emphasize the importance of privacy and democracy and propose remedies to the problems caused by mass data collection, including stronger regulation of the tech industry and collective action by its employees. The book is a historical analysis that provides context for understanding the debates surrounding data and its control. The book has 336 pages and was published in 2023 by W. W. Norton & Company.

  • Extended affix grammar

    In computer science, extended affix grammars (EAGs) are a formal grammar formalism for describing the context free and context sensitive syntax of language, both natural language and programming languages. EAGs are a member of the family of two-level grammars; more specifically, a restriction of Van Wijngaarden grammars with the specific purpose of making parsing feasible. Like Van Wijngaarden grammars, EAGs have hyperrules that form a context-free grammar except in that their nonterminals may have arguments, known as affixes, the possible values of which are supplied by another context-free grammar, the metarules. EAGs were introduced and studied by D.A. Watt in 1974; recognizers were developed at the University of Nijmegen between 1985 and 1995. The EAG compiler developed there will generate either a recogniser, a transducer, a translator, or a syntax directed editor for a language described in the EAG formalism. The formalism is quite similar to Prolog, to the extent that it borrowed its cut operator. EAGs have been used to write grammars of natural languages such as English, Spanish, and Hungarian. The aim was to verify the grammars by making them parse corpora of text (corpus linguistics); hence, parsing had to be sufficiently practical. However, the parse tree explosion problem that ambiguities in natural language tend to produce in this type of approach is worsened for EAGs because each choice of affix value may produce a separate parse, even when several different values are equivalent. The remedy proposed was to switch to the much simpler Affix Grammar over a Finite Lattice (AGFL) instead, in which metagrammars can only produce simple finite languages.
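    The two-level idea can be illustrated with a toy sketch in Python. This is not EAG notation and not the Nijmegen compiler; the function names and the encoding of the affix as an integer are assumptions made for illustration. A single hyperrule derives a(N) b(N) c(N), and the affix N takes its values from a metarule for unary numbers. Because the same affix value must be used consistently across the rule, the recognizer accepts the context-sensitive language of strings a^n b^n c^n.

```python
def repeat(symbol: str, n: int) -> str:
    """Hyperrule body for a(N), b(N) or c(N): the symbol repeated N times."""
    return symbol * n


def affix_values(limit: int):
    """Metarule for the affix N (unary numbers), enumerated here as 0..limit."""
    yield from range(limit + 1)


def recognize(text: str) -> bool:
    """Accept text iff some single affix value N makes a(N) b(N) c(N) derive it."""
    for n in affix_values(len(text)):
        # The same N is substituted in all three positions (consistent substitution).
        if repeat("a", n) + repeat("b", n) + repeat("c", n) == text:
            return True
    return False
```

    Trying every affix value in turn also hints at the parse explosion mentioned above: each candidate value is a separate attempt, even when most of them cannot succeed.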

  • Markov chain central limit theorem

    In the mathematical theory of random processes, the Markov chain central limit theorem has a conclusion somewhat similar in form to that of the classic central limit theorem (CLT) of probability theory, but the quantity in the role taken by the variance in the classic CLT has a more complicated definition. See also the general form of Bienaymé's identity. == Statement == Suppose that: the sequence $X_1, X_2, X_3, \ldots$ of random elements of some set is a Markov chain that has a stationary probability distribution; the initial distribution of the process, i.e. the distribution of $X_1$, is the stationary distribution, so that $X_1, X_2, X_3, \ldots$ are identically distributed (in the classic central limit theorem these random variables would also be assumed to be independent, but here we have only the weaker assumption that the process has the Markov property); and $g$ is some (measurable) real-valued function for which $\operatorname{var}(g(X_1)) < +\infty$. Now let $\mu = \operatorname{E}(g(X_1))$, $\widehat{\mu}_n = \frac{1}{n}\sum_{k=1}^{n} g(X_k)$, and $\sigma^2 := \lim_{n\to\infty} \operatorname{var}(\sqrt{n}\,\widehat{\mu}_n) = \lim_{n\to\infty} n \operatorname{var}(\widehat{\mu}_n) = \operatorname{var}(g(X_1)) + 2\sum_{k=1}^{\infty} \operatorname{cov}(g(X_1), g(X_{1+k}))$. Then as $n \to \infty$ we have $\sqrt{n}\,(\widehat{\mu}_n - \mu) \xrightarrow{\mathcal{D}} \text{Normal}(0, \sigma^2)$, where the decorated arrow indicates convergence in distribution.
== Monte Carlo Setting == The Markov chain central limit theorem can be guaranteed for functionals of general state space Markov chains under certain conditions. In particular, this can be done with a focus on Monte Carlo settings. An example of the application in an MCMC (Markov chain Monte Carlo) setting is the following. Consider a simple hard spheres model on a grid. Suppose $X = \{1, \ldots, n_1\} \times \{1, \ldots, n_2\} \subseteq \mathbb{Z}^2$. A proper configuration on $X$ consists of coloring each point either black or white in such a way that no two adjacent points are white. Let $\chi$ denote the set of all proper configurations on $X$, let $N_\chi(n_1, n_2)$ be the total number of proper configurations, and let $\pi$ be the uniform distribution on $\chi$, so that each proper configuration is equally likely. Suppose our goal is to calculate the typical number of white points in a proper configuration; that is, if $W(x)$ is the number of white points in $x \in \chi$, then we want the value of $E_\pi W = \sum_{x \in \chi} \frac{W(x)}{N_\chi(n_1, n_2)}$. If $n_1$ and $n_2$ are even moderately large, then we will have to resort to an approximation to $E_\pi W$. Consider the following Markov chain on $\chi$. Fix $p \in (0,1)$ and set $X_1 = x_1$, where $x_1 \in \chi$ is an arbitrary proper configuration. Randomly choose a point $(x, y) \in X$ and independently draw $U \sim \mathrm{Uniform}(0,1)$.
If $U \le p$ and all of the adjacent points are black, then color $(x, y)$ white, leaving all other points alone. Otherwise, color $(x, y)$ black and leave all other points alone. Call the resulting configuration $X_2$. Continuing in this fashion yields a Harris ergodic Markov chain $\{X_1, X_2, X_3, \ldots\}$ having $\pi$ as its invariant distribution. It is now a simple matter to estimate $E_\pi W$ with $\overline{w_n} = \sum_{i=1}^{n} W(X_i)/n$. Also, since $\chi$ is finite (albeit potentially large), it is well known that the chain converges exponentially fast to $\pi$, which implies that a CLT holds for $\overline{w_n}$. == Implications == Not taking into account the additional terms in the variance that stem from correlations (e.g. serial correlations in Markov chain Monte Carlo simulations) can result in the problem of pseudoreplication when computing, for example, confidence intervals for the sample mean.
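    The hard spheres chain described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the chain starts from the all-black configuration, and p is set to 0.5, a choice for which the update switches a site from white to black and from black to white with equal probability, so that the uniform distribution is easy to verify as invariant. The standard error is estimated with non-overlapping batch means, one common way to account for the covariance terms in $\sigma^2$; a naive i.i.d. standard error would ignore them.

```python
import math
import random


def neighbors(x, y, n1, n2):
    """Yield the grid points adjacent to (x, y) that lie inside the n1-by-n2 grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        i, j = x + dx, y + dy
        if 0 <= i < n1 and 0 <= j < n2:
            yield i, j


def step(white, n1, n2, p, rng):
    """One update: pick a site uniformly; whiten it only if U <= p and all neighbours are black."""
    x, y = rng.randrange(n1), rng.randrange(n2)
    u = rng.random()
    if u <= p and not any(white[i][j] for i, j in neighbors(x, y, n1, n2)):
        white[x][y] = True
    else:
        white[x][y] = False


def run_chain(n1, n2, p, n_steps, seed=0):
    """Return W(X_2), W(X_3), ... for a chain started at the all-black configuration."""
    rng = random.Random(seed)
    white = [[False] * n2 for _ in range(n1)]  # all black is a proper configuration
    counts = []
    for _ in range(n_steps):
        step(white, n1, n2, p, rng)
        counts.append(sum(map(sum, white)))
    return counts


def batch_means_se(values, n_batches=20):
    """Standard error of the sample mean, estimated from non-overlapping batch means."""
    size = len(values) // n_batches
    means = [sum(values[b * size:(b + 1) * size]) / size for b in range(n_batches)]
    grand = sum(means) / n_batches
    var_of_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_of_means / n_batches)


counts = run_chain(n1=8, n2=8, p=0.5, n_steps=20000)
estimate = sum(counts) / len(counts)
print(estimate, batch_means_se(counts))
```

    On a 2-by-2 grid there are only seven proper configurations, so the exact value $E_\pi W = 8/7$ can be used as a sanity check for the estimate.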

  • DeepL Translator

    DeepL is a German AI research company known for its language AI platform, which includes DeepL Translator and DeepL Voice, and for DeepL Agent, an AI agent capable of planning workflows and using office systems and tools autonomously, in response to natural language instructions. Its algorithm uses the transformer architecture. It offers a paid subscription for additional features and access to its translation application programming interface. DeepL was founded in 2017 by Jaroslaw Kutylowski and is a unicorn, valued at $2 billion after a Series C funding round raised $300 million in May 2024. Its more than 200,000 business customers include a large proportion of the Fortune 500. == History == The translating system was first developed within Linguee by a team led by Chief Technology Officer Jarosław Kutyłowski in 2016. It was launched as DeepL Translator on 28 August 2017 and offered translations between English, German, French, Spanish, Italian, Polish and Dutch. At its launch, it claimed to have surpassed its competitors in blind tests and BLEU scores, including Google Translate, Amazon Translate, Microsoft Translator and Facebook's translation feature. With the release of DeepL in 2017, Linguee's company name was changed to DeepL GmbH, and it is also financed by advertising on its sister site, linguee.com. Support for Portuguese and Russian was added on 5 December 2018. In July 2019, Jarosław Kutyłowski became the CEO of DeepL GmbH and restructured the company into a Societas Europaea in 2021. Translation software for Microsoft Windows and macOS was released in September 2019. Support for Chinese (simplified) and Japanese was added on 19 March 2020, which the company claimed to have surpassed the aforementioned competitors as well as Baidu and Youdao. 
Then, 13 more European languages were added in March 2021: Bulgarian, Czech, Danish, Estonian, Finnish, Greek, Hungarian, Latvian, Lithuanian, Romanian, Slovak, Slovenian, and Swedish, bringing the total number of supported languages to 24. On 25 May 2022, support for Indonesian and Turkish was added, and support for Ukrainian followed on 14 September 2022. In January 2023, the company reached a valuation of 1 billion euros and became the most valuable startup company in Cologne. At the end of that month, support for Korean and Norwegian (Bokmål) was also added. In May 2024, the company announced a US$300 million investment, which valued it at US$2 billion. In January 2026, support for more languages was added, including Luxembourgish and Irish. == Services == === Translation method === The service uses a proprietary algorithm with convolutional neural networks (CNNs) that have been trained with the Linguee database. According to the developers, the service uses a newer, improved neural network architecture, which makes its translations sound more natural than those of competing services. The translation is generated using a supercomputer that reaches 5.1 petaflops and is operated in Iceland with hydropower. DeepL's data centers are located at the EcoDataCenter in Falun, Sweden, a facility focused on sustainability. In general, CNNs are slightly more suitable for long coherent word sequences, but they have so far not been used by the competition because of their weaknesses compared to recurrent neural networks. The weaknesses of DeepL are compensated for by supplemental techniques, some of which are publicly known. === Translator and subscription === The translator can be used for free with a maximum limit of 1,500 characters per translation. Microsoft Word and PowerPoint files in Office Open XML file formats (.docx and .pptx) and PDF files up to 5 MB in size can also be translated.
It offers a paid subscription, DeepL Pro, which has been available since March 2018 and includes application programming interface access and a software plug-in for computer-assisted translation tools, including SDL Trados Studio. Unlike in the free version, translated texts are stated not to be saved on the server, and the character limit is removed. The monthly pricing model includes a set amount of text, with texts beyond that being charged according to the number of characters. ==== Supported languages ==== As of May 2026, the translation service supports the following languages: Additionally, these languages are currently in beta, indicated by an asterisk after their name in the language picker: === DeepL Write === In November 2022, DeepL launched a tool to improve monolingual texts in English and German, called DeepL Write. In December, the company removed access and informed journalists that it was only for internal use and that DeepL Write would be relaunched in early 2023. The public beta version was then released on January 17, 2023. In the summer of 2024, DeepL announced the availability of two more languages in DeepL Write: French and Spanish. By January 2024, DeepL had added an additional two: Portuguese (European and Brazilian) and Italian. === DeepL Agent === In November 2025, DeepL launched an AI agent called DeepL Agent, which is capable of operating business applications in a human-like manner. == Reception == The reception of DeepL has been generally positive. TechCrunch praised the accuracy of its translations, stating that it was more accurate and nuanced than Google Translate. Le Monde credited its developers with translating French text into more "French-sounding" expressions. RTL Z stated that DeepL Translator "offers better translations […] when it comes to Dutch to English and vice versa". La Repubblica and a Latin American website, "WWWhat's new?", praised it as well.
A 2018 paper by the University of Bologna evaluated the Italian-to-German translation capabilities and found the preliminary results to be similar in quality to Google Translate. In September 2021, Slator remarked that the language industry response was more measured than the press and noted that DeepL is still highly regarded by users. A reviewer noted in 2018 that DeepL had far fewer languages available for translation than competing products. == Awards and honors == DeepL won the 2020 Webby Award for Best Practices and the 2020 Webby Award for Technical Achievement (Apps, Mobile, and Features), both in the category Apps, Mobile & Voice. In April 2025, DeepL was featured in the Forbes AI 50 list.

  • Subvocal recognition

    Subvocal recognition (SVR) is the process of taking subvocalization and converting the detected results to a digital output, aural or text-based. A silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds. It works by the computer identifying the phonemes that an individual pronounces from nonauditory sources of information about their speech movements. These are then used to recreate the speech using speech synthesis. == Input methods == Silent speech interface systems have been created using ultrasound and optical camera input of tongue and lip movements. Electromagnetic devices are another technique for tracking tongue and lip movements. The detection of speech movements by electromyography of speech articulator muscles and the larynx is another technique. Another source of information is the vocal tract resonance signals that get transmitted through bone conduction called non-audible murmurs. They have also been created as a brain–computer interface using brain activity in the motor cortex obtained from intracortical microelectrodes. == Uses == Such devices are created as aids to those unable to create the sound phonation needed for audible speech such as after laryngectomies. Another use is for communication when speech is masked by background noise or distorted by self-contained breathing apparatus. A further practical use is where a need exists for silent communication, such as when privacy is required in a public place, or hands-free data silent transmission is needed during a military or security operation. In 2002, the Japanese company NTT DoCoMo announced it had created a silent mobile phone using electromyography and imaging of lip movement. The company stated that "the spur to developing such a phone was ridding public places of noise," adding that, "the technology is also expected to help people who have permanently lost their voice." 
The feasibility of using silent speech interfaces for practical communication has since been shown. In 2019, Arnav Kapur, a researcher from the Massachusetts Institute of Technology, conducted a study known as AlterEgo. Its implementation of the silent speech interface enables direct communication between the human brain and external devices through stimulation of the speech muscles. By leveraging neural signals associated with speech and language, the AlterEgo system deciphers the user's intended words and translates them into text or commands without the need for audible speech. == Research and patents == With a grant from the U.S. Army, research into synthetic telepathy using subvocalization is taking place at the University of California, Irvine under lead scientist Mike D'Zmura. NASA's Ames Research Center in Mountain View, California, is conducting subvocalization research under the supervision of Charles Jorgensen. The Brain Computer Interface R&D program at Wadsworth Center under the New York State Department of Health has confirmed the existing ability to decipher consonants and vowels from imagined speech, which allows for brain-based communication using imagined speech, although it uses EEGs instead of subvocalization techniques. US patents on silent communication technologies include: US Patent 6587729 "Apparatus for audibly communicating speech using the radio frequency hearing effect", US Patent 5159703 "Silent subliminal presentation system", US Patent 6011991 "Communication system and method including brain wave analysis and/or use of brain activity", US Patent 3951134 "Apparatus and method for remotely monitoring and altering brain waves". The latter two rely on brain wave analysis. == In fiction == The decoding of silent speech using a computer played an important role in Arthur C. Clarke's story and Stanley Kubrick's associated film 2001: A Space Odyssey.
In this, HAL 9000, a computer controlling spaceship Discovery One, bound for Jupiter, discovers a plot to deactivate it by the mission astronauts Dave Bowman and Frank Poole through lip reading their conversations. In Orson Scott Card's series (including Ender's Game), the artificial intelligence can be spoken to while the protagonist wears a movement sensor in his jaw, enabling him to converse with the AI without making noise. He also wears an ear implant. In Speaker for the Dead and subsequent novels, author Orson Scott Card described an ear implant, called a "jewel", that allows subvocal communication with computer systems. Author Robert J. Sawyer made use of subvocal recognition to allow silent commands to the cybernetic 'companion implants' used by the advanced Neanderthal characters in his Neanderthal Parallax trilogy of science fiction novels. In Earth, David Brin depicts this technology and its uses as a normal gear in the near future. In Down and Out in the Magic Kingdom, Cory Doctorow has cellphone technology become silent through a cochlear implant and miking the throat to pick up subvocalization. William Gibson's Sprawl Trilogy frequently uses sub-vocalization systems in various devices. In Kage Baker's Company novels, the immortal cyborgs communicate subvocally. In the Hugo Award-winning Hyperion Cantos by Dan Simmons, the characters often use subvocalization to communicate. In the Culture novels by Iain M. Banks, more highly advanced species often communicate subvocally through their technology. In Deus Ex: Human Revolution (2011), the protagonist is augmented with a subvocalization implant for sending covert communications (and a corresponding cochlear implant for receiving covert communications). In the tabletop RPG and video game series Shadowrun, player characters can communicate via subvocal microphones in some instances. In Paranoia, all citizens can speak to the computer via their "cerebral cortech" implants. 
Alastair Reynolds's Revelation Space trilogy frequently uses sub-vocalization systems in various devices.

  • Supervised learning

    In machine learning, supervised learning (SL) is a type of machine learning paradigm where an algorithm learns to map input data to a specific output based on example input-output pairs. This process involves training a statistical model using labeled data, meaning each piece of input data is provided with the correct output. The term "supervised" refers to the role of a teacher or supervisor who provides this training data, guiding the algorithm towards correct predictions. For instance, if you want a model to identify cats in images, supervised learning would involve feeding it many images of cats (inputs) that are explicitly labeled "cat" (outputs). The goal of supervised learning is for the trained model to accurately predict the output for new, unseen data. This requires the algorithm to effectively generalize from the training examples, a quality measured by its generalization error. Supervised learning is commonly used for tasks like classification (predicting a category, e.g., spam or not spam) and regression (predicting a continuous value, e.g., house prices). == Steps to follow == To solve a given problem of supervised learning, the following steps must be performed: Determine the type of training samples. Before doing anything else, the user should decide what kind of data is to be used as a training set. In the case of handwriting analysis, for example, this might be a single handwritten character, an entire handwritten word, an entire sentence of handwriting, or a full paragraph of handwriting. Gather a training set. The training set needs to be representative of the real-world use of the function. Thus, a set of input objects is gathered together with corresponding outputs, either from human experts or from measurements. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. 
Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality, but the feature vector should contain enough information to accurately predict the output. Determine the structure of the learned function and corresponding learning algorithm. For example, one may choose to use support-vector machines or decision trees. Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set. == Algorithm choice == A wide range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems (see the No free lunch theorem). There are four major issues to consider in supervised learning: === Bias–variance tradeoff === A first issue is the tradeoff between bias and variance. Imagine that we have available several different, but equally good, training data sets. A learning algorithm is biased for a particular input $x$ if, when trained on each of these data sets, it is systematically incorrect when predicting the correct output for $x$. A learning algorithm has high variance for a particular input $x$ if it predicts different output values when trained on different training sets. The prediction error of a learned classifier is related to the sum of the bias and the variance of the learning algorithm.
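The definitions of bias and variance at a particular input can be checked numerically. The following is a minimal sketch under assumptions that are not in the text: the "true" function is a sine curve, the outputs carry Gaussian noise, and the two learners (a constant mean predictor and a one-nearest-neighbour rule) are chosen only because they sit at opposite ends of the flexibility range. Each learner is trained on many independently drawn training sets and its predictions at a fixed input x0 are compared.

```python
import math
import random


def true_f(x):
    """Hypothetical target function used only for this illustration."""
    return math.sin(2 * math.pi * x)


def sample_training_set(n, noise, rng):
    """Draw n inputs uniformly from [0, 1) with noisy outputs."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]


def mean_predictor(train, x):
    """Inflexible learner: ignores x and predicts the average training output."""
    return sum(y for _, y in train) / len(train)


def nearest_neighbor(train, x):
    """Flexible learner: copies the output of the closest training input."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def bias2_and_variance(learner, x0, n_sets=500, n=30, noise=0.3, seed=0):
    """Estimate squared bias and variance of the learner's prediction at x0."""
    rng = random.Random(seed)
    preds = [learner(sample_training_set(n, noise, rng), x0) for _ in range(n_sets)]
    avg = sum(preds) / n_sets
    bias2 = (avg - true_f(x0)) ** 2
    variance = sum((p - avg) ** 2 for p in preds) / n_sets
    return bias2, variance


print("mean predictor  :", bias2_and_variance(mean_predictor, 0.25))
print("nearest neighbor:", bias2_and_variance(nearest_neighbor, 0.25))
```

In this setup the mean predictor is systematically wrong at x0 but barely changes between training sets, while the nearest-neighbour rule is right on average but varies with the noise in whichever training point happens to be closest.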
Generally, there is a tradeoff between bias and variance. A learning algorithm with low bias must be "flexible" so that it can fit the data well. But if the learning algorithm is too flexible, it will fit each training data set differently, and hence have high variance. A key aspect of many supervised learning methods is that they are able to adjust this tradeoff between bias and variance (either automatically or by providing a bias/variance parameter that the user can adjust). === Function complexity and amount of training data === The second issue is of the amount of training data available relative to the complexity of the "true" function (classifier or regression function). If the true function is simple, then an "inflexible" learning algorithm with high bias and low variance will be able to learn it from a small amount of data. But if the true function is highly complex (e.g., because it involves complex interactions among many different input features and behaves differently in different parts of the input space), then the function will only be able to learn with a large amount of training data paired with a "flexible" learning algorithm with low bias and high variance. === Dimensionality of the input space === A third issue is the dimensionality of the input space. If the input feature vectors have large dimensions, learning the function can be difficult even if the true function only depends on a small number of those features. This is because the many "extra" dimensions can confuse the learning algorithm and cause it to have high variance. Hence, input data of large dimensions typically requires tuning the classifier to have low variance and high bias. In practice, if the engineer can manually remove irrelevant features from the input data, it will likely improve the accuracy of the learned function. In addition, there are many algorithms for feature selection that seek to identify the relevant features and discard the irrelevant ones. 
This is an instance of the more general strategy of dimensionality reduction, which seeks to map the input data into a lower-dimensional space prior to running the supervised learning algorithm. === Noise in the output values === A fourth issue is the degree of noise in the desired output values (the supervisory target variables). If the desired output values are often incorrect (because of human error or sensor errors), then the learning algorithm should not attempt to find a function that exactly matches the training examples. Attempting to fit the data too carefully leads to overfitting. You can overfit even when there are no measurement errors (stochastic noise) if the function you are trying to learn is too complex for your learning model. In such a situation, the part of the target function that cannot be modeled "corrupts" your training data – this phenomenon has been called deterministic noise. When either type of noise is present, it is better to go with a higher bias, lower variance estimator. In practice, there are several approaches to alleviate noise in the output values such as early stopping to prevent overfitting as well as detecting and removing the noisy training examples prior to training the supervised learning algorithm. There are several algorithms that identify noisy training examples and removing the suspected noisy training examples prior to training has decreased generalization error with statistical significance. === Other factors to consider === Other factors to consider when choosing and applying a learning algorithm include the following: Heterogeneity of the data. If the feature vectors include features of many different kinds (discrete, discrete ordered, counts, continuous values), some algorithms are easier to apply than others. 
Many algorithms, including support-vector machines, linear regression, logistic regression, neural networks, and nearest neighbor methods, require that the input features be numerical and scaled to similar ranges (e.g., to the [-1,1] interval). Methods that employ a distance function, such as nearest neighbor methods and support-vector machines with Gaussian kernels, are particularly sensitive to this. An advantage of decision trees is that they easily handle heterogeneous data. Redundancy in the data. If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., linear regression, logistic regression, and distance-based methods) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of regularization. Presence of interactions and non-linearities. If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., linear regression, logistic regression, support-vector machines, naive Bayes) and distance functions (e.g., nearest neighbor methods, support-vector machines with Gaussian kernels) generally perform well. However, if there are complex interactions among features, then algorithms such as decision trees and neural networks work better, becaus
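The workflow described in this section (gather labeled data, choose a feature representation, tune a control parameter on a validation set, then measure performance on a separate test set) can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the synthetic two-feature task, the k-nearest-neighbour learner, the candidate values of k, and the 400/100/100 split. The second feature is deliberately on a much larger scale than the first, so the sketch also shows the rescaling to [-1, 1] that distance-based methods need.

```python
import random


def make_data(n, rng):
    """Synthetic task (hypothetical): label is 1 when x2/1000 exceeds x1."""
    data = []
    for _ in range(n):
        x1 = rng.uniform(0, 1)
        x2 = rng.uniform(0, 1000)  # deliberately on a much larger scale than x1
        data.append(((x1, x2), 1 if x2 / 1000 > x1 else 0))
    return data


def fit_scaler(rows):
    """Per-feature (min, max), learned from the training inputs only."""
    columns = zip(*(x for x, _ in rows))
    return [(min(c), max(c)) for c in columns]


def scale(x, scaler):
    """Map each feature to roughly [-1, 1] using the training min and max."""
    return tuple(2 * (v - lo) / (hi - lo) - 1 for v, (lo, hi) in zip(x, scaler))


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0


def accuracy(train, rows, k):
    """Fraction of rows whose label the k-NN rule predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)


rng = random.Random(0)
data = make_data(600, rng)
train, val, test = data[:400], data[400:500], data[500:]

scaler = fit_scaler(train)
train_s = [(scale(x, scaler), y) for x, y in train]
val_s = [(scale(x, scaler), y) for x, y in val]
test_s = [(scale(x, scaler), y) for x, y in test]

# Tune the control parameter k on the validation set, never on the test set.
best_k = max((1, 3, 5, 9, 15), key=lambda k: accuracy(train_s, val_s, k))
test_acc = accuracy(train_s, test_s, best_k)
print(best_k, test_acc)
```

The scaler is fitted on the training inputs only and then reused for the validation and test inputs, so that no information from the held-out data leaks into training.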

  • Armin B. Cremers

    Armin Bernd Cremers (born June 7, 1946) is a German mathematician and computer scientist. He is a professor in the computer science institute at the University of Bonn, Germany. He is most notable for his contributions to several fields of discrete mathematics including formal languages and automata theory. In more recent years he has been recognized for his work in artificial intelligence, machine learning and robotics as well as in geoinformatics and deductive databases. == Life and work == Armin B. Cremers studied mathematics and physics at the University of Karlsruhe, Germany. After his graduate diploma (1971) and PhD (1972), both in mathematics, both summa cum laude, he received his academic lectureship qualification for computer science (1974), all from the University of Karlsruhe. Following an invitation by Seymour Ginsburg, he joined the University of Southern California (USC), Los Angeles, in 1973 where he worked until 1976 as an assistant professor of electrical engineering and computer science. With Ginsburg he initiated Grammar Forms, a new formalism for grammatical families. In 1976 A. B. Cremers returned to Germany and was appointed full professor of computer science at the University of Dortmund, where he remained until 1990, holding the chair for information systems. During the same time he continued working as a visiting research professor at USC, where together with Thomas N. Hibbard he developed the concept of Data Spaces, a comprehensive computational model, in theory and applications. At the University of Dortmund A. B. Cremers served as chairman of the computer science department and, since early 1985, as vice president for Research and Junior Scientific Staff. In this position he was liaison for the development of the Technology Center Dortmund.
He was the initiator and founding director of the Center for Expert Systems Dortmund (ZEDO) and the NRW State Research Collaborative in Artificial Intelligence (KI-NRW). From 1988 to 1996 he was also a member of the supervisory board of the German National Research Center for Mathematics and Data Processing (GMD). Since 1990 A. B. Cremers has been professor and director of computer science and head of the research group in artificial intelligence at the University of Bonn. From Bonn he has contributed fundamentally to artificial intelligence and robotics (with Wolfram Burgard, Dieter Fox, Sebastian Thrun among his students), and to the development of software engineering, particularly in civil engineering, and information systems, particularly in the geosciences. The paper "The Interactive Museum Tour-Guide Robot" won the AAAI Classic Paper award of 2016. Together with Matthias Jarke, A. B. Cremers established the Bonn-Aachen International Center for Information Technology (B-IT) in 2001 and led it as Founding Scientific Director from the University of Bonn side until his retirement from teaching in 2014. From 2004 to 2008 Cremers was Dean of the School of Mathematics and Natural Sciences, and from April 2009 to July 2014 University Vice President for Planning and Finance. He is a member of advisory boards and Chairman of the University Council of the University of Koblenz-Landau.

    Read more →
  • The Best Free AI Essay Writer for Beginners

    The Best Free AI Essay Writer for Beginners

    In search of the best AI essay writer? An AI essay writer is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI essay writer slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Energy-based model

    Energy-based model

    An energy-based model (EBM), also called Canonical Ensemble Learning (CEL) or Learning via Canonical Ensemble (LCE), is an application of the canonical ensemble formulation from statistical physics to learning from data. The approach appears prominently in generative artificial intelligence. EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models. An EBM learns the characteristics of a target dataset and generates a similar but larger dataset. EBMs detect the latent variables of a dataset and generate new datasets with a similar distribution. Energy-based generative neural networks are a class of generative models that aim to learn explicit probability distributions of data in the form of energy-based models whose energy functions are parameterized by modern deep neural networks. Boltzmann machines are a special form of energy-based model with a specific parametrization of the energy.

    == Description ==

    For a given input $x$, the model describes an energy $E_\theta(x)$ such that the Boltzmann distribution

    $$P_\theta(x) = \frac{e^{-\beta E_\theta(x)}}{Z(\theta)}$$

    is a probability (density); typically $\beta = 1$. The normalization constant

    $$Z(\theta) := \int_{x \in X} e^{-\beta E_\theta(x)}\,dx$$

    (also known as the partition function) depends on the Boltzmann factors of all possible inputs $x$, so it cannot be easily computed or reliably estimated during training using standard maximum likelihood estimation alone.
    However, for maximizing the likelihood during training, the gradient of the log-likelihood of a single training example $x$ is given by using the chain rule:

    $$\partial_\theta \log P_\theta(x) = \mathbb{E}_{x' \sim P_\theta}\left[\partial_\theta E_\theta(x')\right] - \partial_\theta E_\theta(x) \qquad (*)$$

    The expectation in the above formula for the gradient can be approximately estimated by drawing samples $x'$ from the distribution $P_\theta$ using Markov chain Monte Carlo (MCMC). Early energy-based models, such as the 2003 Boltzmann machine by Hinton, estimated this expectation via blocked Gibbs sampling. Newer approaches make use of more efficient Stochastic Gradient Langevin Dynamics (LD), drawing samples using

    $$x'_0 \sim P_0, \qquad x'_{i+1} = x'_i - \frac{\alpha}{2} \frac{\partial E_\theta(x'_i)}{\partial x'_i} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \alpha).$$

    A replay buffer of past values $x'_i$ is used with LD to initialize the optimization module. The parameters $\theta$ of the neural network are therefore trained in a generative manner via MCMC-based maximum likelihood estimation: the learning process follows an "analysis by synthesis" scheme, where within each learning iteration, the algorithm samples the synthesized examples from the current model by a gradient-based MCMC method (e.g., Langevin dynamics or Hybrid Monte Carlo), and then updates the parameters $\theta$ based on the difference between the training examples and the synthesized ones (see equation $(*)$).
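As a rough illustration of the Langevin update and the gradient estimate in equation (∗), the sketch below uses a toy one-dimensional energy E_θ(x) = θx²/2. That energy, the uniform choice of P_0, the step size α = 0.1, and all function names are assumptions made for this example and do not come from the article; a replay buffer is omitted for brevity.

```python
import math
import random


def grad_x_energy(x, theta):
    """dE/dx for the toy energy E_theta(x) = theta * x**2 / 2 (an assumption)."""
    return theta * x


def grad_theta_energy(x):
    """dE/dtheta for the same toy energy."""
    return 0.5 * x * x


def langevin_sample(x0, theta, alpha, n_steps, rng):
    """Iterate x <- x - (alpha / 2) * dE/dx + eps, with eps ~ N(0, alpha)."""
    x = x0
    for _ in range(n_steps):
        eps = rng.gauss(0.0, math.sqrt(alpha))  # alpha is the noise variance
        x = x - 0.5 * alpha * grad_x_energy(x, theta) + eps
    return x


def loglik_grad(x_data, model_samples):
    """Monte Carlo estimate of equation (*) for one training example x_data."""
    expected = sum(grad_theta_energy(s) for s in model_samples) / len(model_samples)
    return expected - grad_theta_energy(x_data)


rng = random.Random(0)
theta, alpha = 1.0, 0.1
# P_0 is taken to be uniform on [-3, 3]; this is an arbitrary choice for the sketch.
samples = [langevin_sample(rng.uniform(-3.0, 3.0), theta, alpha, 200, rng)
           for _ in range(2000)]
sample_var = sum(s * s for s in samples) / len(samples)
grad_at_zero = loglik_grad(0.0, samples)
print(f"sample second moment: {sample_var:.3f}")
print(f"estimated gradient at x = 0: {grad_at_zero:.3f}")
```

With θ = 1 the toy target is a standard normal, so the second moment should come out close to 1 and the estimated gradient at x = 0 close to 1/(2θ) = 0.5, up to sampling noise and the small bias introduced by the finite step size.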
    This process can be interpreted as an alternating mode-seeking and mode-shifting process, and it also has an adversarial interpretation. Essentially, the model learns a function $E_\theta$ that assigns low energies to correct values and higher energies to incorrect values. After training, given a converged energy model $E_\theta$, the Metropolis–Hastings algorithm can be used to draw new samples. The acceptance probability is given by

    $$P_{acc}(x_i \to x^*) = \min\left(1, \frac{P_\theta(x^*)}{P_\theta(x_i)}\right).$$

    == History ==

    The term "energy-based models" was first coined in a 2003 JMLR paper in which the authors defined a generalisation of independent components analysis to the overcomplete setting using EBMs. Other early work on EBMs proposed models that represented energy as a composition of latent and observable variables.

    == Characteristics ==

    EBMs demonstrate useful properties:

      • Simplicity and stability. The EBM is the only object that needs to be designed and trained. Separate networks need not be trained to ensure balance.
      • Adaptive computation time. An EBM can generate sharp, diverse samples or (more quickly) coarse, less diverse samples. Given infinite time, this procedure produces true samples.
      • Flexibility. In variational autoencoders (VAEs) and flow-based models, the generator learns a map from a continuous space to a (possibly) discontinuous space containing different data modes. EBMs can learn to assign low energies to disjoint regions (multiple modes).
      • Adaptive generation. EBM generators are implicitly defined by the probability distribution and automatically adapt as the distribution changes (without training). This allows EBMs to address domains where generator training is impractical, to minimize mode collapse, and to avoid spurious modes from out-of-distribution samples.
      • Compositionality. Individual models are unnormalized probability distributions, allowing models to be combined through product of experts or other hierarchical techniques.

    == Experimental results ==

    On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM generated high-quality images relatively quickly. It supported combining features learned from one type of image to generate other types of images. It was able to generalize to out-of-distribution datasets, outperforming flow-based and autoregressive models. The EBM was relatively resistant to adversarial perturbations, behaving better than models explicitly trained against them with training for classification.

    == Applications ==

    Target applications include natural language processing, robotics and computer vision. The first energy-based generative neural network is the generative ConvNet proposed in 2016 for image patterns, where the neural network is a convolutional neural network. The model has been generalized to various domains to learn distributions of videos and 3D voxels, and its variants have made it more effective. These models have proven useful for data generation (e.g., image synthesis, video synthesis, 3D shape synthesis), data recovery (e.g., recovering videos with missing pixels or image frames, 3D super-resolution), and data reconstruction (e.g., image reconstruction and linear interpolation).

    == Alternatives ==

    EBMs compete with techniques such as variational autoencoders (VAEs), generative adversarial networks (GANs) and normalizing flows.

    == Extensions ==

    === Joint energy-based models ===

    Joint energy-based models (JEM), proposed in 2020 by Grathwohl et al., allow any classifier with softmax output to be interpreted as an energy-based model.
    The key observation is that such a classifier is trained to predict the conditional probability

    $$p_\theta(y \mid x) = \frac{e^{\vec{f}_\theta(x)[y]}}{\sum_{j=1}^{K} e^{\vec{f}_\theta(x)[j]}} \quad \text{for } y = 1, \dotsc, K \text{ and } \vec{f}_\theta = (f_1, \dotsc, f_K) \in \mathbb{R}^K,$$

    where $\vec{f}_\theta(x)[y]$ is the $y$-th index of the logits $\vec{f}$ corresponding to class $y$. Without any change to the logits, it was proposed to reinterpret the logits to describe a joint probability density:

    $$p_\theta(y, x) = \frac{e^{\vec{f}_\theta(x)[y]}}{Z(\theta)},$$

    with unknown partition function $Z(\theta)$ and energy $E_\theta(x, y) = -\vec{f}_\theta(x)[y]$. By marginalization, we obtain the unnormalized density

    $$p_\theta(x) = \sum_y p_\theta(y, x) = \sum_y \frac{e^{\vec{f}_\theta(x)[y]}}{Z(\theta)} =: e^{-E_\theta(x)},$$

    therefore

    $$E_\theta(x) = -\log\left(\sum_y \frac{e^{\vec{f}_\theta(x)[y]}}{Z(\theta)}\right),$$

    so that any classifier can be used to define an energy function $E_\theta(x)$.
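The reinterpretation of logits as energies can be sketched in a few lines of Python. The logit values and function names below are hypothetical, and because Z(θ) is unknown the marginal energy is computed only up to the additive constant log Z(θ); this is a minimal sketch, not the implementation used by Grathwohl et al.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def softmax(logits):
    """Class probabilities p(y | x) computed from the logits f(x)."""
    lse = logsumexp(logits)
    return [math.exp(v - lse) for v in logits]


def joint_energy(logits, y):
    """E(x, y) = -f(x)[y]."""
    return -logits[y]


def marginal_energy(logits):
    """E(x) up to the constant log Z(theta): -log sum_y exp(f(x)[y])."""
    return -logsumexp(logits)


logits = [2.0, 0.5, -1.0]  # hypothetical classifier output f(x) for K = 3 classes
probs = softmax(logits)
e_x = marginal_energy(logits)
# p(y | x) = exp(-E(x, y)) / exp(-E(x)): the classifier is recovered from the energies.
recovered = [math.exp(-joint_energy(logits, y) + e_x) for y in range(len(logits))]
print(probs)
print(recovered)
```

The two printed lists agree, which reflects that the softmax classifier and the energy view are two readings of the same logits; the constant log Z(θ) cancels in the ratio.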

    Read more →
  • AI Image Generators: Free vs Paid (2026)

    AI Image Generators: Free vs Paid (2026)

    Looking for the best AI image generator? An AI image generator is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI image generator slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • The Best Free AI Clip Maker for Beginners

    The Best Free AI Clip Maker for Beginners

    Looking for the best AI clip maker? An AI clip maker is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI clip maker slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • AI Code-review Tools: Free vs Paid (2026)

    AI Code-review Tools: Free vs Paid (2026)

    Comparing the best AI code-review tools? An AI code-review tool is software that uses machine learning to help you get more done: it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI code-review tool slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • ViEWER

    ViEWER

    ViEWER, the Virtual Environment Workbench for Education and Research, is a proprietary, freeware computer program for Microsoft Windows written by researchers at the University of Idaho for the study of visual perception and complex immersive three-dimensional environments. It was created using C++ and OpenGL, and has been used by Dr. Brian Dyre, Dr. Steffen Werner, Dr. Ernesto Bustamante, Dr. Ben Barton, and their undergraduate and graduate researchers in visual perception, signal detection, and child-safety experiments.

    Read more →
  • Adobe Enhanced Speech

    Adobe Enhanced Speech

    Adobe Enhanced Speech is an online artificial intelligence tool by Adobe that aims to improve the quality of recorded speech that is badly muffled, reverberant, tinny, or full of artifacts, converting it to studio-grade quality regardless of the clarity of the input. Users can upload MP3 or WAV files up to an hour long and a gigabyte in size and have them converted relatively quickly. They can then listen to the converted version, toggle between it and the original during playback, and download it. Currently in beta and free to the public, it has been used to restore old movies and to create professional-quality podcasts and narrations by people without adequate microphones. The model has some limitations: it is not compatible with singing, and excessively muffled source audio can occasionally result in a light lisp in the improved version. Otherwise it is noted as effective and efficient for its purpose. It uses machine learning algorithms to distinguish between speech and background sounds, and enhances the speech by filtering out noise and artifacts, adjusting pitch and volume levels, and normalizing the audio. The network was trained on a large dataset of speech samples from a diverse range of sources and then fine-tuned to optimize the output.

    Read more →
  • Max Welling

    Max Welling

    Max Welling (born 1968) is a Dutch computer scientist working in machine learning at the University of Amsterdam. In August 2017, the university spin-off Scyfer BV, co-founded by Welling, was acquired by Qualcomm. He has since served as a Vice President of Technology at Qualcomm Netherlands. He is also a Distinguished Scientist at Microsoft Research AI4Science, based in Amsterdam. Welling received his PhD in physics from Utrecht University in 1998, with a thesis on quantum gravity under the supervision of Nobel laureate Gerard 't Hooft. He has published over 250 peer-reviewed articles in machine learning, computer vision, statistics and physics, and is best known for inventing variational autoencoders (VAEs) together with Diederik P. Kingma. In 2025 Welling was elected a member of the Royal Netherlands Academy of Arts and Sciences.

    Read more →