Variational autoencoder

Variational autoencoder

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling in 2013. It is part of the families of probabilistic graphical models and variational Bayesian methods. In addition to being seen as an autoencoder neural network architecture, variational autoencoders can also be studied within the mathematical formulation of variational Bayesian methods, connecting a neural encoder network to its decoder through a probabilistic latent space (for example, as a multivariate Gaussian distribution) that corresponds to the parameters of a variational distribution. Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data. Both networks are typically trained together with the usage of the reparameterization trick, although the variance of the noise model can be learned separately. Although this type of model was initially designed for unsupervised learning, its effectiveness has been proven for semi-supervised learning and supervised learning. == Overview of architecture and operation == A variational autoencoder is a generative model with a prior and noise distribution respectively. Usually such models are trained using the expectation-maximization meta-algorithm (e.g. probabilistic PCA, (spike & slab) sparse coding). Such a scheme optimizes a lower bound of the data likelihood, which is usually computationally intractable, and in doing so requires the discovery of q-distributions, or variational posteriors. These q-distributions are normally parameterized for each individual data point in a separate optimization process. However, variational autoencoders use a neural network as an amortized approach to jointly optimize across data points. In that way, the same parameters are reused for multiple data points, which can result in massive memory savings. The first neural network takes as input the data points themselves, and outputs parameters for the variational distribution. As it maps from a known input space to the low-dimensional latent space, it is called the encoder. The decoder is the second neural network of this model. It is a function that maps from the latent space to the input space, e.g. as the means of the noise distribution. It is possible to use another neural network that maps to the variance, however this can be omitted for simplicity. In such a case, the variance can be optimized with gradient descent. To optimize this model, one needs to know two terms: the "reconstruction error", and the Kullback–Leibler divergence (KL-D). Both terms are derived from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data, here referred to as p-distribution. For example, a standard VAE task such as IMAGENET is typically assumed to have a gaussianly distributed noise; however, tasks such as binarized MNIST require a Bernoulli noise. The KL-D from the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and requires a sampling approximation to compute its expectation value. More recent approaches replace Kullback–Leibler divergence (KL-D) with various statistical distances, see "Statistical distance VAE variants" below. == Formulation == From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data x {\displaystyle x} by their chosen parameterized probability distribution p θ ( x ) = p ( x | θ ) {\displaystyle p_{\theta }(x)=p(x|\theta )} . This distribution is usually chosen to be a Gaussian N ( x | μ , σ ) {\displaystyle N(x|\mu ,\sigma )} which is parameterized by μ {\displaystyle \mu } and σ {\displaystyle \sigma } respectively, and as a member of the exponential family it is easy to work with as a noise distribution. Simple distributions are easy enough to maximize, however distributions where a prior is assumed over the latents z {\displaystyle z} results in intractable integrals. Let us find p θ ( x ) {\displaystyle p_{\theta }(x)} via marginalizing over z {\displaystyle z} . p θ ( x ) = ∫ z p θ ( x , z ) d z , {\displaystyle p_{\theta }(x)=\int _{z}p_{\theta }({x,z})\,dz,} where p θ ( x , z ) {\displaystyle p_{\theta }({x,z})} represents the joint distribution under p θ {\displaystyle p_{\theta }} of the observable data x {\displaystyle x} and its latent representation or encoding z {\displaystyle z} . According to the chain rule, the equation can be rewritten as p θ ( x ) = ∫ z p θ ( x | z ) p θ ( z ) d z {\displaystyle p_{\theta }(x)=\int _{z}p_{\theta }({x|z})p_{\theta }(z)\,dz} In the vanilla variational autoencoder, z {\displaystyle z} is usually taken to be a finite-dimensional vector of real numbers, and p θ ( x | z ) {\displaystyle p_{\theta }({x|z})} to be a Gaussian distribution. Then p θ ( x ) {\displaystyle p_{\theta }(x)} is a mixture of Gaussian distributions. It is now possible to define the set of the relationships between the input data and its latent representation as Prior p θ ( z ) {\displaystyle p_{\theta }(z)} Likelihood p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} Posterior p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} Unfortunately, the computation of p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} is expensive and in most cases intractable. To speed up the calculus to make it feasible, it is necessary to introduce a further function to approximate the posterior distribution as q ϕ ( z | x ) ≈ p θ ( z | x ) {\displaystyle q_{\phi }({z|x})\approx p_{\theta }({z|x})} with ϕ {\displaystyle \phi } defined as the set of real values that parametrize q {\displaystyle q} . This is sometimes called amortized inference, since by "investing" in finding a good q ϕ {\displaystyle q_{\phi }} , one can later infer z {\displaystyle z} from x {\displaystyle x} quickly without doing any integrals. In this way, the problem is to find a good probabilistic autoencoder, in which the conditional likelihood distribution p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} is computed by the probabilistic decoder, and the approximated posterior distribution q ϕ ( z | x ) {\displaystyle q_{\phi }(z|x)} is computed by the probabilistic encoder. Parametrize the encoder as E ϕ {\displaystyle E_{\phi }} , and the decoder as D θ {\displaystyle D_{\theta }} . == Evidence lower bound (ELBO) == Like many deep learning approaches that use gradient-based optimization, VAEs require a differentiable loss function to update the network weights through backpropagation. For variational autoencoders, the idea is to jointly optimize the generative model parameters θ {\displaystyle \theta } to reduce the reconstruction error between the input and the output, and ϕ {\displaystyle \phi } to make q ϕ ( z | x ) {\displaystyle q_{\phi }({z|x})} as close as possible to p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} . As reconstruction loss, mean squared error and cross entropy are often used. The Kullback–Leibler divergence D K L ( q ϕ ( z | x ) ∥ p θ ( z | x ) ) {\displaystyle D_{KL}(q_{\phi }({z|x})\parallel p_{\theta }({z|x}))} can be used as a loss function to squeeze q ϕ ( z | x ) {\displaystyle q_{\phi }({z|x})} under p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} . This divergence loss expands to D K L ( q ϕ ( z | x ) ∥ p θ ( z | x ) ) = E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( z | x ) ] = E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( x ) p θ ( x , z ) ] = ln ⁡ p θ ( x ) + E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( x , z ) ] . {\displaystyle {\begin{aligned}D_{KL}(q_{\phi }({z|x})\parallel p_{\theta }({z|x}))&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }(z|x)}{p_{\theta }(z|x)}}\right]\\&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }({z|x})p_{\theta }(x)}{p_{\theta }(x,z)}}\right]\\&=\ln p_{\theta }(x)+\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }({z|x})}{p_{\theta }(x,z)}}\right].\end{aligned}}} Now, define the evidence lower bound (ELBO): L θ , ϕ ( x ) := E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ p θ ( x , z ) q ϕ ( z | x ) ] = ln ⁡ p θ ( x ) − D K L ( q ϕ ( ⋅ | x ) ∥ p θ ( ⋅ | x ) ) {\displaystyle L_{\theta ,\phi }(x):=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }({z|x})}}\right]=\ln p_{\theta }(x)-D_{KL}(q_{\phi }({\cdot |x})\parallel p_{\theta }({\cdot |x}))} Maximizing the ELBO θ ∗ , ϕ ∗ = argmax θ , ϕ L θ , ϕ ( x ) {\dis

Argumentation framework

In artificial intelligence and related fields, an argumentation framework is a way to deal with contentious information and draw conclusions from it using formalized arguments. In an abstract argumentation framework, entry-level information is a set of abstract arguments that, for instance, represent data or a proposition. Conflicts between arguments are represented by a binary relation on the set of arguments. In concrete terms, an argumentation framework is represented with a directed graph such that the nodes are the arguments, and the arrows represent the attack relation. There exist some extensions of the Dung's framework, like the logic-based argumentation frameworks or the value-based argumentation frameworks. == Abstract argumentation frameworks == === Formal framework === Abstract argumentation frameworks, also called argumentation frameworks à la Dung, are defined formally as a pair: A set of abstract elements called arguments, denoted A {\displaystyle A} A binary relation on A {\displaystyle A} , called attack relation, denoted R {\displaystyle R} For instance, the argumentation system S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } with A = { a , b , c , d } {\displaystyle A=\{a,b,c,d\}} and R = { ( a , b ) , ( b , c ) , ( d , c ) } {\displaystyle R=\{(a,b),(b,c),(d,c)\}} contains four arguments ( a , b , c {\displaystyle a,b,c} and d {\displaystyle d} ) and three attacks ( a {\displaystyle a} attacks b {\displaystyle b} , b {\displaystyle b} attacks c {\displaystyle c} and d {\displaystyle d} attacks c {\displaystyle c} ). Dung defines some notions : an argument a ∈ A {\displaystyle a\in A} is acceptable with respect to E ⊆ A {\displaystyle E\subseteq A} if and only if E {\displaystyle E} defends a {\displaystyle a} , that is ∀ b ∈ A {\displaystyle \forall b\in A} such that ( b , a ) ∈ R , ∃ c ∈ E {\displaystyle (b,a)\in R,\exists c\in E} such that ( c , b ) ∈ R {\displaystyle (c,b)\in R} , a set of arguments E {\displaystyle E} is conflict-free if there is no attack between its arguments, formally : ∀ a , b ∈ E , ( a , b ) ∉ R {\displaystyle \forall a,b\in E,(a,b)\not \in R} , a set of arguments E {\displaystyle E} is admissible if and only if it is conflict-free and all its arguments are acceptable with respect to E {\displaystyle E} . === Different semantics of acceptance === ==== Extensions ==== To decide if an argument can be accepted or not, or if several arguments can be accepted together, Dung defines several semantics of acceptance that allows, given an argumentation system, sets of arguments (called extensions) to be computed. For instance, given S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } , E {\displaystyle E} is a complete extension of S {\displaystyle S} only if it is an admissible set and every acceptable argument with respect to E {\displaystyle E} belongs to E {\displaystyle E} , E {\displaystyle E} is a preferred extension of S {\displaystyle S} only if it is a maximal element (with respect to the set-theoretical inclusion) among the admissible sets with respect to S {\displaystyle S} , E {\displaystyle E} is a stable extension of S {\displaystyle S} only if it is a conflict-free set that attacks every argument that does not belong in E {\displaystyle E} (formally, ∀ a ∈ A ∖ E , ∃ b ∈ E {\displaystyle \forall a\in A\backslash E,\exists b\in E} such that ( b , a ) ∈ R {\displaystyle (b,a)\in R} , E {\displaystyle E} is the (unique) grounded extension of S {\displaystyle S} only if it is the smallest element (with respect to set inclusion) among the complete extensions of S {\displaystyle S} . There exists some inclusions between the sets of extensions built with these semantics : Every stable extension is preferred, Every preferred extension is complete, The grounded extension is complete, If the system is well-founded (there exists no infinite sequence a 0 , a 1 , … , a n , … {\displaystyle a_{0},a_{1},\dots ,a_{n},\dots } such that ∀ i > 0 , ( a i + 1 , a i ) ∈ R {\displaystyle \forall i>0,(a_{i+1},a_{i})\in R} ), all these semantics coincide—only one extension is grounded, stable, preferred, and complete. Some other semantics have been defined. One introduce the notation E x t σ ( S ) {\displaystyle Ext_{\sigma }(S)} to note the set of σ {\displaystyle \sigma } -extensions of the system S {\displaystyle S} . In the case of the system S {\displaystyle S} in the figure above, E x t σ ( S ) = { { a , d } } {\displaystyle Ext_{\sigma }(S)=\{\{a,d\}\}} for every Dung's semantic—the system is well-founded. That explains why the semantics coincide, and the accepted arguments are: a {\displaystyle a} and d {\displaystyle d} . ==== Labellings ==== Labellings are a more expressive way than extensions to express the acceptance of the arguments. Concretely, a labelling is a mapping that associates every argument with a label in (the argument is accepted), out (the argument is rejected), or undec (the argument is undefined—not accepted or refused). One can also note a labelling as a set of pairs ( a r g u m e n t , l a b e l ) {\displaystyle ({\mathit {argument}},{\mathit {label}})} . Such a mapping does not make sense without additional constraint. The notion of reinstatement labelling guarantees the sense of the mapping. L {\displaystyle L} is a reinstatement labelling on the system S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } if and only if : ∀ a ∈ A , L ( a ) = i n {\displaystyle \forall a\in A,L(a)={\mathit {in}}} if and only if ∀ b ∈ A {\displaystyle \forall b\in A} such that ( b , a ) ∈ R , L ( b ) = o u t {\displaystyle (b,a)\in R,L(b)={\mathit {out}}} ∀ a ∈ A , L ( a ) = o u t {\displaystyle \forall a\in A,L(a)={\mathit {out}}} if and only if ∃ b ∈ A {\displaystyle \exists b\in A} such that ( b , a ) ∈ R {\displaystyle (b,a)\in R} and L ( b ) = i n {\displaystyle L(b)={\mathit {in}}} ∀ a ∈ A , L ( a ) = u n d e c {\displaystyle \forall a\in A,L(a)={\mathit {undec}}} if and only if L ( a ) ≠ i n {\displaystyle L(a)\neq {\mathit {in}}} and L ( a ) ≠ o u t {\displaystyle L(a)\neq {\mathit {out}}} One can convert every extension into a reinstatement labelling: the arguments of the extension are in, those attacked by an argument of the extension are out, and the others are undec. Conversely, one can build an extension from a reinstatement labelling just by keeping the arguments in. Indeed, Caminada proved that the reinstatement labellings and the complete extensions can be mapped in a bijective way. Moreover, the other Datung's semantics can be associated to some particular sets of reinstatement labellings. Reinstatement labellings distinguish arguments not accepted because they are attacked by accepted arguments from undefined arguments—that is, those that are not defended cannot defend themselves. An argument is undec if it is attacked by at least another undec. If it is attacked only by arguments out, it must be in, and if it is attacked some argument in, then it is out. The unique reinstatement labelling that corresponds to the system S {\displaystyle S} above is L = { ( a , i n ) , ( b , o u t ) , ( c , o u t ) , ( d , i n ) } {\displaystyle L=\{(a,{\mathit {in}}),(b,{\mathit {out}}),(c,{\mathit {out}}),(d,{\mathit {in}})\}} . === Inference from an argumentation system === In the general case when several extensions are computed for a given semantic σ {\displaystyle \sigma } , the agent that reasons from the system can use several mechanisms to infer information: Credulous inference: the agent accepts an argument if it belongs to at least one of the σ {\displaystyle \sigma } -extensions—in which case, the agent risks accepting some arguments that are not acceptable together ( a {\displaystyle a} attacks b {\displaystyle b} , and a {\displaystyle a} and b {\displaystyle b} each belongs to an extension) Skeptical inference: the agent accepts an argument only if it belongs to every σ {\displaystyle \sigma } -extension. In this case, the agent risks deducing too little information (if the intersection of the extensions is empty or has a very small cardinal). For these two methods to infer information, one can identify the set of accepted arguments, respectively C r σ ( S ) {\displaystyle Cr_{\sigma }(S)} the set of the arguments credulously accepted under the semantic σ {\displaystyle \sigma } , and S c σ ( S ) {\displaystyle Sc_{\sigma }(S)} the set of arguments accepted skeptically under the semantic σ {\displaystyle \sigma } (the σ {\displaystyle \sigma } can be missed if there is no possible ambiguity about the semantic). Of course, when there is only one extension (for instance, when the system is well-founded), this problem is very simple: the agent accepts arguments of the unique extension and rejects others. The same reasoning can be done with labellings that correspond to the chosen semantic : an argument can be accepted if it is in for each labelling and refused if it is out for each labelling, the others being in an undecided state (the status of the arguments can remind the

AI Essay Writers: Free vs Paid (2026)

Looking for the best AI essay writer? An AI essay writer is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI essay writer slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

Ginger Software

Ginger Software is an American and Israeli start-up specialized in natural language processing and AI. The main products are tools aiming to improve written communications, develop English speaking skills and boost productivity. The company was founded in 2008 by Yael Karov and Avner Zangvil. Ginger Software uses the context of complete sentences to suggest corrections. In December 2011, Ginger Software was one of nine projects approved by the Board of Governors of the Israel-U.S. Binational Industrial Research and Development Foundation for a funding of $8.1 million. The company also raised $3 million from private Israeli and US investors in 2009. In May, 2014 Intel acquired one of Ginger's business units and the rights to use the company's patented technology. == Founders == Before founding Ginger Software, Yael Karov had worked with Rosetta Genomics as its Chief Technology Officer and Vice President of Research and Development from 2003 to 2006, and with ClickSoftware Technologies as a Director of Research and Development from 1990 to 1994. Karov also founded Agentics, a company specializing in free-text classification of e-commerce product information based on natural language processing, in 1996. Avner Zangvil is the co-founder of Ginger Software. Zangvil co-founded Menta Software in 1996 with his brother Arnon Zangvil to develop a product that transforms any Windows-based application into a Web-enabled application usable from any remote computer running a Web browser. Menta was acquired by GraphOn Corporation in 2001. == Technology == Ginger Software uses patented software algorithms in the field of natural language processing. The company claims that the algorithm allows it to correct the written sentences with relatively high accuracy (eliminating up to 95 percent of writing errors), compared to standard spell checkers. Its unique algorithm allows the software to understand the context of the sentence rather than correcting based solely on a word. According to its founder, Karov, the software operates on the logic of sentence context in addition to the memory of a database of words. The company is at the heart of a growing revolution in the world of assistive technology. Ginger claims that the benefits of the software have been leveraged by native English and non-native speakers alike, and have also found value in niche markets like dyslexia management. They further claim that ESL users derive great benefit from the use of the software, as it lets them write error-free English text. Its use also extends to native English speaking business professionals and students who use it as a 'safety net' for their email edits, as well as international students writing in English. More recently, the company has focused on implementing its technology in mobile devices as an integral component of its mobile keyboard products. == Products == Ginger Software products include Ginger Page, a cross-platform writing enhancement app, and Ginger Keyboard which is available for Android devices. Ginger Writer can be used as an online service or installed on your PC or Mac. It supports MS-Word, MS-Outlook, MS-PowerPoint, Microsoft Edge, Chrome, and functions as a writing enhancement app for Android and iOS mobile devices. Its main feature is English grammar and spelling checker that runs seamlessly with the different user interfaces. It also has an advanced paraphrasing tool, contextual synonyms and definitions, translation and a text-to-speech function that enables users to hear sentences before and after correction. Ginger Keyboard for Android replaces the stock keyboard and functions as a productivity boosting keyboard app. Featuring a full set of advanced keyboard features like Stream (swipe-like) typing, adaptive word prediction, a wide variety of customizable themes and emoji, Ginger Keyboard is the only 3rd party keyboard to offer proofreading and other writing tools via one tap access to Ginger Page. == Target segment == Ginger Software started off targeting people with dyslexia. The algorithm underlying the software studies a vast pool of proper sentences in English and builds a model of proper language. The software does not analyze the text at the level of the word, but of the whole sentence. Dyslexics can have trouble choosing the right word – hence the attention to the sentence as a whole. From 2010, Ginger Software included a new target segment in its marketing outreach – users of English as a second language (ESL). Its contextual-based writing correction tool could benefit those who are not proficient in the English language. == Business model == The main business model for consumers is freemium. The free version offers contextual-based grammar and spelling checker with some limitations. Its premium features include unlimited access to Grammar Checker, the grammar and spelling checker, and Sentence Rephraser the rephrasing tool. Ginger Keyboard is free to download and use, although it does offer in-app purchases like themes and theme packs. It also disables your original spell checker. Ginger also provides a powerful Rest API which can correct full documents in one call.

Kalman filter

In statistics and control theory, Kalman filtering (also known as linear quadratic estimation) is an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, to produce estimates of unknown variables that tend to be more accurate than those based on a single measurement, by estimating a joint probability distribution over the variables for each time-step. The filter is constructed as a mean squared error minimiser, but an alternative derivation of the filter is also provided showing how the filter relates to maximum likelihood statistics. The filter is named after Rudolf E. Kálmán. Kalman filtering has numerous technological applications. A common application is for guidance, navigation, and control of vehicles, particularly aircraft, spacecraft and ships positioned dynamically. Furthermore, Kalman filtering is much applied in time series analysis tasks such as signal processing and econometrics. Kalman filtering is also important for robotic motion planning and control, and can be used for trajectory optimization. Kalman filtering also works for modeling the central nervous system's control of movement. Due to the time delay between issuing motor commands and receiving sensory feedback, the use of Kalman filters provides a realistic model for making estimates of the current state of a motor system and issuing updated commands. The algorithm works via a two-phase process: a prediction phase and an update phase. In the prediction phase, the Kalman filter produces estimates of the current state variables, including their uncertainties. Once the outcome of the next measurement (necessarily corrupted with some error, including random noise) is observed, these estimates are updated using a weighted average, with more weight given to estimates with greater certainty. The algorithm is recursive. It can operate in real time, using only the present input measurements and the state calculated previously and its uncertainty matrix; no additional past information is required. Optimality of Kalman filtering assumes that errors have a normal (Gaussian) distribution. In the words of Rudolf E. Kálmán, "The following assumptions are made about random processes: Physical random phenomena may be thought of as due to primary random sources exciting dynamic systems. The primary sources are assumed to be independent gaussian random processes with zero mean; the dynamic systems will be linear." Regardless of Gaussianity, however, if the process and measurement covariances are known, then the Kalman filter is the best possible linear estimator in the minimum mean-square-error sense, although there may be better nonlinear estimators. It is a common misconception (perpetuated in the literature) that the Kalman filter cannot be rigorously applied unless all noise processes are assumed to be Gaussian. Extensions and generalizations of the method have also been developed, such as the extended Kalman filter and the unscented Kalman filter which work on nonlinear systems. The basis is a hidden Markov model such that the state space of the latent variables is continuous and all latent and observed variables have Gaussian distributions. Kalman filtering has been used successfully in multi-sensor fusion, and distributed sensor networks to develop distributed or consensus Kalman filtering. == History == The filtering method is named for Hungarian émigré Rudolf E. Kálmán, although Thorvald Nicolai Thiele and Peter Swerling developed a similar algorithm earlier. Richard S. Bucy of the Johns Hopkins Applied Physics Laboratory contributed to the theory, causing it to be known sometimes as Kalman–Bucy filtering. Kalman was inspired to derive the Kalman filter by applying state variables to the Wiener filtering problem. Stanley F. Schmidt is generally credited with developing the first implementation of a Kalman filter. He realized that the filter could be divided into two distinct parts, with one part for time periods between sensor outputs and another part for incorporating measurements. It was during a visit by Kálmán to the NASA Ames Research Center that Schmidt saw the applicability of Kálmán's ideas to the nonlinear problem of trajectory estimation for the Apollo program resulting in its incorporation in the Apollo navigation computer. This digital filter is sometimes termed the Stratonovich–Kalman–Bucy filter because it is a special case of a more general, nonlinear filter developed by the Soviet mathematician Ruslan Stratonovich. In fact, some of the special case linear filter's equations appeared in papers by Stratonovich that were published before the summer of 1961, when Kalman met with Stratonovich during a conference in Moscow. This Kalman filtering was first described and developed partially in technical papers by Swerling (1958), Kalman (1960) and Kalman and Bucy (1961). The Apollo computer used 2k of magnetic core RAM and 36k wire rope [...]. The CPU was built from ICs [...]. Clock speed was under 100 kHz [...]. The fact that the MIT engineers were able to pack such good software (one of the very first applications of the Kalman filter) into such a tiny computer is truly remarkable. Kalman filters have been vital in the implementation of the navigation systems of U.S. Navy nuclear ballistic missile submarines, and in the guidance and navigation systems of cruise missiles such as the U.S. Navy's Tomahawk missile and the U.S. Air Force's Air Launched Cruise Missile. They are also used in the guidance and navigation systems of reusable launch vehicles and the attitude control and navigation systems of spacecraft which dock at the International Space Station. == Overview of the calculation == Kalman filtering uses a system's dynamic model (e.g., physical laws of motion), known control inputs to that system, and multiple sequential measurements (such as from sensors) to form an estimate of the system's varying quantities (its state) that is better than the estimate obtained by using only one measurement alone. As such, it is a common sensor fusion and data fusion algorithm. Noisy sensor data, approximations in the equations that describe the system evolution, and external factors that are not accounted for, all limit how well it is possible to determine the system's state. The Kalman filter deals effectively with the uncertainty due to noisy sensor data and, to some extent, with random external factors. The Kalman filter produces an estimate of the state of the system as an average of the system's predicted state and of the new measurement using a weighted average. The purpose of the weights is that values with better (i.e., smaller) estimated uncertainty are "trusted" more. The weights are calculated from the covariance, a measure of the estimated uncertainty of the prediction of the system's state. The result of the weighted average is a new state estimate that lies between the predicted and measured state, and has a better estimated uncertainty than either alone. This process is repeated at every time step, with the new estimate and its covariance informing the prediction used in the following iteration. This means that Kalman filter works recursively and requires only the last "best guess", rather than the entire history, of a system's state to calculate a new state. The measurements' certainty-grading and current-state estimate are important considerations. It is common to discuss the filter's response in terms of the Kalman filter's gain. The Kalman gain is the weight given to the measurements and current-state estimate, and can be "tuned" to achieve a particular performance. With a high gain, the filter places more weight on the most recent measurements, and thus conforms to them more responsively. With a low gain, the filter conforms to the model predictions more closely. At the extremes, a high gain (close to one) will result in a more jumpy estimated trajectory, while a low gain (close to zero) will smooth out noise but decrease the responsiveness. When performing the actual calculations for the filter (as discussed below), the state estimate and covariances are coded into matrices because of the multiple dimensions involved in a single set of calculations. This allows for a representation of linear relationships between different state variables (such as position, velocity, and acceleration) in any of the transition models or covariances. == Example application == As an example application, consider the problem of determining the precise location of a truck. The truck can be equipped with a GPS unit that provides an estimate of the position within a few meters. The GPS estimate is likely to be noisy; readings 'jump around' rapidly, though remaining within a few meters of the real position. In addition, since the truck is expected to follow the laws of physics, its position can also be estimated by integrating its velocity over time, determined by keeping track of wheel revolutions and the

Database virtualization

Database virtualization is the decoupling of the database layer, which lies between the storage and application layers within the application stack. Virtualization of the database layer enables a shift away from the physical, toward the logical or virtual. Virtualization enables compute and storage resources to be pooled and allocated on demand. This enables both the sharing of single server resources for multi-tenancy, as well as the pooling of server resources into a single logical database or cluster. In both cases, database virtualization provides increased flexibility, more granular and efficient allocation of pooled resources, and more scalable computing. == Virtual data partitioning == The act of partitioning data stores as a database grows has been in use for several decades. There are two primary ways that data has been partitioned inside legacy data management systems: Shared-data databases: an architecture that assumes all database cluster nodes share a single partition. Inter-node communications are used to synchronize update activities performed by different nodes on the cluster. Shared-data data management systems are limited to single-digit node clusters. Shared-nothing databases: an architecture in which all data is segregated to internally managed partitions with clear, well-defined data location boundaries. Shared-nothing databases require manual partition management. In virtual partitioning, logical data is abstracted from physical data by autonomously creating and managing large numbers of data partitions (100s to 1000s). Because they are autonomously maintained, the resources required to manage the partitions are minimal. This kind of massive partitioning results in: Partitions that are small, efficiently managed, and load-balanced. Systems that do not require re-partitioning events to define additional partitions, even when the hardware is changed. “Shared-data” and “shared-nothing” architectures allow scalability through multiple data partitions and cross-partition querying and transaction processing without full partition scanning. == Horizontal data partitioning == Partitioning database sources from consumers is a fundamental concept. With greater numbers of database sources, inserting a horizontal data virtualization layer between the sources and consumers helps address this complexity. Rick van der Lans, the author of multiple books on SQL and relational databases, has defined data virtualization as "the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology." == Advantages == Added flexibility and agility for existing computing infrastructure. Enhanced database performance. Pooling and sharing computing resources, either splitting them (multi-tenancy) or combining them (clustering). Simplification of administration and management. Increased fault tolerance.

Moses (machine translation)

Moses is a statistical machine translation engine that can be used to train statistical models of text translation from a source language to a target language, developed by the University of Edinburgh. Moses then allows new source-language text to be decoded using these models to produce automatic translations in the target language. Training requires a parallel corpus of passages in the two languages, typically manually translated sentence pairs. Moses is free and open-source software, released under the GNU Library Public License (LGPL), and available as source code and binary files for Windows and Linux. Its development is supported mainly by the EuroMatrix project, with funding by the European Commission. Among its features are: A beam search algorithm that quickly finds the highest probability translation within a set of choices Phrase-based translation of short text chunks Handles words with multiple factored representations to enable integrating linguistic and other information (e.g., surface form, lemma and morphology, part-of-speech, word class) Decodes ambiguous forms of a source sentence, represented as a confusion network, to support integrating with upstream tools such as speech recognizers Support for large language models (LMs) such as IRSTLM (an exact LM using memory-mapping) and RandLM (an inexact LM based on Bloom filters)