AI For Business Guide

AI For Business Guide — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Weak supervision

    Weak supervision

    Weak supervision (also known as semi-supervised learning) is a paradigm in machine learning whose relevance and notability increased with the advent of large language models, owing to the large amount of data required to train them. It is characterized by combining a small amount of human-labeled data (the only kind used in the more expensive and time-consuming supervised learning paradigm) with a large amount of unlabeled data (the only kind used in the unsupervised learning paradigm). In other words, the desired output values are provided only for a subset of the training data. The remaining data is unlabeled or imprecisely labeled. Intuitively, the learning problem can be seen as an exam, and the labeled data as sample problems that the teacher solves for the class as an aid in solving another set of problems. In the transductive setting, these unsolved problems act as exam questions. In the inductive setting, they become practice problems of the sort that will make up the exam.

    == Problem ==

    The acquisition of labeled data for a learning problem often requires a skilled human agent (e.g. to transcribe an audio segment) or a physical experiment (e.g. determining the 3D structure of a protein or determining whether there is oil at a particular location). The cost of the labeling process may thus render large, fully labeled training sets infeasible, whereas acquiring unlabeled data is relatively inexpensive. In such situations, semi-supervised learning can be of great practical value. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning.
== Technique ==

More formally, semi-supervised learning assumes that a set of $l$ independently and identically distributed examples $x_1, \dots, x_l \in X$ with corresponding labels $y_1, \dots, y_l \in Y$, together with $u$ unlabeled examples $x_{l+1}, \dots, x_{l+u} \in X$, is processed. Semi-supervised learning combines this information to surpass the classification performance that can be obtained either by discarding the unlabeled data and doing supervised learning or by discarding the labels and doing unsupervised learning.

Semi-supervised learning may refer to either transductive learning or inductive learning. The goal of transductive learning is to infer the correct labels for the given unlabeled data $x_{l+1}, \dots, x_{l+u}$ only. The goal of inductive learning is to infer the correct mapping from $X$ to $Y$. It is unnecessary (and, according to Vapnik's principle, imprudent) to perform transductive learning by way of inferring a classification rule over the entire input space; however, in practice, algorithms formally designed for transduction or induction are often used interchangeably.

== Assumptions ==

In order to make any use of unlabeled data, some relationship to the underlying distribution of data must exist. Semi-supervised learning algorithms make use of at least one of the following assumptions:

=== Continuity / smoothness assumption ===

Points that are close to each other are more likely to share a label. This is also generally assumed in supervised learning and yields a preference for geometrically simple decision boundaries. In the case of semi-supervised learning, the smoothness assumption additionally yields a preference for decision boundaries in low-density regions, so that few points are close to each other yet in different classes.
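As a toy illustration of the smoothness assumption in the transductive setting, the sketch below repeatedly labels the unlabeled point that lies closest to any already-labeled point and then treats it as labeled. This greedy nearest-neighbour propagation is an illustrative stand-in chosen for brevity, not a method described in the text; the function name `propagate_labels` and the sample data are hypothetical.

```python
import numpy as np

def propagate_labels(x_labeled, y_labeled, x_unlabeled):
    """Greedy nearest-neighbour label propagation (toy sketch, not from the text)."""
    points = [p for p in x_labeled]            # points whose label is known so far
    labels = [int(y) for y in y_labeled]
    remaining = list(range(len(x_unlabeled)))  # indices still without a label
    out = np.empty(len(x_unlabeled), dtype=int)
    while remaining:
        best = None
        for i in remaining:
            # Squared distance from unlabeled point i to every labeled point.
            d = [float(np.sum((x_unlabeled[i] - p) ** 2)) for p in points]
            j = int(np.argmin(d))
            if best is None or d[j] < best[0]:
                best = (d[j], i, labels[j])
        _, i, lab = best
        out[i] = lab                           # copy the nearest neighbour's label
        points.append(x_unlabeled[i])          # the point now counts as labeled
        labels.append(lab)
        remaining.remove(i)
    return out

# Two labeled points and a chain of unlabeled points stretching out from the first.
x_l = np.array([[0.0], [10.0]])
y_l = np.array([0, 1])
x_u = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [9.5]])
labels_u = propagate_labels(x_l, y_l, x_u)
```

The point at 6.0 is closer to the labeled point at 10.0 than to the one at 0.0, yet it receives label 0 because the chain of unlabeled points connects it to that class; this is the kind of information the unlabeled data can contribute.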
=== Cluster assumption ===

The data tend to form discrete clusters, and points in the same cluster are more likely to share a label (although data that shares a label may spread across multiple clusters). This is a special case of the smoothness assumption and gives rise to feature learning with clustering algorithms.

=== Manifold assumption ===

The data lie approximately on a manifold of much lower dimension than the input space. In this case learning the manifold using both the labeled and unlabeled data can avoid the curse of dimensionality. Then learning can proceed using distances and densities defined on the manifold. The manifold assumption is practical when high-dimensional data are generated by some process that may be hard to model directly, but which has only a few degrees of freedom. For instance, human voice is controlled by a few vocal folds, and images of various facial expressions are controlled by a few muscles. In these cases, it is better to consider distances and smoothness in the natural space of the generating problem, rather than in the space of all possible acoustic waves or images, respectively.

== History ==

The heuristic approach of self-training (also known as self-learning or self-labeling) is historically the oldest approach to semi-supervised learning, with examples of applications starting in the 1960s. The transductive learning framework was formally introduced by Vladimir Vapnik in the 1970s. Interest in inductive learning using generative models also began in the 1970s. A probably approximately correct learning bound for semi-supervised learning of a Gaussian mixture was demonstrated by Ratsaby and Venkatesh in 1995.

== Methods ==

=== Generative models ===

Generative approaches to statistical learning first seek to estimate $p(x|y)$, the distribution of data points belonging to each class.
The probability $p(y|x)$ that a given point $x$ has label $y$ is then proportional to $p(x|y)p(y)$ by Bayes' rule. Semi-supervised learning with generative models can be viewed either as an extension of supervised learning (classification plus information about $p(x)$) or as an extension of unsupervised learning (clustering plus some labels).

Generative models assume that the distributions take some particular form $p(x|y,\theta)$ parameterized by the vector $\theta$. If these assumptions are incorrect, the unlabeled data may actually decrease the accuracy of the solution relative to what would have been obtained from labeled data alone. However, if the assumptions are correct, then the unlabeled data necessarily improves performance.

The unlabeled data are distributed according to a mixture of individual-class distributions. In order to learn the mixture distribution from the unlabeled data, it must be identifiable, that is, different parameters must yield different summed distributions. Gaussian mixture distributions are identifiable and commonly used for generative models.

The parameterized joint distribution can be written as $p(x,y|\theta) = p(y|\theta)\,p(x|y,\theta)$ by using the chain rule. Each parameter vector $\theta$ is associated with a decision function $f_\theta(x) = \underset{y}{\operatorname{argmax}}\ p(y|x,\theta)$.
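The decision function above can be sketched for a small case. The example assumes one-dimensional Gaussian class-conditional distributions with fixed, hand-picked parameters; the Gaussian form, the parameter values, and the function names are illustrative assumptions, not something the text prescribes.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, standing in for p(x | y, theta)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def posterior(x, priors, means, stds):
    """p(y | x, theta) for every class y, via Bayes' rule."""
    # Joint p(x, y | theta) = p(y | theta) * p(x | y, theta), one entry per class.
    joint = np.array([p * gaussian_pdf(x, m, s)
                      for p, m, s in zip(priors, means, stds)])
    return joint / joint.sum()

def decide(x, priors, means, stds):
    """f_theta(x) = argmax_y p(y | x, theta)."""
    return int(np.argmax(posterior(x, priors, means, stds)))

# Hypothetical parameter vector theta: two equally likely classes.
priors, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
post = posterior(1.0, priors, means, stds)
label_near_zero = decide(1.0, priors, means, stds)
label_near_four = decide(3.0, priors, means, stds)
```

With equal priors and equal variances the decision boundary sits midway between the two means, so points below 2 are assigned to class 0 and points above 2 to class 1.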
The parameter is then chosen based on fit to both the labeled and unlabeled data, weighted by $\lambda$:

$$\underset{\theta}{\operatorname{argmax}} \left( \log p(\{x_i, y_i\}_{i=1}^{l} \mid \theta) + \lambda \log p(\{x_i\}_{i=l+1}^{l+u} \mid \theta) \right)$$

=== Low-density separation ===

Another major class of methods attempts to place boundaries in regions with few data points (labeled or unlabeled). One of the most commonly used algorithms is the transductive support vector machine, or TSVM (which, despite its name, may be used for inductive learning as well). Whereas support vector machines for supervised learning seek a decision boundary with maximal margin over the labeled data, the goal of TSVM is a labeling of the unlabeled data such that the decision boundary has maximal margin over all of the data. In addition to the standard hinge loss $(1 - y f(x))_{+}$ for labeled data, a loss function $(1 - |f(x)|)_{+}$ is introduced over the unlabeled data by letting $y = \operatorname{sign} f(x)$. TSVM then selects $f^{*}(x) = h^{*}(x) + b$ from a reproducing kernel Hilbert space $\mathcal{H}$ by minimizing the regularized empirical risk:

$$f^{*} = \underset{f}{\operatorname{argmin}} \left( \sum_{i=1}^{l} (1 - y_i f(x_i))_{+} + \lambda_1 \|h\|_{\mathcal{H}}^{2} + \lambda_2 \sum_{i=l+1}^{l+u} (1 - |f(x_i)|)_{+} \right)$$

An exact solution is intractable due to the non-convex term $(1 - |f(x)|)_{+}$ …

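To make the TSVM objective from the low-density separation passage concrete, the sketch below only evaluates it for a given linear function $f(x) = w \cdot x + b$; it does not attempt the minimization. Treating $h$ as linear, so that $\|h\|_{\mathcal{H}}^{2} = \|w\|^{2}$, is an assumption made for brevity, and the function names and sample values are hypothetical.

```python
import numpy as np

def hinge(z):
    """(1 - z)_+ applied elementwise."""
    return np.maximum(0.0, 1.0 - z)

def tsvm_objective(w, b, x_lab, y_lab, x_unl, lam1, lam2):
    """Regularized empirical risk of a linear f(x) = w.x + b (assumed linear kernel)."""
    f_lab = x_lab @ w + b
    f_unl = x_unl @ w + b
    labeled_loss = hinge(y_lab * f_lab).sum()      # sum of (1 - y_i f(x_i))_+
    unlabeled_loss = hinge(np.abs(f_unl)).sum()    # sum of (1 - |f(x_i)|)_+
    return labeled_loss + lam1 * float(w @ w) + lam2 * unlabeled_loss

w = np.array([1.0])
b = 0.0
x_lab = np.array([[2.0], [-2.0]])
y_lab = np.array([1.0, -1.0])
x_unl = np.array([[0.5], [3.0]])   # 0.5 lies inside the margin, 3.0 outside it
value = tsvm_objective(w, b, x_lab, y_lab, x_unl, lam1=0.1, lam2=1.0)
```

Only the unlabeled point at 0.5 is penalized here, since it falls inside the margin; pushing the boundary away from unlabeled points is what steers it toward low-density regions.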
  • Karl Steinbuch

    Karl Steinbuch

    Karl W. Steinbuch (June 15, 1917 in Stuttgart-Bad Cannstatt – June 4, 2005 in Ettlingen) was a German computer scientist, cyberneticist, and electrical engineer. He was an early and influential researcher in German computer science, and was the developer of the Lernmatrix, an early implementation of artificial neural networks. From the late 1960s onwards the focus of his activity shifted from scientific research to right-wing political activism supporting the Neue Rechte.

    == Biography ==

    Steinbuch joined the National Socialist German Students' League (NSDStB) and the Nazi Party. He studied at the University of Stuttgart and received his PhD in physics in 1944. In 1948 he joined Standard Elektrik Lorenz (SEL, part of the ITT group) in Stuttgart as a computer design engineer and later as a director of research and development, where he filed more than 70 patents. Steinbuch completed the first European fully transistorized computer, the ER 56, marketed by SEL. In 1958 he became professor and director of the Institute of Technology for information processing (ITIV) of the University of Karlsruhe, from which he retired in 1980.

    In 1967 he began publishing books in which he tried to influence German education policy. Together with books from colleagues like Jean Ziegler from Switzerland, Eric J. Hobsbawm from the UK, and John Naisbitt, his books predicted what he regarded as the coming education disaster of the emerging civic lobby society.

    In 1957, together with Helmut Gröttrup, Steinbuch coined the term Informatik, the German word for computer science and the origin of the term informatics, as well as the term kybernetische Anthropologie.

    == Awards and recognition ==

      • Wilhelm-Boelsche award - medal in Gold
      • German non-fiction book award
      • Gold medal award of the XXI. International Congresses on Aerospace Medicine
      • Konrad Adenauer award of science
      • Jakob Fugger award medal
      • Medal of merit of the state of Baden-Wuerttemberg
      • Member, German Academy of Sciences Leopoldina
      • Member, International Academy of Science, Munich
      • Grants from a state government grants program, named "Karl-Steinbuch-Stipendium"
      • Steinbuch Centre for Computing at the Karlsruhe Institute of Technology, named after him

    == Books ==

    Steinbuch wrote several books and articles, including:

      • 1957: Informatik: Automatische Informationsverarbeitung ("Informatics: automatic information processing")
      • 1963: Learning matrices and their applications (together with U. A. W. Piske)
      • 1965: A critical comparison of two kinds of adaptive classification networks (together with Bernard Widrow)
      • 1966 (1969): Die informierte Gesellschaft. Geschichte und Zukunft der Nachrichtentechnik (The informed society. History and Future of telecommunications)
      • 1989: Die desinformierte Gesellschaft (The disinformed society)
      • 1968: Falsch programmiert. Über das Versagen unserer Gesellschaft in der Gegenwart und vor der Zukunft und was eigentlich geschehen müßte. (Programmed falsely. About our society's failure in the present and with respect to the future and what should be done.) Listed as a bestseller in Der Spiegel.
      • 1969: Programm 2000. Listed as a bestseller in Der Spiegel.
      • 1971: Automat und Mensch. Auf dem Weg zu einer kybernetischen Anthropologie (Machine and Man. On the way to a cybernetic anthropology; 4th revised edition)
      • 1971: Mensch Technik Zukunft. Probleme von Morgen (Man Technology Future. Problems of Tomorrow). German non-fiction book award.
      • 1973: Kurskorrektur (Correcting the Course)
      • 1978: Maßlos informiert. Die Enteignung des Denkens (Excessively informed. The Deprivation of Thinking)
      • 1984: Unsere manipulierte Demokratie. Müssen wir mit der linken Lüge leben? (Our Thought-controlled Democracy. Do we have to live with the leftist lie?)

  • Is an AI Essay Writer Worth It in 2026?

    Is an AI Essay Writer Worth It in 2026?

    Comparing the best AI essay writer? An AI essay writer is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI essay writer slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

  • Sparse dictionary learning

    Sparse dictionary learning

    Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims to find a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than any one of the signals being observed. These two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in sparsity and flexibility of the representation. One of the most important applications of sparse dictionary learning is in the field of compressed sensing or signal recovery. In compressed sensing, a high-dimensional signal can be recovered with only a few linear measurements, provided that the signal is sparse or near-sparse. Since not all signals satisfy this condition, it is crucial to find a sparse representation of that signal such as the wavelet transform or the directional gradient of a rasterized matrix. Once a matrix or a high-dimensional vector is transferred to a sparse space, different recovery algorithms like basis pursuit, CoSaMP, or fast non-iterative algorithms can be used to recover the signal. One of the key principles of dictionary learning is that the dictionary has to be inferred from the input data. The emergence of sparse dictionary learning methods was stimulated by the fact that in signal processing, one typically wants to represent the input data using a minimal amount of components. Before this approach, the general practice was to use predefined dictionaries such as Fourier or wavelet transforms. 
However, in certain cases, a dictionary that is trained to fit the input data can significantly improve the sparsity, which has applications in data decomposition, compression, and analysis, and has been used in the fields of image denoising and classification, and video and audio processing. Sparsity and overcomplete dictionaries are widely applied in image compression, image fusion, and inpainting.

== Problem statement ==

Given the input dataset $X = [x_1, \dots, x_K]$, $x_i \in \mathbb{R}^{d}$, we wish to find a dictionary $\mathbf{D} \in \mathbb{R}^{d \times n}$, $\mathbf{D} = [d_1, \dots, d_n]$, and a representation $R = [r_1, \dots, r_K]$, $r_i \in \mathbb{R}^{n}$, such that both $\|X - \mathbf{D}R\|_F^2$ is minimized and the representations $r_i$ are sparse enough. This can be formulated as the following optimization problem:

$$\underset{\mathbf{D} \in \mathcal{C},\ r_i \in \mathbb{R}^{n}}{\operatorname{argmin}} \sum_{i=1}^{K} \|x_i - \mathbf{D} r_i\|_2^2 + \lambda \|r_i\|_0,$$

where $\mathcal{C} \equiv \{\mathbf{D} \in \mathbb{R}^{d \times n} : \|d_i\|_2 \leq 1 \ \ \forall i = 1, \dots, n\}$ and $\lambda > 0$.

$\mathcal{C}$ is required to constrain $\mathbf{D}$ so that its atoms would not reach arbitrarily high values, allowing for arbitrarily low (but non-zero) values of $r_i$. $\lambda$ controls the trade-off between the sparsity and the minimization error.

The minimization problem above is not convex because of the $\ell_0$-"norm", and solving this problem is NP-hard.
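The objective in the problem statement can be evaluated directly for a fixed dictionary and representation, which makes the role of $\lambda$ and of the constraint set $\mathcal{C}$ easier to see. The sketch below only computes the objective and checks the constraint; the function names and the small hand-built example are hypothetical.

```python
import numpy as np

def sdl_objective(X, D, R, lam):
    """Sum over signals of ||x_i - D r_i||_2^2 + lam * ||r_i||_0."""
    fit = float(np.sum((X - D @ R) ** 2))     # squared reconstruction error
    nonzeros = int(np.count_nonzero(R))       # total l0 "norm" over all columns
    return fit + lam * nonzeros

def in_constraint_set(D, tol=1e-12):
    """True if every atom (column of D) has Euclidean norm at most 1."""
    return bool(np.all(np.linalg.norm(D, axis=0) <= 1.0 + tol))

s = 1.0 / np.sqrt(2.0)
D = np.array([[1.0, 0.0, s],
              [0.0, 1.0, s]])                 # d = 2, n = 3: an overcomplete dictionary
X = np.array([[1.0, 2.0],
              [1.0, 0.0]])                    # two signals as columns
R_sparse = np.array([[0.0, 2.0],
                     [0.0, 0.0],
                     [np.sqrt(2.0), 0.0]])    # one atom per signal
R_dense = np.array([[1.0, 2.0],
                    [1.0, 0.0],
                    [0.0, 0.0]])              # first signal uses two atoms
obj_sparse = sdl_objective(X, D, R_sparse, lam=0.5)
obj_dense = sdl_objective(X, D, R_dense, lam=0.5)
```

Both representations reconstruct the signals exactly, so the objective differs only through the sparsity term: the extra diagonal atom lets the first signal be coded with one coefficient instead of two.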
In some cases the $\ell_1$-norm is known to ensure sparsity, and with it in place of the $\ell_0$-"norm" the above becomes a convex optimization problem with respect to each of the variables $\mathbf{D}$ and $R$ when the other one is fixed, but it is not jointly convex in $(\mathbf{D}, R)$.

=== Properties of the dictionary ===

The dictionary $\mathbf{D}$ defined above can be "undercomplete" if $n < d$ or "overcomplete" if $n > d$, with the latter being a typical assumption for a sparse dictionary learning problem. The case of a complete dictionary does not provide any improvement from a representational point of view and thus isn't considered.

Undercomplete dictionaries represent the setup in which the actual input data lies in a lower-dimensional space. This case is strongly related to dimensionality reduction and techniques like principal component analysis, which require atoms $d_1, \dots, d_n$ to be orthogonal. The choice of these subspaces is crucial for efficient dimensionality reduction, but it is not trivial. Dimensionality reduction based on dictionary representation can also be extended to address specific tasks such as data analysis or classification. However, the main downside of this setup is that it limits the choice of atoms.

Overcomplete dictionaries, however, do not require the atoms to be orthogonal (they can never form a basis anyway), thus allowing for more flexible dictionaries and richer data representations.

An overcomplete dictionary that allows for a sparse representation of a signal can be a well-known transform matrix (wavelet transform, Fourier transform), or it can be formulated so that its elements are changed in such a way that it sparsely represents the given signal in the best way. Learned dictionaries are capable of giving sparser solutions than predefined transform matrices.
== Algorithms ==

As the optimization problem described above can be solved as a convex problem with respect to either the dictionary or the sparse coding while the other one of the two is fixed, most of the algorithms are based on the idea of iteratively updating one and then the other.

The problem of finding an optimal sparse coding $R$ with a given dictionary $\mathbf{D}$ is known as sparse approximation (or sometimes just the sparse coding problem). A number of algorithms have been developed to solve it (such as matching pursuit and LASSO) and are incorporated in the algorithms described below.

=== Method of optimal directions (MOD) ===

The method of optimal directions (or MOD) was one of the first methods introduced to tackle the sparse dictionary learning problem. Its core idea is to solve the minimization problem subject to a limited number of non-zero components of the representation vector:

$$\min_{\mathbf{D}, R} \left\{ \|X - \mathbf{D}R\|_F^2 \right\} \quad \text{s.t.} \quad \forall i \ \ \|r_i\|_0 \leq T$$

Here, $F$ denotes the Frobenius norm. MOD alternates between obtaining the sparse coding using a method such as matching pursuit and updating the dictionary by computing the analytical solution of the problem, given by $\mathbf{D} = XR^{+}$, where $R^{+}$ is the Moore-Penrose pseudoinverse of $R$. After this update $\mathbf{D}$ is renormalized to fit the constraints and the new sparse coding is obtained again. The process is repeated until convergence (or until the residue is sufficiently small).

MOD has proved to be a very efficient method for low-dimensional input data $X$, requiring just a few iterations to converge. However, because of the high complexity of the matrix-inversion operation, computing the pseudoinverse is in many high-dimensional cases intractable.
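The MOD alternation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the sparse-coding step uses a small greedy orthogonal matching pursuit written for this example, the dictionary is initialized randomly, and a fixed iteration count replaces a convergence test. The names `omp` and `mod` and all parameter values are hypothetical.

```python
import numpy as np

def omp(D, x, T):
    """Greedy orthogonal matching pursuit with at most T non-zero coefficients."""
    r = np.zeros(D.shape[1])
    residual = x.copy()
    support, coef = [], None
    for _ in range(T):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with residual
        if k in support:
            break
        support.append(k)
        # Least-squares fit of x on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        r[support] = coef
    return r

def mod(X, n_atoms, T, n_iter=20, seed=0):
    """Alternate sparse coding and the update D = X R^+ (then renormalize atoms)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        R = np.column_stack([omp(D, x, T) for x in X.T])
        D = X @ np.linalg.pinv(R)                    # analytical dictionary update
        norms = np.linalg.norm(D, axis=0)
        D /= np.where(norms > 0, norms, 1.0)         # leave unused (zero) atoms untouched
    R = np.column_stack([omp(D, x, T) for x in X.T]) # final coding for the final D
    return D, R

X = np.random.default_rng(1).standard_normal((5, 40))   # 40 signals of dimension 5
D, R = mod(X, n_atoms=8, T=2)
```

The pseudoinverse call is the step the text identifies as the bottleneck for high-dimensional data.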
This shortcoming has inspired the development of other dictionary learning methods.

=== K-SVD ===

K-SVD is an algorithm that performs SVD at its core to update the atoms of the dictionary one by one and is essentially a generalization of K-means. It enforces that each element of the input data $x_i$ is encoded by a linear combination of not more than $T_0$ elements, in a way identical to the MOD approach:

$$\min_{\mathbf{D}, R} \left\{ \|X - \mathbf{D}R\|_F^2 \right\} \quad \text{s.t.} \quad \forall i \ \ \|r_i\|_0 \leq T_0$$

The essence of this algorithm is to first fix the dictionary, find the best possible $R$ under the above constraint (using Orthogonal Matching Pursuit), and then iteratively update the atoms of dictionary $\mathbf{D}$ in the following manner:

$$\|X - \mathbf{D}R\|_F^2 = \left\| X - \sum_{i=1}^{K} d_i x_T^{i} \right\|_F^2 = \|E_k - d_k x_T^{k}\|_F^2$$

The next steps of the algorithm include rank-1 approximation of the residual matrix $E_k$, updating $d_k$ and enforcing the s…

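The K-SVD excerpt above breaks off at the atom update, so the sketch below follows the standard description of that step rather than the truncated text: the residual $E_k$ is restricted to the signals that currently use atom $k$, and its best rank-1 approximation (from the SVD) replaces the atom and its coefficients. The function name, the in-place update style, and the random test data are assumptions for illustration.

```python
import numpy as np

def ksvd_update_atom(X, D, R, k):
    """Update atom k of D and row k of R in place via a rank-1 SVD approximation."""
    users = np.flatnonzero(R[k, :])           # signals whose code uses atom k
    if users.size == 0:
        return                                # nothing to update for an unused atom
    # Residual with atom k's own contribution added back, on those signals only.
    E_k = X[:, users] - D @ R[:, users] + np.outer(D[:, k], R[k, users])
    U, S, Vt = np.linalg.svd(E_k, full_matrices=False)
    D[:, k] = U[:, 0]                         # new unit-norm atom
    R[k, users] = S[0] * Vt[0, :]             # new coefficients on the same support

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10))
D = rng.standard_normal((4, 6))
D /= np.linalg.norm(D, axis=0)
R = rng.standard_normal((6, 10)) * (rng.random((6, 10)) < 0.3)
R[0, :3] = 1.0                                # make sure atom 0 is in use
zero_mask = (R == 0)
err_before = np.linalg.norm(X - D @ R)
ksvd_update_atom(X, D, R, 0)
err_after = np.linalg.norm(X - D @ R)
```

Restricting the update to the signals that already use the atom keeps every zero coefficient at zero, so the sparsity pattern found by the coding step is preserved while the reconstruction error cannot increase.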
  • Highway network

    Highway network

    In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks. The advantage of the Highway Network over other deep learning architectures is its ability to overcome or partially prevent the vanishing gradient problem, thus improving its optimization. Gating mechanisms are used to facilitate information flow across the many layers ("information highways"). Highway Networks have found use in text sequence labeling and speech recognition tasks.

    In 2014, the state of the art was training deep neural networks with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train such networks: the Highway Network (published in May), and the residual neural network, or ResNet (December). ResNet behaves like an open-gated Highway Net.

    == Model ==

    The model has two gates in addition to the $H(x, W_H)$ gate: the transform gate $T(x, W_T)$ and the carry gate $C(x, W_C)$. The latter two gates are non-linear transfer functions (specifically sigmoid by convention). The function $H$ can be any desired transfer function. The carry gate is defined as $C(x, W_C) = 1 - T(x, W_T)$, while the transform gate is just a gate with a sigmoid transfer function.
== Structure ==

The structure of a hidden layer in the Highway Network follows the equation:

$$\begin{aligned} y &= H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C) \\ &= H(x, W_H) \cdot T(x, W_T) + x \cdot (1 - T(x, W_T)) \end{aligned}$$

== Related work ==

Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and identified it as the reason why deep learning did not work well. To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute $y_{t+1} = F(x_t) + x_t$. During backpropagation through time, this becomes the residual formula $y = F(x) + x$ for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span $t$. A later LSTM version published in 2000 modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, thus being opened, addressing the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM.

The Highway Network of May 2015 applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers". It is like a 2000 LSTM with forget gates unfolded in time, while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM. If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks.
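The hidden-layer equation given under Structure can be sketched as a single NumPy function. The choice of tanh for $H$, the explicit bias vectors `b_H` and `b_T`, and the random weights are assumptions made for this example; the text only fixes the sigmoid transform gate and the relation $C = 1 - T$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)), with the carry gate C = 1 - T."""
    h = np.tanh(W_H @ x + b_H)       # H(x, W_H); tanh is an arbitrary choice here
    t = sigmoid(W_T @ x + b_T)       # transform gate T(x, W_T)
    return h * t + x * (1.0 - t)

rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
W_H = rng.standard_normal((n, n))
W_T = rng.standard_normal((n, n))
b_H = np.zeros(n)

# Strongly negative gate bias: T is close to 0, so the layer carries x through unchanged.
y_carry = highway_layer(x, W_H, b_H, W_T, np.full(n, -50.0))
# Strongly positive gate bias: T is close to 1, so the layer outputs H(x).
y_transform = highway_layer(x, W_H, b_H, W_T, np.full(n, 50.0))
```

Because the input and output are mixed elementwise, the layer as written needs `x`, `H(x)` and `T(x)` to have the same dimension.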
The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961) and Lang & Witbrock (1988), which has the form $x \mapsto F(x) + Ax$. Here the randomly initialized weight matrix $A$ does not have to be the identity mapping. Every residual connection is a skip connection, but almost all skip connections are not residual connections.

The original Highway Network paper not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20-, 50-, and 100-layer networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20-layer counterpart (on the MNIST dataset; Figure 1 of the paper). No improvement in test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1 of the paper). The ResNet paper, however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3). This is also why the forget gates of the 2000 LSTM were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are predominantly identity mappings.

  • Nicolò Cesa-Bianchi

    Nicolò Cesa-Bianchi

    Nicolò Cesa-Bianchi (Italian pronunciation: [nikoˈlɔ tˈtʃɛːza ˈbjaŋki]) is an Italian computer scientist and Professor of Computer Science at the Department of Computer Science of the University of Milan. He is a researcher in the field of machine learning, and co-author of the books "Prediction, Learning, and Games" with Gabor Lugosi and "Regret analysis of stochastic and nonstochastic multi-armed bandit problems" with Sébastien Bubeck.

    == Education and career ==

    Cesa-Bianchi graduated in Computer Science from the University of Milan in 1988, where he received a PhD in Computer Science in 1993, supervised by Alberto Bertoni. During his PhD, he visited UC Santa Cruz, where he worked with Manfred Warmuth and David Haussler. He did his postdoctoral studies at Graz University of Technology under the supervision of Wolfgang Maass.

    == Research ==

    His research contributions focus on the following areas:

      • design and analysis of machine learning algorithms, especially online machine learning algorithms for multi-armed bandit problems, with applications to recommender systems and online auctions
      • graph analytics, with applications to social networks and bioinformatics

    == Awards and honors ==

    Cesa-Bianchi received a Google Research Award in 2010, a Xerox University Affairs Committee Award in 2011, a Criteo Faculty Award in 2017, a Google Faculty Award in 2018, and an IBM Academic Award in 2021. Since 2023 he has been a corresponding member of the Accademia dei Lincei.

  • Barbara Di Eugenio

    Barbara Di Eugenio

    Barbara Di Eugenio is an Italian-American computer scientist, the Collegiate Warren S. McCulloch Professor of Computer Science at the University of Illinois Chicago. Her research focuses on natural language processing and its applications to human–computer interaction, educational technology, and artificial intelligence in healthcare.

    == Education and career ==

    Di Eugenio is originally from Turin. After an undergraduate education in Italy, she completed her Ph.D. in computer and information science in 1993 at the University of Pennsylvania. Her dissertation, Understanding Natural Language Instructions: A Computational Approach to Purpose Clauses, was supervised by Bonnie Webber. She became a faculty member at the University of Illinois Chicago in 1999, and at that time was the only woman faculty member in the Department of Electrical Engineering and Computer Science.

    == Recognition ==

    In 2022, Di Eugenio received the Zenith Award of the Association for Women in Science. She was named a Fellow of the Association for Computational Linguistics in 2023, "for outstanding contributions to natural language generation; intelligent tutoring systems; discourse; intercoder agreement; and applying multimodal interactive systems to health".

  • Bruno Zamborlin

    Bruno Zamborlin

    Bruno Zamborlin (born 1983 in Vicenza) is an AI researcher, entrepreneur and artist based in London, working in the field of human-computer interaction. His work focuses on converting physical objects into touch-sensitive, interactive surfaces using vibration sensors and artificial intelligence. In 2013, he founded Mogees Limited, a start-up that transforms everyday objects into musical instruments and games using a vibration sensor and a mobile phone. With HyperSurfaces, he converts physical surfaces of any material, shape and form into data-enabled interactive surfaces using a vibration sensor and a coin-sized chipset. As an artist, he has created art installations around the world, with his most recent work comprising a unique series of "sound furnitures" that was showcased at the Italian Pavilion of the Venice Biennale 2023. He has regularly performed with the UK-based electronic music duo Plaid (Warp Records). He is also an honorary visiting research fellow at Goldsmiths, University of London.

    == Early life and education ==

    From 2008 to 2011, Zamborlin worked at IRCAM (Institute for Research and Coordination Acoustic Musical) – Centre Pompidou as a member of the Sound Music Movement Interaction team. Under the supervision of Frederic Bevilacqua, he started experimenting with the use of artificial intelligence and human movements, and contributed to the creation of Gesture Follower, software used to analyse the body movements of performers and dancers through motion sensors in order to control sound and visual media in real time, slowing down or speeding up their reproduction based on the speed at which the gestures are performed. He has lived in London since 2011, where he pursued a joint PhD in AI between Goldsmiths, University of London and IRCAM - Centre Pompidou/Pierre and Marie Curie University Paris, focussing on the concept of Interactive Machine Learning applied to digital musical instruments and performing arts.
== Career == Zamborlin founded Mogees Limited in 2013 in London, with IRCAM among the early partners. Mogees transforms physical objects into musical instruments and games using a vibration sensor and a series of apps for smartphones and desktop. After a campaign on Kickstarter in 2014, Mogees was used both by everyday users and by artists such as Rodrigo y Gabriela, Jean-Michel Jarre and Plaid. The algorithms implemented in these apps employ a special version of physical modelling sound synthesis, in which the vibrations produced by users when interacting with the physical object are used as the exciter for a digital resonator that runs in the app. The result is a hybrid, half-acoustic and half-digital sound which is a function of both the software and the acoustic properties of the physical object the users decide to play. In 2017, Zamborlin founded HyperSurfaces together with computational artist Parag K Mital, to merge "the physical and the digital worlds". HyperSurfaces technology converts any surface made of any material, shape and size into data-enabled interactive objects, employing a vibration sensor and proprietary AI algorithms running on a coin-sized chipset. The vibrations generated by people's interactions on the surface are converted into an electric signal by a piezoelectric sensor and analysed in real time by AI algorithms that run on the chipset. Whenever the AI recognises in the vibration signal one of the events predefined by the user, a corresponding notification message is generated in real time and sent to an application. The technology can be applied to anything ranging from button-less human-computer interaction applications for automotive and smart home to the Internet of things. Because the AI algorithms employed by HyperSurfaces run locally on a chipset, without the need to access cloud-based services, they are considered to be part of the field of edge computing. 
Also, because the AI can be trained beforehand to recognise the events its users are interested in, HyperSurfaces algorithms belong to the field of supervised machine learning. == Selected awards == IRISA Prix Jeune Chercheur, 13 October 2012 NeMoDe, New Economic Models in the Digital Economy, 25 October 2012 == Patents and academic publications == United States pending US10817798B2, Bruno Zamborlin & Carmine Emanuele Cella, "Method to recognize a gesture and corresponding device", published 27 April 2016, assigned to Mogees Limited GB Pending WO/2019/086862, Bruno Zamborlin; Conor Barry & Alessandro Saccoia et al., "A user interface for vehicles", published 9 May 2019, assigned to Mogees Limited GB Pending WO/2019/086863, Bruno Zamborlin; Conor Barry & Alessandro Saccoia et al., "Trigger for game events", published 9 May 2019, assigned to Mogees Limited Bevilacqua, Frédéric; Zamborlin, Bruno; Sypniewski, Anthony; Schnell, Norbert; Guédy, Fabrice; Rasamimanana, Nicolas (2010). "Continuous Realtime Gesture Following and Recognition". Gesture in Embodied Communication and Human-Computer Interaction. Lecture Notes in Computer Science. Vol. 5934. pp. 73–84. doi:10.1007/978-3-642-12553-9_7. ISBN 978-3-642-12552-2. S2CID 16251822. Retrieved 17 January 2021. Rasamimanana, Nicolas; Bevilacqua, Frédéric; Schnell, Norbert; Guédy, Fabrice; Flety, Emmanuel; Maestracci, Come; Zamborlin, Bruno (January 2010). "Modular musical objects towards embodied control of digital music". Proceedings of the fifth international conference on Tangible, embedded, and embodied interaction. Tei '11. pp. 9–12. doi:10.1145/1935701.1935704. ISBN 9781450304788. S2CID 10782645. Retrieved 17 January 2021. Bevilacqua, Frédéric; Schnell, Norbert; Rasamimanana, Nicolas; Zamborlin, Bruno; Guedy, Fabrice (2011). "Online Gesture Analysis and Control of Audio Processing". Musical Robots and Interactive Multimodal Systems. Springer Tracts in Advanced Robotics. Vol. 74. pp. 127–142. 
doi:10.1007/978-3-642-22291-7_8. ISBN 978-3-642-22290-0. Retrieved 17 January 2021. Zamborlin, Bruno; Bevilacqua, Frédéric; Gillies, Marco; D'Inverno, Mark (15 January 2014). "Fluid gesture interaction design: Applications of continuous recognition for the design of modern gestural interfaces". ACM Transactions on Interactive Intelligent Systems. 3 (4): 22:1–22:30. doi:10.1145/2543921. S2CID 7887245. Retrieved 17 January 2021. Leslie, Grace; Zamborlin, Bruno; Schnell, Norbert; Jodlowski, Pierre (15 June 2010). "A Collaborative, Interactive Sound Installation". Proceedings of the International Computer Music Conference. Retrieved 17 January 2021. Kimura, Mari; Rasamimanana, Nicolas; Bevilacqua, Frédéric; Zamborlin, Bruno; Schnell, Bruno; Flety, Emmanuel (2012). "Extracting Human Expression For Interactive Composition with the Augmented Violin". International Conference on New Interfaces for Musical Expression. Retrieved 17 January 2021. Ferretti, Stefano; Roccetti, Marco; Zamborlin, Bruno (13 January 2009). "On SPAWC: Discussion on a Musical Signal Parser and Well-Formed Composer". 2009 6th IEEE Consumer Communications and Networking Conference. pp. 1–5. doi:10.1109/CCNC.2009.4784966. ISBN 978-1-4244-2308-8. S2CID 14213587. Zamborlin, Bruno; Partesana, Giorgio; Liuni, Marco (15 May 2011). "(LAND)MOVES". Conference on New Interfaces for Musical Expression, NIME: 537–538. Retrieved 17 January 2021.

    Read more →
  • Statistical shape analysis

    Statistical shape analysis

    Statistical shape analysis is an analysis of the geometrical properties of some given set of shapes by statistical methods. For instance, it could be used to quantify differences between male and female gorilla skull shapes, normal and pathological bone shapes, leaf outlines with and without herbivory by insects, etc. Important aspects of shape analysis are to obtain a measure of distance between shapes, to estimate mean shapes from (possibly random) samples, to estimate shape variability within samples, to perform clustering and to test for differences between shapes. One of the main methods used is principal component analysis (PCA). Statistical shape analysis has applications in various fields, including medical imaging, computer vision, computational anatomy, sensor measurement, and geographical profiling. == Landmark-based techniques == In the point distribution model, a shape is determined by a finite set of coordinate points, known as landmark points. These landmark points often correspond to important identifiable features such as the corners of the eyes. Once the points are collected, some form of registration is undertaken. This can be a baseline method, as used by Fred Bookstein for geometric morphometrics in anthropology, or an approach like Procrustes analysis, which finds an average shape. David George Kendall investigated the statistical distribution of the shape of triangles, and represented each triangle by a point on a sphere. He used this distribution on the sphere to investigate ley lines and whether three stones were more likely to be collinear than might be expected. Statistical distributions such as the Kent distribution can be used to analyse the distribution of such spaces. Alternatively, shapes can be represented by curves or surfaces representing their contours, or by the spatial region they occupy. == Shape deformations == Differences between shapes can be quantified by investigating deformations transforming one shape into another. 
In particular, a diffeomorphism preserves smoothness in the deformation. This was pioneered in D'Arcy Thompson's On Growth and Form before the advent of computers. Deformations can be interpreted as resulting from a force applied to the shape. Mathematically, a deformation is defined as a mapping from a shape $x$ to a shape $y$ by a transformation function $\Phi$, i.e., $y = \Phi(x)$. Given a notion of size of deformations, the distance between two shapes can be defined as the size of the smallest deformation between these shapes. Diffeomorphometry is the focus on comparison of shapes and forms with a metric structure based on diffeomorphisms, and is central to the field of computational anatomy. Diffeomorphic registration, introduced in the 1990s, is now an important player: ANTS, DARTEL, DEMONS, LDDMM, StationaryLDDMM, and FastLDDMM are examples of actively used code bases for constructing correspondences between coordinate systems based on sparse features and dense images. Voxel-based morphometry (VBM) is an important technology built on many of these principles. Methods based on diffeomorphic flows are also used. For example, deformations could be diffeomorphisms of the ambient space, resulting in the LDDMM (Large Deformation Diffeomorphic Metric Mapping) framework for shape comparison.
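
As a rough illustration of landmark registration and of a distance between shapes, the sketch below aligns two landmark sets by removing translation and scale and then solving the orthogonal Procrustes problem with an SVD. It is a minimal example under stated assumptions, not the method of any particular package: the function name `procrustes_distance` and the sample triangles are hypothetical, and the orthogonal transform found here may include a reflection.

```python
import numpy as np

def procrustes_distance(shape_a, shape_b):
    """Residual between two landmark sets after removing translation,
    scale, and the best orthogonal transform (rotation or reflection)."""
    # Remove translation by centring each shape on its centroid.
    a = shape_a - shape_a.mean(axis=0)
    b = shape_b - shape_b.mean(axis=0)
    # Remove scale by normalising to unit Frobenius norm.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # Orthogonal Procrustes: the R minimising ||b R - a|| is U V^T,
    # where b^T a = U S V^T.
    u, _, vt = np.linalg.svd(b.T @ a)
    return float(np.linalg.norm(a - b @ (u @ vt)))

triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# The same triangle, rotated by 30 degrees, scaled by 3, and shifted.
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = 3.0 * triangle @ rot.T + np.array([5.0, -2.0])

# A genuinely different triangle (one side stretched).
stretched = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])

same_shape = procrustes_distance(triangle, moved)
different_shape = procrustes_distance(triangle, stretched)
print(same_shape, different_shape)
```

The first distance is zero up to rounding because the two landmark sets differ only by a similarity transform, while the stretched triangle leaves a non-zero residual.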

    Read more →
  • METAL MT

    METAL MT

    METAL MT is a machine translation system developed at the University of Texas and at Siemens, which ran on Lisp Machines. == Background == Originally titled the Linguistics Research System (LRS), it was later renamed METAL (Mechanical Translation and Analysis of Languages). It started life as a German-English system funded by the USAF. == 1980 == A copy of the Weidner Multi-Lingual Word Processing software was requested by the German Government for the Siemens Corporation of Germany in September 1980 and was nicknamed the Siemens-Weidner Engine (originally English-German). This revolutionary multilingual word processing engine became foundational in the development of the Metal MT project, according to John White of the Siemens Corporation. After the Metal MT project, development rights to the Siemens-Weidner Engine were sold to a Belgian company, Lernout & Hauspie. The Siemens copy of the Weidner Multilingual Word Processing software has since been acquired through the purchase of assets of Lernout & Hauspie by Bowne Global Solutions, Inc., which was later acquired by Lionbridge Technologies, Inc., and is demonstrated in their itranslator software.

    Read more →
  • Wasserstein GAN

    Wasserstein GAN

    The Wasserstein Generative Adversarial Network (WGAN) is a variant of generative adversarial network (GAN) proposed in 2017 that aims to "improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches". Compared with the original GAN discriminator, the Wasserstein GAN discriminator provides a better learning signal to the generator. This allows the training to be more stable when the generator is learning distributions in very high dimensional spaces. == Motivation == === The GAN game === The original GAN method is based on the GAN game, a zero-sum game with 2 players: generator and discriminator. The game is defined over a probability space $(\Omega, \mathcal{B}, \mu_{ref})$. The generator's strategy set is the set of all probability measures $\mu_G$ on $(\Omega, \mathcal{B})$, and the discriminator's strategy set is the set of measurable functions $D: \Omega \to [0,1]$. The objective of the game is $L(\mu_G, D) := \mathbb{E}_{x\sim \mu_{ref}}[\ln D(x)] + \mathbb{E}_{x\sim \mu_G}[\ln(1-D(x))]$. The generator aims to minimize it, and the discriminator aims to maximize it. A basic theorem of the GAN game characterizes the discriminator's optimal reply. Repeat the GAN game many times, each time with the generator moving first, and the discriminator moving second. Each time the generator $\mu_G$ changes, the discriminator must adapt by approaching the ideal $D^{*}(x) = \frac{d\mu_{ref}}{d(\mu_{ref}+\mu_G)}$. Since we are really interested in $\mu_{ref}$, the discriminator function $D$ is by itself rather uninteresting. 
It merely keeps track of the likelihood ratio between the generator distribution and the reference distribution. At equilibrium, the discriminator is just outputting $\frac{1}{2}$ constantly, having given up trying to perceive any difference. Concretely, in the GAN game, let us fix a generator $\mu_G$, and improve the discriminator step-by-step, with $\mu_{D,t}$ being the discriminator at step $t$. Then we (ideally) have $L(\mu_G, \mu_{D,1}) \leq L(\mu_G, \mu_{D,2}) \leq \cdots \leq \max_{\mu_D} L(\mu_G, \mu_D) = 2D_{JS}(\mu_{ref}\|\mu_G) - 2\ln 2$, so we see that the discriminator is actually lower-bounding $D_{JS}(\mu_{ref}\|\mu_G)$. === Wasserstein distance === Thus, we see that the point of the discriminator is mainly as a critic to provide feedback for the generator, about "how far it is from perfection", where "far" is defined as Jensen–Shannon divergence. Naturally, this brings the possibility of using a different criterion of farness. There are many possible divergences to choose from, such as the f-divergence family, which would give the f-GAN. The Wasserstein GAN is obtained by using the Wasserstein metric, which satisfies a "dual representation theorem" (the Kantorovich–Rubinstein duality) that renders it highly efficient to compute: when $\Omega$ is a metric space, then for any fixed $K > 0$, $W_1(\mu, \nu) = \frac{1}{K}\sup_{\|f\|_L \leq K}\left(\mathbb{E}_{x\sim\mu}[f(x)] - \mathbb{E}_{y\sim\nu}[f(y)]\right)$, where $\|\cdot\|_L$ is the Lipschitz norm. A proof can be found in the main page on Wasserstein metric. == Definition == By the Kantorovich–Rubinstein duality, the definition of Wasserstein GAN is clear: A Wasserstein GAN game is defined by a probability space $(\Omega, \mathcal{B}, \mu_{ref})$, where $\Omega$ is a metric space, and a constant $K > 0$. There are 2 players: generator and discriminator (also called "critic"). 
The generator's strategy set is the set of all probability measures $\mu_G$ on $(\Omega, \mathcal{B})$. The discriminator's strategy set is the set of measurable functions of type $D: \Omega \to \mathbb{R}$ with bounded Lipschitz norm: $\|D\|_L \leq K$. The Wasserstein GAN game is a zero-sum game, with objective function $L_{WGAN}(\mu_G, D) := \mathbb{E}_{x\sim\mu_G}[D(x)] - \mathbb{E}_{x\sim\mu_{ref}}[D(x)]$. The generator goes first, and the discriminator goes second. The generator aims to minimize the objective, and the discriminator aims to maximize the objective: $\min_{\mu_G}\max_{D} L_{WGAN}(\mu_G, D)$. By the Kantorovich–Rubinstein duality, for any generator strategy $\mu_G$, the optimal reply by the discriminator is $D^{*}$, such that $L_{WGAN}(\mu_G, D^{*}) = K\cdot W_1(\mu_G, \mu_{ref})$. Consequently, if the discriminator is good, the generator would be constantly pushed to minimize $W_1(\mu_G, \mu_{ref})$, and the optimal strategy for the generator is just $\mu_G = \mu_{ref}$, as it should be. == Comparison with GAN == In the Wasserstein GAN game, the discriminator provides a better gradient than in the GAN game. Consider for example a game on the real line where both $\mu_G$ and $\mu_{ref}$ are Gaussian. 
Then the optimal Wasserstein critic $D_{WGAN}$ and the optimal GAN discriminator $D$ can be plotted for comparison. For a fixed discriminator, the generator needs to minimize the following objectives: For GAN, $\mathbb{E}_{x\sim\mu_G}[\ln(1-D(x))]$. For Wasserstein GAN, $\mathbb{E}_{x\sim\mu_G}[D_{WGAN}(x)]$. Let $\mu_G$ be parametrized by $\theta$, then we can perform stochastic gradient descent by using two unbiased estimators of the gradient: $\nabla_\theta \mathbb{E}_{x\sim\mu_G}[\ln(1-D(x))] = \mathbb{E}_{x\sim\mu_G}[\ln(1-D(x))\cdot\nabla_\theta \ln\rho_{\mu_G}(x)]$ and $\nabla_\theta \mathbb{E}_{x\sim\mu_G}[D_{WGAN}(x)] = \mathbb{E}_{x\sim\mu_G}[D_{WGAN}(x)\cdot\nabla_\theta \ln\rho_{\mu_G}(x)]$, where we used the log-derivative (score-function) trick. In such a plot, the generator in GAN is motivated to let its $\mu_G$ "slide down the peak" of $\ln(1-D(x))$. Similarly for the generator in Wasserstein GAN. For Wasserstein GAN, $D_{WGAN}$ has gradient 1 almost everywhere, while for GAN, $\ln(1-D)$ has flat gradient in the middle, and steep gradient elsewhere. As a result, the variance for the estimator in GAN is usually much larger than that in Wasserstein GAN. See also Figure 3 of. The problem with $D_{JS}$ is much more severe in actual machine learning situations. Consider training a GAN to generate ImageNet, a collection of photos of size 256-by-256. 
The space of all such photos is $\mathbb{R}^{256^{2}}$, and the distribution of ImageNet pictures, $\mu_{ref}$, concentrates on a manifold of much lower dimension in it. Consequently, any generator strategy $\mu_G$ would almost surely be entirely disjoint from $\mu_{ref}$, making $D_{JS}(\mu_G\|\mu_{ref}) = \ln 2$, its maximum value. Thus, a good discriminator can almost perfectly distinguish $\mu_{ref}$ from $\mu_G$, as well as any $\mu_G'$ close to $\mu_G$. Thus, the gradient $\nabla_{\mu_G} L(\mu_G, D) \approx 0$, creating no learning signal for the generator. Detailed theorems can be found in. == Training Wasserstein GANs == Training the generator in Wasserstein GAN is just gradient descent, the same as in GAN (or most deep learning methods), but training the discriminator is different, as the discriminator is now restricted to have bounded Lipschitz norm. There are several methods for this. === Upper-bounding the Lipschitz norm === Let the discriminator function $D$ be implemented by a multilayer perceptron: $D = D_n \circ D_{n-1} \circ \cdots \circ D_1$ where $D_i(x) = h(W_i x)$
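
To make the Wasserstein GAN objective from the Definition section concrete, the following sketch evaluates $L_{WGAN}$ empirically on the real line for two Gaussians with equal variance, taking $K = 1$. It is an illustration under assumptions rather than a training procedure: the helper names, the sample sizes, and the choice of means 0 and 2 are all hypothetical. For equal-variance Gaussians the critic $D(x) = x$ is an optimal 1-Lipschitz reply when the generator's mean is the larger one, so its objective value should approach $W_1$, while a flatter critic gives a weaker signal.

```python
import numpy as np

def wgan_objective(critic, gen_samples, ref_samples):
    """Empirical L_WGAN(mu_G, D) = E_{x~mu_G}[D(x)] - E_{x~mu_ref}[D(x)]."""
    return float(np.mean(critic(gen_samples)) - np.mean(critic(ref_samples)))

def empirical_w1(a, b):
    """W_1 between two equal-sized 1-D samples: mean gap between sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
ref_samples = rng.normal(loc=0.0, scale=1.0, size=10_000)  # stand-in for mu_ref
gen_samples = rng.normal(loc=2.0, scale=1.0, size=10_000)  # stand-in for mu_G

def identity_critic(x):
    return x          # Lipschitz norm 1

def weak_critic(x):
    return 0.1 * x    # Lipschitz norm 0.1, still admissible for K = 1

best = wgan_objective(identity_critic, gen_samples, ref_samples)
weak = wgan_objective(weak_critic, gen_samples, ref_samples)
w1 = empirical_w1(gen_samples, ref_samples)
print(f"identity critic: {best:.3f}, weak critic: {weak:.3f}, W_1: {w1:.3f}")
```

No admissible critic can exceed the empirical $W_1$, which is the duality at work: the discriminator's best achievable objective is exactly the distance the generator is being pushed to reduce.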

    Read more →
  • Ginger Software

    Ginger Software

    Ginger Software is an American and Israeli start-up specializing in natural language processing and AI. Its main products are tools that aim to improve written communication, develop English speaking skills and boost productivity. The company was founded in 2008 by Yael Karov and Avner Zangvil. Ginger Software uses the context of complete sentences to suggest corrections. In December 2011, Ginger Software was one of nine projects approved by the Board of Governors of the Israel-U.S. Binational Industrial Research and Development Foundation for funding of $8.1 million. The company also raised $3 million from private Israeli and US investors in 2009. In May 2014, Intel acquired one of Ginger's business units and the rights to use the company's patented technology. == Founders == Before founding Ginger Software, Yael Karov had worked with Rosetta Genomics as its Chief Technology Officer and Vice President of Research and Development from 2003 to 2006, and with ClickSoftware Technologies as a Director of Research and Development from 1990 to 1994. Karov also founded Agentics, a company specializing in free-text classification of e-commerce product information based on natural language processing, in 1996. Avner Zangvil is the co-founder of Ginger Software. Zangvil co-founded Menta Software in 1996 with his brother Arnon Zangvil to develop a product that transforms any Windows-based application into a Web-enabled application usable from any remote computer running a Web browser. Menta was acquired by GraphOn Corporation in 2001. == Technology == Ginger Software uses patented software algorithms in the field of natural language processing. The company claims that the algorithm allows it to correct written sentences with relatively high accuracy (eliminating up to 95 percent of writing errors), compared to standard spell checkers. Its algorithm allows the software to take the context of the sentence into account rather than correcting based solely on a single word. 
According to its founder, Karov, the software operates on the logic of sentence context in addition to the memory of a database of words. The company is at the heart of a growing revolution in the world of assistive technology. Ginger claims that the software benefits native and non-native English speakers alike, and that it has also found value in niche markets such as dyslexia management. They further claim that ESL users derive great benefit from the use of the software, as it lets them write error-free English text. Its use also extends to native English-speaking business professionals and students who use it as a 'safety net' for their email edits, as well as international students writing in English. More recently, the company has focused on implementing its technology in mobile devices as an integral component of its mobile keyboard products. == Products == Ginger Software products include Ginger Page, a cross-platform writing enhancement app, and Ginger Keyboard, which is available for Android devices. Ginger Writer can be used as an online service or installed on a PC or Mac. It supports MS-Word, MS-Outlook, MS-PowerPoint, Microsoft Edge, and Chrome, and functions as a writing enhancement app for Android and iOS mobile devices. Its main feature is an English grammar and spelling checker that runs seamlessly with the different user interfaces. It also has an advanced paraphrasing tool, contextual synonyms and definitions, translation and a text-to-speech function that enables users to hear sentences before and after correction. Ginger Keyboard for Android replaces the stock keyboard and functions as a productivity-boosting keyboard app. Featuring a full set of advanced keyboard features like Stream (swipe-like) typing, adaptive word prediction, a wide variety of customizable themes and emoji, Ginger Keyboard is the only 3rd party keyboard to offer proofreading and other writing tools via one-tap access to Ginger Page. 
== Target segment == Ginger Software started off targeting people with dyslexia. The algorithm underlying the software studies a vast pool of proper sentences in English and builds a model of proper language. The software does not analyze the text at the level of the word, but of the whole sentence. Dyslexics can have trouble choosing the right word, hence the attention to the sentence as a whole. From 2010, Ginger Software included a new target segment in its marketing outreach: users of English as a second language (ESL). Its context-based writing correction tool could benefit those who are not proficient in the English language. == Business model == The main business model for consumers is freemium. The free version offers a context-based grammar and spelling checker with some limitations. Its premium features include unlimited access to Grammar Checker, the grammar and spelling checker, and Sentence Rephraser, the rephrasing tool. Ginger Keyboard is free to download and use, although it does offer in-app purchases like themes and theme packs. It also disables the device's original spell checker. Ginger also provides a REST API which can correct full documents in one call.

    Read more →
  • Free boundary condition

    Free boundary condition

    In image processing, the free boundary condition is the convention used when applying a convolution kernel to a digital image in which pixel locations that lie outside the image boundaries are interpreted as having a value of zero.[1] The question of what value to assign out-of-bounds pixels may arise, for instance, when applying a 3×3 kernel to the corner pixel in an image.
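
The convention described above can be sketched in a few lines of NumPy: pad the image with zeros, then slide the kernel over it. This is a minimal illustration, assuming an odd-sized kernel and applying it without flipping (which gives the same result as a convolution for symmetric kernels such as the box filter used here); the function name `filter_zero_boundary` is hypothetical.

```python
import numpy as np

def filter_zero_boundary(image, kernel):
    """Apply a kernel to an image, treating out-of-bounds pixels as zero.

    Assumes an odd-sized kernel, applied without flipping.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Zero padding makes every out-of-bounds pixel read as 0.
    padded = np.pad(image, ((ph, ph), (pw, pw)),
                    mode="constant", constant_values=0)
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

image = np.ones((3, 3))
box = np.ones((3, 3))   # 3x3 box kernel: sums each pixel's neighbourhood
result = filter_zero_boundary(image, box)
print(result)
```

On this all-ones image the corner pixels sum only four in-bounds neighbours, edge pixels six, and the centre nine, which shows how the zero-valued border lowers responses near the image boundary.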

    Read more →
  • Machine translation in China

    Machine translation in China

    Machine translation in China covers the history of machine translation systems developed in China. China was the fourth country to begin machine translation (MT) research, following the USA, the UK, and the Soviet Union. In 1957, the Language Institute of the Chinese Academy of Sciences took the initiative in a Russian-Chinese MT research program and set up an MT research group. From then on, research activities were directed and applied for academic purposes in universities. The turning point came in the 1990s, when MT systems began to be launched commercially and flourished in the market. Transtar was the first commercialized MT system and has been constantly upgraded. The IMC/EC MT system, developed by the Computer Institute of the Chinese Academy of Sciences, has also made great advances. Meanwhile, the practical MT system MT-IT-EC, specific to the communication domain, was also notable for greatly improving efficiency and productivity in the issuing of publications. Government funding is a critical component in the development of market-oriented machine translation in China. Since China opened up to the outside world and joined the WTO, vigorous import and export trade has generated opportunities for machine translation to render technical product terms into readable target-language information. Facing increasing demand for sophisticated state-of-the-art translation technology, academia, including research institutes and universities, is launching bachelor's and master's programs in machine translation. This points to a promising future for machine translation in the Chinese market.

    Read more →
  • Is an AI Paragraph Rewriter Worth It in 2026?

    Is an AI Paragraph Rewriter Worth It in 2026?

    In search of the best AI paragraph rewriter? An AI paragraph rewriter is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI paragraph rewriter slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →