AI Detector Zero

AI Detector Zero — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis

    The 2028 Global Intelligence Crisis is a report authored by James van Geelen and Alap Shah and published by Citrini Research in February 2026, on the impact of artificial intelligence on humanity's future. Written in the form of a scenario analysis, it was viewed millions of times online and reportedly caused a fall in the stock market prices of major tech and financial firms. It also received criticism among others, for its allegedly flawed economic logic. The 'thought exercise', as the authors called it, painted a gloomy picture for the near future, where outputs keep growing while consumer's ability to spend collapses. "...driven by ai agents that don’t sleep, take sick days or require health insurance”, "outputs that are shown in national accounts increases, "but never circulates through the real economy"(which the report calls 'Ghost GDP'), the authors argued. In other words, the authors predict a scenario where the owners of the AI firms will accumulate a vast fortune but there will be scant demand from consumers as AI would cause massive unemployment. The authors caution the reader that what they make is a scenario and not a prediction. In the scenario they visualise, any service whose value proposition is “I will navigate complexity that you find tedious” is getting disrupted. The reports argues that the unique ability of human beings to analyse, decide, create, persuade, and coordinate was “the thing that could not be replicated at scale,” and call the historical scarcity of this precious entity 'friction'. When this friction becomes zero, a gamut of changes occur which then triggers a cascading of changes across the economy. ”Travel booking platforms are an early casualty; Financial advice. tax prep., and routine legal work follow suit. National unemployment rate go as high 10.2% and the S&P 500 goes for a massive 38% peak-to-trough crash. In contrast to the previous technological revolutions the high-earning professionals suffers more and get forced to take up roles in the gig economy. Labour supply becomes abundant and this cuts wages all across the economy. The dent in income for the employees then affects other sectors of the economy such as the residential mortgage market. The losses for the software companies triggers loan defaults and heralds peril for the private credit sector.

    Read more →
  • OCR-A

    OCR-A

    OCR-A is a font issued in 1966 and first implemented in 1968. A special font was needed in the early days of computer optical character recognition, when there was a need for a font that could be recognized not only by the computers of that day, but also by humans. OCR-A uses simple, thick strokes to form recognizable characters. The font is monospaced (fixed-width), with the printer required to place glyphs 0.254 cm (0.10 inch) apart, and the reader required to accept any spacing between 0.2286 cm (0.09 inch) and 0.4572 cm (0.18 inch). == Standardization == The OCR-A font was standardized by the American National Standards Institute (ANSI) as ANSI X3.17-1981. X3.4 has since become the INCITS and the OCR-A standard is now called ISO 1073-1:1976. == Implementations == In 1968, American Type Founders produced OCR-A, one of the first optical character recognition typefaces to meet the criteria set by the U.S. Bureau of Standards. The design is simple so that it can be easily read by a machine, but it is more difficult for the human eye to read. As metal type gave way to computer-based typesetting, Tor Lillqvist used Metafont to describe the OCR-A font. That definition was subsequently improved by Richard B. Wales. Their work is available from CTAN. To make the free version of the font more accessible to users of Microsoft Windows, John Sauter converted the Metafont definitions to TrueType using potrace and FontForge in 2004. In 2007, Gürkan Sengün created a Debian package from this implementation. In 2008. Luc Devroye corrected the vertical positioning in John Sauter's implementation, and fixed the name of lower case z. Independently, Matthew Skala used mftrace to convert the Metafont definitions to TrueType format in 2006. In 2011 he released a new version created by rewriting the Metafont definitions to work with METATYPE1, generating outlines directly without an intermediate tracing step. On September 27, 2012, he updated his implementation to version 0.2. In addition to these free implementations of OCR-A, there are also implementations sold by several vendors. As a joke, Tobias Frere-Jones in 1995 created Estupido-Espezial, a redesign with swashes and a long s. It was used in a "technology"-themed section of Rolling Stone. Maxitype designed the OCR-X typeface—based on the OCR-A typeface with OpenType features, alien/technology-themed dingbats and available in six weights (Thin, Light, Regular, Medium, Bold, Black). Japanese typeface foundry Visual Design Laboratory (VDL) designed two typefaces based on the OCR-A typeface: one for Simplified Chinese characters named Jieyouti and one for Japanese characters named Yota G (ヨタG) , both available in five weights (Light, Regular, Medium, Semi Bold, Bold). == Use == Although optical character recognition technology has advanced to the point where such simple fonts are no longer necessary, the OCR-A font has remained in use. Its usage remains widespread in the encoding of checks around the world. Some lock box companies still insist that the account number and amount owed on a bill return form be printed in OCR-A. Also, because of its unusual look, it is sometimes used in advertising and display graphics. Notably, it is used for the subtitles in films and television series such as Blacklist and for the main titles in The Pretender. Additionally, OCR-A is used in the titles and subtitles for the films 13 Hours: The Secret Soldiers of Benghazi and Hoppers (film). It was also used for the logo, branding, and marketing material of the children's toy line Hexbug. == Code points == A font is a set of character shapes, or glyphs. For a computer to use a font, each glyph must be assigned a code point in a character set. When OCR-A was being standardized the usual character coding was the American Standard Code for Information Interchange or ASCII. Not all of the glyphs of OCR-A fit into ASCII, and for five of the characters there were alternate glyphs, which might have suggested the need for a second font. However, for convenience and efficiency all of the glyphs were expected to be accessible in a single font using ASCII coding, with the additional characters placed at coding points that would otherwise have been unused. The modern descendant of ASCII is Unicode, also known as ISO 10646. Unicode contains ASCII and has special provisions for OCR characters, so some implementations of OCR-A have looked to Unicode for guidance on character code assignments. === Pre-Unicode standard representation === The ISO standard ISO 2033:1983, and the corresponding Japanese Industrial Standard JIS X 9010:1984 (originally JIS C 6229–1984), define character encodings for OCR-A, OCR-B and E-13B. For OCR-A, they define a modified 7-bit ASCII set (also known by its ISO-IR number ISO-IR-91) including only uppercase letters, digits, a subset of the punctuation and symbols, and some additional symbols. Codes which are redefined relative to ASCII, as opposed to simply omitted, are listed below: Additionally, the long vertical mark () is encoded at 0x7C, corresponding to the ASCII vertical bar (|). === Dedicated OCR-A characters in Unicode === The following characters have been defined for control purposes and are now in the "Optical Character Recognition" Unicode range 2440–245F: === Space, digits, and unaccented letters === All implementations of OCR-A use U+0020 for space, U+0030 through U+0039 for the decimal digits, U+0041 through U+005A for the unaccented upper case letters, and U+0061 through U+007A for the unaccented lower case letters. === Regular characters === In addition to the digits and unaccented letters, many of the characters of OCR-A have obvious code points in ASCII. Of those that do not, most, including all of OCR-A's accented letters, have obvious code points in Unicode. === Remaining characters === Linotype coded the remaining characters of OCR-A as follows: === Additional characters === The fonts that descend from the work of Tor Lillqvist and Richard B. Wales define four characters not in OCR-A to fill out the ASCII character set. These shapes use the same style as the OCR-A character shapes. They are: Linotype also defines additional characters. === Exceptions === Some implementations do not use the above code point assignments for some characters. ==== PrecisionID ==== The PrecisionID implementation of OCR-A has the following non-standard code points: OCR Hook at U+007E OCR Chair at U+00C1 OCR Fork at U+00C2 Euro Sign at U+0080 ==== Barcodesoft ==== The Barcodesoft implementation of OCR-A has the following non-standard code points: OCR Hook at U+0060 OCR Chair at U+007E OCR Fork at U+005F Long Vertical Mark at U+007C (agrees with Linotype) Character Erase at U+0008 ==== Morovia ==== The Morovia implementation of OCR-A has the following non-standard code points: OCR Hook at U+007E (agrees with PrecisionID) OCR Chair at U+00F0 OCR Fork at U+005F (agrees with Barcodesoft) Long Vertical Mark at U+007C (agrees with Linotype) ==== IDAutomation ==== The IDAutomation implementation of OCR-A has the following non-standard code points: OCR Hook at U+007E (agrees with PrecisionID) OCR Chair at U+00C1 (agrees with PrecisionID) OCR Fork at U+00C2 (agrees with PrecisionID) OCR Belt Buckle at U+00C3 == Sellers of font standards == Hardcopy of ISO 1073-1:1976, distributed through ANSI, from Amazon.com ISO 1073-1 is also available from Techstreet, who distributes standards for ANSI and ISO

    Read more →
  • Stochastic grammar

    Stochastic grammar

    A stochastic grammar (statistical grammar) is a grammar framework with a probabilistic notion of grammaticality: Stochastic context-free grammar Statistical parsing Data-oriented parsing Hidden Markov model (or stochastic regular grammar) Estimation theory The grammar is realized as a language model. Allowed sentences are stored in a database together with the frequency how common a sentence is. Statistical natural language processing uses stochastic, probabilistic and statistical methods, especially to resolve difficulties that arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. "A probabilistic model consists of a non-probabilistic model plus some numerical quantities; it is not true that probabilistic models are inherently simpler or less structural than non-probabilistic models." == Examples == A probabilistic method for rhyme detection is implemented by Hirjee & Brown in their study in 2013 to find internal and imperfect rhyme pairs in rap lyrics. The concept is adapted from a sequence alignment technique using BLOSUM (BLOcks SUbstitution Matrix). They were able to detect rhymes undetectable by non-probabilistic models.

    Read more →
  • Is an AI Voice Assistant Worth It in 2026?

    Is an AI Voice Assistant Worth It in 2026?

    Trying to pick the best AI voice assistant? An AI voice assistant is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI voice assistant slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Richardson–Lucy deconvolution

    Richardson–Lucy deconvolution

    The Richardson–Lucy algorithm, also known as Lucy–Richardson deconvolution, is an iterative procedure for recovering an underlying image that has been blurred by a known point spread function. It was named after William Richardson and Leon B. Lucy, who described it independently. == Description == When an image is produced using an optical system and detected using photographic film, a charge-coupled device or a CMOS sensor, for example, it is inevitably blurred, with an ideal point source not appearing as a point but being spread out into what is known as the point spread function. Extended sources can be decomposed into the sum of many individual point sources, thus the observed image can be represented in terms of a transition matrix p operating on an underlying image: d i = ∑ j p i , j u j , {\displaystyle d_{i}=\sum _{j}p_{i,j}u_{j},} where u j {\displaystyle u_{j}} is the intensity of the underlying image at pixel j {\displaystyle j} , and d i {\displaystyle d_{i}} is the detected intensity at pixel i {\displaystyle i} . In general, a matrix whose elements are p i , j {\displaystyle p_{i,j}} describes the portion of light from source pixel j that is detected in pixel i. In most good optical systems (or in general, linear systems that are described as shift-invariant) the transfer function p can be expressed simply in terms of the spatial offset between the source pixel j and the observation pixel i: p i , j = P ( i − j ) , {\displaystyle p_{i,j}=P(i-j),} where P ( Δ i ) {\displaystyle P(\Delta i)} is called a point spread function. In that case the above equation becomes a convolution. This has been written for one spatial dimension, but most imaging systems are two-dimensional, with the source, detected image, and point spread function all having two indices. So a two-dimensional detected image is a convolution of the underlying image with a two-dimensional point spread function P ( Δ x , Δ y ) {\displaystyle P(\Delta x,\Delta y)} plus added detection noise. In order to estimate u j {\displaystyle u_{j}} given the observed d i {\displaystyle d_{i}} and a known P ( Δ i x , Δ j y ) {\displaystyle P(\Delta i_{x},\Delta j_{y})} , the following iterative procedure is employed in which the estimate of u j {\displaystyle u_{j}} (called u ^ j ( t ) {\displaystyle {\hat {u}}_{j}^{(t)}} ) for iteration number t is updated as follows: u ^ j ( t + 1 ) = u ^ j ( t ) ∑ i d i c i p i j , {\displaystyle {\hat {u}}_{j}^{(t+1)}={\hat {u}}_{j}^{(t)}\sum _{i}{\frac {d_{i}}{c_{i}}}p_{ij},} where c i = ∑ j p i j u ^ j ( t ) , {\displaystyle c_{i}=\sum _{j}p_{ij}{\hat {u}}_{j}^{(t)},} and ∑ j p i j = 1 {\displaystyle \sum _{j}p_{ij}=1} is assumed. It has been shown empirically that if this iteration converges, it converges to the maximum likelihood solution for u j {\displaystyle u_{j}} . Writing this more generally for two (or more) dimensions in terms of convolution with a point spread function P: u ^ ( t + 1 ) = u ^ ( t ) ⋅ ( d u ^ ( t ) ⊗ P ⊗ P ∗ ) , {\displaystyle {\hat {u}}^{(t+1)}={\hat {u}}^{(t)}\cdot \left({\frac {d}{{\hat {u}}^{(t)}\otimes P}}\otimes P^{}\right),} where the division and multiplication are element-wise, ⊗ {\displaystyle \otimes } indicates a 2D convolution, and P ∗ {\displaystyle P^{}} is the mirrored point spread function, or the inverse Fourier transform of the Hermitian transpose of the optical transfer function. In problems where the point spread function p i j {\displaystyle p_{ij}} is not known a priori, a modification of the Richardson–Lucy algorithm has been proposed, in order to accomplish blind deconvolution. == Derivation == In the context of fluorescence microscopy, the probability of measuring a set of number of photons (or digitalization counts proportional to detected light) m = [ m 0 , … , m K ] {\displaystyle \mathbf {m} =[m_{0},\dots ,m_{K}]} for expected values E = [ E 0 , … , E K ] {\displaystyle \mathbf {E} =[E_{0},\dots ,E_{K}]} for a detector with K + 1 {\displaystyle K+1} pixels is given by P ( m ∣ E ) = ∏ i K Poisson ⁡ ( E i ) = ∏ i K E i m i e − E i m i ! . {\displaystyle P(\mathbf {m} \mid \mathbf {E} )=\prod _{i}^{K}\operatorname {Poisson} (E_{i})=\prod _{i}^{K}{\frac {E_{i}^{m_{i}}e^{-E_{i}}}{m_{i}!}}.} Since in the context of maximum-likelihood estimation the aim is to locate the maximum of the likelihood function without concern for its absolute value, it is convenient to work with ln ⁡ ( P ) {\displaystyle \ln(P)} : ln ⁡ P ( m ∣ E ) = ∑ i K [ ( m i ln ⁡ E i − E i ) − ln ⁡ ( m i ! ) ] . {\displaystyle \ln P(\mathbf {m} \mid \mathbf {E} )=\sum _{i}^{K}[(m_{i}\ln E_{i}-E_{i})-\ln(m_{i}!)].} Moreover, since ln ⁡ ( m i ! ) {\displaystyle \ln(m_{i}!)} is a constant, it does not give any additional information regarding the position of the maximum, so consider α ( m ∣ E ) = ∑ i K [ m i ln ⁡ E i − E i ] , {\displaystyle \alpha (\mathbf {m} \mid \mathbf {E} )=\sum _{i}^{K}[m_{i}\ln E_{i}-E_{i}],} where α {\displaystyle \alpha } is something that shares the same maximum position as P ( m ∣ E ) {\displaystyle P(\mathbf {m} \mid \mathbf {E} )} . Now consider that E {\displaystyle \mathbf {E} } comes from a ground truth x {\displaystyle \mathbf {x} } and a measurement H {\displaystyle \mathbf {H} } which is assumed to be linear. Then E = H x , {\displaystyle \mathbf {E} =\mathbf {H} \mathbf {x} ,} where a matrix multiplication is implied. This can also be written in the form E m = ∑ n K H m n x n , {\displaystyle E_{m}=\sum _{n}^{K}H_{mn}x_{n},} where it can be seen how H {\displaystyle H} mixes or blurs the ground truth. It can also be shown that the derivative of an element of E {\displaystyle \mathbf {E} } , ( E i ) {\displaystyle (E_{i})} with respect to some other element of x j {\displaystyle x_{j}} can be written as It is easy to see this by writing a matrix H {\displaystyle \mathbf {H} } of, say, 5 × 5 and two arrays E {\displaystyle \mathbf {E} } and x {\displaystyle \mathbf {x} } of 5 elements and check it. This last equation can be interpreted as how much one element of x {\displaystyle \mathbf {x} } , say element i {\displaystyle i} , influences the other elements j ≠ i {\displaystyle j\neq i} (and of course the case i = j {\displaystyle i=j} is also taken into account). For example, in a typical case an element of the ground truth x {\displaystyle \mathbf {x} } will influence nearby elements in E {\displaystyle \mathbf {E} } but not the very distant ones (a value of 0 {\displaystyle 0} is expected on those matrix elements). Now, the key and arbitrary step: x {\displaystyle \mathbf {x} } is not known but may be estimated by x ^ {\displaystyle {\hat {\mathbf {x} }}} . Let's call x ^ old {\displaystyle {\hat {\mathbf {x} }}_{\text{old}}} and x ^ new {\displaystyle {\hat {\mathbf {x} }}_{\text{new}}} the estimated ground truths while using the RL algorithm, where the hat symbol is used to distinguish ground truth from estimator of the ground truth where ∂ ∂ x {\displaystyle {\frac {\partial }{\partial \mathbf {x} }}} stands for a K {\displaystyle K} -dimensional gradient. Performing the partial derivative of α ( m ∣ E ( x ) ) {\displaystyle \alpha (\mathbf {m} \mid \mathbf {E} (\mathbf {x} ))} yields the following expression: ∂ α ( m ∣ E ( x ) ) ∂ x j = ∂ ∂ x j ∑ i K [ m i ln ⁡ E i − E i ] = ∑ i K [ m i E i ∂ ∂ x j E i − ∂ ∂ x j E i ] = ∑ i K ∂ E i ∂ x j [ m i E i − 1 ] . {\displaystyle {\frac {\partial \alpha (\mathbf {m} \mid \mathbf {E} (\mathbf {x} ))}{\partial x_{j}}}={\frac {\partial }{\partial x_{j}}}\sum _{i}^{K}[m_{i}\ln E_{i}-E_{i}]=\sum _{i}^{K}\left[{\frac {m_{i}}{E_{i}}}{\frac {\partial }{\partial x_{j}}}E_{i}-{\frac {\partial }{\partial x_{j}}}E_{i}\right]=\sum _{i}^{K}{\frac {\partial E_{i}}{\partial x_{j}}}\left[{\frac {m_{i}}{E_{i}}}-1\right].} By substituting (1), it follows that ∂ α ( m ∣ E ( x ) ) ∂ x j = ∑ i K H i j [ m i E i − 1 ] . {\displaystyle {\frac {\partial \alpha (\mathbf {m} \mid \mathbf {E} (\mathbf {x} ))}{\partial x_{j}}}=\sum _{i}^{K}H_{ij}\left[{\frac {m_{i}}{E_{i}}}-1\right].} Note that H j i T = H i j {\displaystyle H_{ji}^{T}=H_{ij}} by the definition of a matrix transpose. And hence Since this equation is true for all j {\displaystyle j} spanning all the elements from 1 {\displaystyle 1} to K {\displaystyle K} , these K {\displaystyle K} equations may be compactly rewritten as a single vectorial equation ∂ α ( m ∣ E ( x ) ) ∂ x = H T [ m E − 1 ] , {\displaystyle {\frac {\partial \alpha (\mathbf {m} \mid \mathbf {E} (\mathbf {x} ))}{\partial \mathbf {x} }}=\mathbf {H} ^{T}\left[{\frac {\mathbf {m} }{\mathbf {E} }}-\mathbf {1} \right],} where H T {\displaystyle \mathbf {H} ^{T}} is a matrix, and m {\displaystyle \mathbf {m} } , E {\displaystyle \mathbf {E} } and 1 {\displaystyle \mathbf {1} } are vectors. Now, as a seemingly arbitrary but key step, let where 1 {\displaystyle \mathbf {1} } is a vector of ones of size K {\displaystyle K} (same as m {\displaystyle \mathbf {m} } , E {\displaystyle \mathbf {E} } and x {\displaystyle \mathbf {x} } ), and the d

    Read more →
  • David Horn (Israeli physicist)

    David Horn (Israeli physicist)

    David Horn (Hebrew: דוד הורן; born 10 September 1937) is a Professor (Emeritus) of Physics in the School of Physics and Astronomy at Tel Aviv University (TAU), Israel. He has served as Vice-Rector of TAU, Chairman of the School of Physics and Astronomy and as Dean of the Faculty of Exact Sciences in TAU. He is a fellow of the American Physical Society, nominated for "contributions to theoretical particle physics, including the seminal work on finite energy sum rules, research of the phenomenology of hadronic processes, and investigation of Hamiltonian lattice theories". == Early life and education == David Horn was born and educated in Haifa. He graduated from the Reali School in 1955. He began his academic studies in Physics at the Technion in Haifa in 1957, and received his B.Sc. (Summa Cum Laude) in 1961, and M.Sc. in 1962. He continued his Ph.D. studies at the Hebrew University of Jerusalem until 1965. His thesis on "Some Aspects of the Structure of Weak Interactions" was supervised by Prof. Yuval Ne'eman. == Career == Horn joined the newly founded Tel Aviv University as an assistant in 1962. He became a lecturer in 1965, a senior lecturer in 1967 and an associate professor in 1968. He was promoted to full professor of Physics in 1972. In 1974 he became the incumbent of the Edouard and Francoise Jaupart Chair of Theoretical Physics of Particles and Fields, a position he held until 2007. Horn has supervised 43 graduate students at TAU and authored over 240 scientific publications. He retired as a professor emeritus in 2005, and continues to be an active researcher. Horn spent a significant part of his career holding visiting academic positions at other universities and research institutes, including: Postdoctoral Fellow at Argonne National Lab, ILL, Research Fellow and three times Visiting Associate at California Institute of Technology, Pasadena, CA, Visitor at CERN in Geneva, Visiting Professor at Cornell University, NY, Member of the Institute for Advanced Study, Princeton, NJ, Visiting Professor at SLAC in Stanford University, CA, and Visiting Professor at Kyoto University, Japan. Beginning from 1980, Horn held official positions at Tel Aviv University, starting with tenure as Vice-Rector (1980-1983), a position he left for research at SLAC. After returning he was nominated Chairman of the Department of High Energy Physics (1984-1986), followed by tenures as Chairman of the School of Physics and Astronomy (1986-9), Dean of the Raymond and Beverly Sackler Faculty of Exact Sciences (1990-1995), and first Director of the Adams Super Center for Brain Studies (1993-2000). Horn has also held national and international professional positions. He was Chairman of the Israel Commission for High Energy Physics (1983-2003), and, in this capacity, served as an Israeli observer of the council of CERN (1991-2003). He served as member of the Israel Council for Higher Education (1987-1991), member of the Executive Committee of the European Physical Society (1989-1992) and member of the European Strategy Forum on Research Infrastructures (2005-2017). He chaired the Israeli Committee of Research Infrastructures (2012-2016), issuing roadmaps for scientific RI in 2013 and 2016. == Research == Horn's research work focused on theory and phenomenology of High Energy Physics until 1990. He then shifted his interests to Neural Computation and Machine Learning and, since 2005, he has also published in Bioinformatics. Together with Richard Dolen and Christoph Schmid he discovered the Finite Energy Sum Rules in 1967. It was a realization of the bootstrap approach to hadronic structure, and became known as the Dolen-Horn-Schmid Duality. Together with Richard Silver he investigated a model of coherent production of pions at high energy hadron collisions in 1971, and together with Jeffrey Mandula he undertook the investigation of mesons with constituent gluons in 1978. Moving to lattice gauge theories in 1979, he discovered, together with Shimon Yankielowic and Marvin Weinstein, a non-confining phase in Z(N) theories for large N. In 1981 he demonstrated the existence of finite matrix models with link gauge fields, nowadays known as quantum link models. In 1984 Horn and Weinstein developed the t-expansion methodology. Horn's contributions to neural modeling include a novel mechanism for memory maintenance via neuronal regulation in 1998, developed with Nir Levy and Eytan Ruppin and unsupervised learning of natural languages in 2005, a joint work with Zach Solan, Eytan Ruppin and Shimon Edelman, introducing novel algorithms for motif and grammar extraction from text. Horn has contributed to algorithms of clustering, an important topic in Machine Learning, by developing Support Vector Clustering (SVC) in 2001, together with Asa Ben Hur, Hava Siegelmann and Vladimir Vapnik. This was followed shortly thereafter by a joint work with Assaf Gottlieb on Quantum Clustering (QC). His contributions to Bioinformatics include motif descriptions of function and structure of proteins, as well as motif studies of genomic structures. Together with Erez Persi he studied compositional order of proteomes, and repeat instability of genomes, as evolution markers of organisms and of cancer (a joint work with Persi and others). == Honors == Horn is a Fellow of the American Physical Society (1985) and a Fellow of the Israel Physical Society (2018). == Publications == === Selected articles === R. Dolen, D. Horn and C. Schmid; Prediction of Regge-parameters of rho poles from low-energy pi-N scattering data Phys. Rev. Lett. 19 (1967) 402–407. Finite-Energy Sum Rules and Their Application to pi-N Charge Exchange Phys. Rev. 166 (1968) 1768–1781. D. Horn and R. Silver: Coherent production of pions, Annals Phys. 66 (1971) 509-541 T. Banks, D. Horn and H. Neuberger: Bosonization of the SU(N) Thirring Models, Nucl. Phys. B108, 119 (1976). D. Horn and J. Mandula: Model of Mesons with Constituent Gluons, Phys. Rev. D17, 898 (1978). D. Horn, M. Weinstein and S. Yankielowicz: Hamiltonian Approach to Z(N) Lattice Gauge Theories, Phys. Rev. D19, 3715 (1979). D. Horn: Finite Matrix Models with Continuous Local Gauge Invariance, Phys. Lett. 100B, 149-151 (1981). T. Banks, Y. Dothan and D. Horn: Geometric Fermions, Phys. Lett. 117B, 413 (1982). D. Horn and M. Weinstein: The t expansion: A nonperturbative analytic tool for Hamiltonian systems. Phys. Rev. D 30, 1256-1270 (1984). Ury Naftaly, Nathan Intrator and David Horn: Optimal Ensemble Averaging of Neural Networks. Network, Computation in Neural Systems, 8, 283-296 (1997). David Horn, Nir Levy, Eytan Ruppin: Memory Maintenance via Neuronal Regulation, Neural Computation, 10, 1-18 (1998). Asa Ben-Hur, David Horn, Hava Siegelmann and Vladimir Vapnik: Support Vector Clustering. Journal of Machine Learning Research 2, 125-137 (2001). David Horn and Assaf Gottlieb: Algorithm for data clustering in pattern recognition problems based on quantum mechanics, Phys. Rev. Lett. 88 (2002) 18702 Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman: Unsupervised learning of natural languages, Proc. Natl. Acad. Sc. 102 (2005) 11629–11634. Vered Kunik, Yasmine Meroz, Zach Solan, Ben Sandbank, Uri Weingart, Eytan Ruppin and David Horn: Functional representation of enzymes by specific peptides. PLOS Computational Biology 2007, 3(8):e167. Benny Chor, David Horn, Yaron Levy, Nick Goldman and Tim Massingham: Genomic DNA k-mer spectra: models and modalities. Genome Biology 2009, 10(10):R108 Erez Persi and David Horn. Systematic Analysis of Compositional Order of Proteins Reveals New Characteristics of Biological Functions and a Universal Correlate of Macroevolution. PLoS Comput Biol 9 (2013): e1003346. David Horn. Taxa counting using Specific Peptides of Aminoacyl tRNA Synthetases Encyclopedia of Metagenomics, Springer, 2013. Sagi Shporer, Benny Chor, Saharon Rosset, David Horn. Inversion symmetry of DNA k-mer counts: validity and deviations. BMC Genomics 2016, 17:696 Erez Persi, Davide Prandi, Yuri I. Wolf, Yair Pozniak, Christopher Barbieri, Paola Gasperini, Himisha Beltran, Bishoy M. Faltas, Mark A. Rubin, Tamar Geiger, Eugene V. Koonin, Francesca Demichelis, David Horn. Proteomic and Genomic Signatures of Repeat Instability in Cancer and Adjacent Normal Tissues. PNAS 116, 34, 2019 - 08790 === Book === David Horn and Fredrick Zachariasen: Hadron Physics at Very High Energies. Benjamin 1973. === Patents === Method and Apparatus for Quantum Clustering. USA Patent No. 7,653,646 B2. Method for discovering relationships in data by dynamic quantum clustering USA Patent No 8874412 and USA Patent No. 9,646,074. == Personal life == Horn was married to Nira Fuss since 1963 until her death in 2019. He is a father of three, Yuval, Tamar, and Oded, and grandfather of nine. He lives in Tel Aviv, Israel.

    Read more →
  • Hapax legomenon

    Hapax legomenon

    In corpus linguistics, a hapax legomenon ( also or ; pl. hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is also sometimes used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "said once". The related terms dis legomenon, tris legomenon, and tetrakis legomenon respectively (, , ) refer to double, triple, or quadruple occurrences, but are far less commonly used. Hapax legomena are quite common, as predicted by Zipf's law, which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words are hapax legomena, and another 10% to 15% are dis legomena. Thus, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus. Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to either its origin or its prevalence in speech. It thus differs from a nonce word, which may never be recorded, may find currency and may be widely recorded, or may appear several times in the work which coins it, and so on. == Significance == Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew; see § Hebrew) hapax legomena sometimes pose problems in translation. Hapax legomena also pose challenges in natural language processing. Some scholars consider Hapax legomena useful in determining the authorship of written works. P. N. Harrison, in The Problem of the Pastoral Epistles (1921) made hapax legomena popular among Bible scholars, when he argued that there are considerably more of them in the three Pastoral Epistles than in other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual. Harrison's theory has faded in significance due to a number of problems raised by other scholars. For example, in 1896, W. P. Workman found the following numbers of hapax legomena in each Pauline Epistle: At first glance, the last three totals (for the Pastoral Epistles) are not out of line with the others. To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13, as summarized in the diagram on the right. Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison to the variation among other Epistles. This was reinforced when Workman looked at several plays by Shakespeare, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition), as summarized in the second diagram on the right. Apart from author identity, there are several other factors that can explain the number of hapax legomena in a work: text length: this directly affects the expected number and percentage of hapax legomena; the brevity of the Pastoral Epistles also makes any statistical analysis problematic. text topic: if the author writes on different subjects, of course many subject-specific words will occur only in limited contexts. text audience: if the author is writing to a peer rather than a student, or their spouse rather than their employer, again quite different vocabulary will appear. time: over the course of years, both the language and an author's knowledge and use of language will change. In the particular case of the Pastoral Epistles, all of these variables are quite different from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as strong indicators of authorship; those who reject Pauline authorship of the Pastorals rely on other arguments. There are also subjective questions over whether two forms amount to "the same word": dog vs. dogs, clue vs. clueless, sign vs. signature; many other gray cases also arise. The Jewish Encyclopedia points out that, although there are 1,500 hapaxes in the Hebrew Bible, only about 400 are not obviously related to other attested word forms. A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors often show similar values. In other words, hapax legomena are not a reliable indicator. Authorship studies now usually use a wide range of measures to look for patterns rather than relying upon single measurements. == Computer science == In the fields of computational linguistics and natural language processing (NLP), esp. corpus linguistics and machine-learned NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. This disregard has the added benefit of significantly reducing the memory use of an application, since, by Zipf's law, many words are hapax legomena. == Examples == The following are some examples of hapax legomena in languages or corpora. === Arabic === In the Qurʾān: The proper nouns Iram (Q 89:7, Iram of the Pillars), Bābil (Q 2:102, Babylon), Bakka(t) (Q 3:96, Bakkah), Jibt (Q 4:51), Ramaḍān (Q 2:185, Ramadan), ar-Rūm (Q 30:2, Byzantine Empire), Tasnīm (Q 83:27), Qurayš (Q 106:1, Quraysh), Majūs (Q 22:17, Magian/Zoroastrian), Mārūt (Q 2:102, Harut and Marut), Makka(t) (Q 48:24, Mecca), Nasr (Q 71:23), (Ḏū) an-Nūn (Q 21:87) and Hārūt (Q 2:102, Harut and Marut) occur only once. zanjabīl (زَنْجَبِيل – ginger) is a Qurʾānic hapax (Q 76:17). zamharīr (زَمْهَرِيرًۭ) is a Qurʾānic hapax (Q 76:13), usually glossed as referring to extreme cold. The epitheton ornans aṣ-ṣamad (الصَّمَد – the One besought) is a Qurʾānic hapax (Q 112:2). ṭūd (طُودْ - mountain) is a Qurʾānic hapax (Q 26:63). === Chinese and Japanese === Classical Chinese and Japanese literature contains many Chinese characters that feature only once in the corpus, and their meaning and pronunciation has often been lost. Known in Japanese as kogo (孤語), literally "lonely characters", these can be considered a type of hapax legomenon. For example, the Classic of Poetry (c. 1000 BC) uses the character 篪 exactly once in the verse 「伯氏吹塤, 仲氏吹篪」, and it was only through the discovery of a description by Guo Pu (276–324 AD) that the character could be associated with a specific type of ancient flute. === English === It is fairly common for authors to "coin" new words to convey a particular meaning or for the sake of entertainment, without any suggestion that they are "proper" words. For example, P.G. Wodehouse and Lewis Carroll frequently coined novel words. Indexy, below, appears to be an example of this. Flother, as a synonym for snowflake, is a hapax legomenon of written English found in a manuscript entitled The XI Pains of Hell (c. 1275). Honorificabilitudinitatibus is a hapax legomenon of Shakespeare's works, coming from Erasmus' Adagia Indexy, in Bram Stoker's Dracula, used as an adjective to describe a situational state with no other further use in the language: "If that man had been an ordinary lunatic I would have taken my chance of trusting him; but he seems so mixed up with the Count in an indexy kind of way that I am afraid of doing anything wrong by helping his fads." Manticratic, meaning "of the rule by the Prophet's family or clan", was apparently invented by T. E. Lawrence and appears once in Seven Pillars of Wisdom. Nortelrye, a word for "education", occurs only once in Chaucer's The Reeve's Tale. Sassigassity, perhaps with the meaning of "audacity", occurs only once in Dickens's short story "A Christmas Tree". Slæpwerigne, "sleep-weary", occurs exactly once in the Old English corpus, in the Exeter Book. There is debate over whether it means "weary with sleep" or "weary for sleep". === German === The name of the 9th-century poem Muspilli is a back-formation from "muspille", Old High German hapax legomenon of unclear meaning only found in this text (see Muspilli § Etymology for discussion). === Ancient Greek === According to classical scholar Clyde Pharr, "the Iliad has 1,097 hapax legomena, while the Odyssey has 868". Others have defined the term differently, however, and count as few as 303 in the Iliad and 191 in the Odyssey. panaōrios (παναώριος), ancient Greek for "very untimely", is one of many words that occur only once in the Iliad. The Greek New Testament contains 686 local hapax legomena, which are sometimes called "New Testament hapaxes". 62 of these occur in 1 Peter and 54 occur in 2 Peter

    Read more →
  • Barney Pell

    Barney Pell

    Barney Pell (born March 18, 1968) is an American entrepreneur, angel investor and computer scientist. He was co-founder and CEO of Powerset, a pioneering natural language search startup, search strategist and architect for Microsoft's Bing search engine, a pioneer in the field of general game playing in artificial intelligence, and the architect of the first intelligent agent to fly onboard and control a spacecraft. He was co-founder, Vice Chairman and Chief Strategy Officer of Moon Express; co-founder and chairman of LocoMobi; and Associate Founder of Singularity University. == Career == === Education === Pell received his Bachelor of Science degree in symbolic systems from Stanford University in 1989, where he graduated Phi Beta Kappa and was a National Merit Scholar. Pell earned a PhD in computer science from Cambridge University in 1993, supervised by Stephen Pulman, where he was a Marshall Scholar. === Research === Pell's research is focused on basic problems in the study of intelligence, computer game playing, machine learning, natural language processing, autonomous robotics, and web search. Barney Pell has published over 30 technical papers on topics related to information retrieval, knowledge management, machine learning, artificial intelligence, and scheduling systems. In computer game playing and machine learning, he was a pioneer in the field of General Game Playing, and created programs to generate the rules of chess-like games and programs to play individual games directly from the rules without human assistance. He also did early work on machine learning in the game of Go and on an architecture for pragmatic reasoning for bidding in the game of Bridge. In natural language processing, he was a scientist in the Artificial Intelligence Center at SRI International, where he worked on the Core Language Engine. Barney Pell was the Technical Area Manager of the Collaborative and Assistant Systems area within the Computational Sciences Division (now the Intelligent Systems Division) at NASA Ames Research Center, where he oversaw a staff of 80 scientists working on information retrieval, search, knowledge management, machine learning, semantic technology, human centered systems, collaboration technology, adaptive user interfaces, human robot interaction, and other areas of artificial intelligence. From 1993 to 1998, Barney Pell worked as a Principal Investigator and Senior Computer Scientist at NASA Ames, where he conducted advanced research and development of autonomous control software for NASA's deep space missions. He was the Architect for the Deep Space One Remote Agent Experiment and the Project Lead for the Executive component of the Remote Agent Experiment, the first intelligent agent to fly onboard and control a spacecraft. === Business === Pell is an entrepreneur who has founded or co-founded several business ventures, including Powerset, Moon Express, and LocoMobi. He was the founder and CEO of Powerset, a San Francisco startup company that built a search engine based on natural language processing technology originally developed at XEROX PARC. On May 11, 2008, the company unveiled a tool for searching a fixed subset of Wikipedia using conversational phrases rather than keywords. On July 1, 2008, Microsoft signed an agreement to acquire Powerset for an estimated $100 million. Powerset became a part of Microsoft's search engine, Bing. From 2008 until August 2011, Pell served as Partner, Search Strategist, and Evangelist for Microsoft's search engine, Bing and as Head of Bing's Local and Mobile Search teams. Prior to joining Powerset, Pell was an Entrepreneur-in-Residence at Mayfield Fund, a venture capital firm in Silicon Valley. Pell is also a founder of Moon Express, Inc., a U.S. company awarded a $10M commercial lunar contract by NASA and a competitor in the Google Lunar X PRIZE. Pell was also co-founder and chairman of LocoMobi, Inc., a U.S. company developing mobile, software and hardware technology solutions for the parking industry. LocoMobi was winner of the Tie50 Award in 2014. Pell is also an associate founder of Singularity University and a Machine Learning Fellow at the Creative Destruction Lab at the Rotman School of Management From 1998 to 2000, Pell served as chief strategist and vice president of business development at StockMaster.com (acquired by Red Herring in March, 2000). From 2000 to 2002, Pell was Chief Strategist and Vice President of Business Development for Whizbang Labs. Pell has been an angel investor and advisor to numerous startup companies, including Pulse.io (acquired by Google), Aardvark (acquired by Google), Appjet (acquired by Google), Jibe Mobile (acquired by Google), Movity (acquired by Trulia), QuestBridge, BrandYourself, CrowdFlower (acquired by Appen), and LinkedIn. === Views and predictions === Pell has expressed views and predictions regarding technological advancements in coming years. He believes that humans will soon have "brain-machine interfaces that will let people interact with each other as if they had 'hangouts' in their mind." Pell predicts these interfaces to become available within 20 to 30 years. Pell also predicts advancements in bodily augmentation, such as "even-better-than-human prosthetics and high-quality tissue engineering within 10 years." Pell believes that with advancements in space exploration technology the moon will soon be a commercially viable resource for material such as platinum and water. == Awards and recognition == In 1986, Pell was awarded a National Merit Scholarship. In 1989, Pell was awarded a Marshall Scholarship. In 1989, Pell was elected Phi Beta Kappa. In 1997, Pell was part of the team award a NASA Software of the Year Award for the Deep Space 1 Remote Agent.

    Read more →
  • Automaton

    Automaton

    An automaton ( ; pl.: automata or automatons) is a relatively self-operating machine or control mechanism designed to automatically follow a sequence of operations or respond to predetermined instructions. Some automata, such as bellstrikers in mechanical clocks, are designed to give the illusion to the casual observer that they are operating under their own power or will, like a mechanical robot. The term has long been commonly associated with automated puppets that resemble moving humans or animals, built to impress and/or to entertain people. Animatronics are a modern type of automata with electronics, often used for the portrayal of characters or creatures in films and in theme park attractions. == Etymology == The word automaton is the latinization of the Ancient Greek automaton (αὐτόματον), which means "acting of one's own will". It was first used by Homer to describe an automatic door opening, or automatic movement of wheeled tripods. It is more often used to describe non-electronic moving machines, especially those that have been made to resemble human or animal actions, such as the jacks on old public striking clocks, or the cuckoo and any other animated figures on a cuckoo clock. == History == === Ancient === There are many examples of automata in Greek mythology: Hephaestus created automata for his workshop; Talos was an artificial man of bronze; King Alkinous of the Phaiakians employed gold and silver watchdogs. According to Aristotle, Daedalus used quicksilver to make his wooden statue of Aphrodite move. In other Greek legends he used quicksilver to install voice in his moving statues. The automata in the Hellenistic world were intended as tools, toys, religious spectacles, or prototypes for demonstrating basic scientific principles. Numerous water-powered automata were built by Ktesibios, a Greek inventor and the first head of the Great Library of Alexandria; for example, he "used water to sound a whistle and make a model owl move. He had invented the world's first 'cuckoo clock'". This tradition continued in Alexandria with inventors such as the Greek mathematician Hero of Alexandria (sometimes known as Heron), whose writings on hydraulics, pneumatics, and mechanics described siphons, a fire engine, a water organ, the aeolipile, and a programmable cart. Philo of Byzantium was famous for his inventions. Complex mechanical devices are known to have existed in Hellenistic Greece, though the only surviving example is the Antikythera mechanism, the earliest known analog computer. The clockwork is thought to have come originally from Rhodes, where there was apparently a tradition of mechanical engineering; the island was renowned for its automata; to quote Pindar's seventh Olympic Ode: The animated figures stand Adorning every public street And seem to breathe in stone, or move their marble feet. However, the information gleaned from recent scans of the fragments indicate that it may have come from the colonies of Corinth in Sicily and implies a connection with Archimedes. According to Jewish legend, King Solomon used his wisdom to design a throne with mechanical animals which hailed him as king when he ascended it; upon sitting down an eagle would place a crown upon his head, and a dove would bring him a Torah scroll. It is also said that when King Solomon stepped upon the throne, a mechanism was set in motion. As soon as he stepped upon the first step, a golden ox and a golden lion each stretched out one foot to support him and help him rise to the next step. On each side, the animals helped the King up until he was comfortably seated upon the throne. In ancient China, a curious account of automata is found in the Lie Zi text, believed to have originated around 400 BCE and compiled around the fourth century CE. Within it there is a description of a much earlier encounter between King Mu of Zhou (1023–957 BCE) and a mechanical engineer known as Yan Shi, an 'artificer'. The latter proudly presented the king with a very realistic and detailed life-size, human-shaped figure of his mechanical handiwork: The king stared at the figure in astonishment. It walked with rapid strides, moving its head up and down, so that anyone would have taken it for a live human being. The artificer touched its chin, and it began singing, perfectly in tune. He touched its hand, and it began posturing, keeping perfect time...As the performance was drawing to an end, the robot winked its eye and made advances to the ladies in attendance, whereupon the king became incensed and would have had Yen Shih [Yan Shi] executed on the spot had not the latter, in mortal fear, instantly taken the robot to pieces to let him see what it really was. And, indeed, it turned out to be only a construction of leather, wood, glue and lacquer, variously coloured white, black, red and blue. Examining it closely, the king found all the internal organs complete—liver, gall, heart, lungs, spleen, kidneys, stomach and intestines; and over these again, muscles, bones and limbs with their joints, skin, teeth and hair, all of them artificial...The king tried the effect of taking away the heart, and found that the mouth could no longer speak; he took away the liver and the eyes could no longer see; he took away the kidneys and the legs lost their power of locomotion. The king was delighted. Other notable examples of automata include Archytas' dove, mentioned by Aulus Gellius. Similar Chinese accounts of flying automata are written of the 5th century BC Mohist philosopher Mozi and his contemporary Lu Ban, who made artificial wooden birds (ma yuan) that could successfully fly according to the Han Fei Zi and other texts. === Medieval === The manufacturing tradition of automata continued in the Greek world well into the Middle Ages. On his visit to Constantinople in 949 ambassador Liutprand of Cremona described automata in the emperor Theophilos' palace, including "lions, made either of bronze or wood covered with gold, which struck the ground with their tails and roared with open mouth and quivering tongue," "a tree of gilded bronze, its branches filled with birds, likewise made of bronze gilded over, and these emitted cries appropriate to their species" and "the emperor's throne" itself, which "was made in such a cunning manner that at one moment it was down on the ground, while at another it rose higher and was to be seen up in the air." Similar automata in the throne room (singing birds, roaring and moving lions) were described by Luitprand's contemporary the Byzantine emperor Constantine Porphyrogenitus, in his book De Ceremoniis (Perì tês Basileíou Tákseōs). In the mid-8th century, the first wind powered automata were built: "statues that turned with the wind over the domes of the four gates and the palace complex of the Round City of Baghdad". The "public spectacle of wind-powered statues had its private counterpart in the 'Abbasid palaces where automata of various types were predominantly displayed." Also in the 8th century, the Muslim alchemist, Jābir ibn Hayyān (Geber), included recipes for constructing artificial snakes, scorpions, and humans that would be subject to their creator's control in his coded Book of Stones. In 827, Abbasid caliph al-Ma'mun had a silver and golden tree in his palace in Baghdad, which had the features of an automatic machine. There were metal birds that sang automatically on the swinging branches of this tree built by Muslim inventors and engineers. The Abbasid caliph al-Muqtadir also had a silver and golden tree in his palace in Baghdad in 917, with birds on it flapping their wings and singing. In the 9th century, the Banū Mūsā brothers invented a programmable automatic flute player and which they described in their Book of Ingenious Devices. Al-Jazari described complex programmable humanoid automata amongst other machines he designed and constructed in the Book of Knowledge of Ingenious Mechanical Devices in 1206. His automaton was a boat with four automatic musicians that floated on a lake to entertain guests at royal drinking parties. His mechanism had a programmable drum machine with pegs (cams) that bump into little levers that operate the percussion. The drummer could be made to play different rhythms and drum patterns if the pegs were moved around. Al-Jazari constructed a hand washing automaton first employing the flush mechanism now used in modern toilets. It features a female automaton standing by a basin filled with water. When the user pulls the lever, the water drains and the automaton refills the basin. His "peacock fountain" was another more sophisticated hand washing device featuring humanoid automata as servants who offer soap and towels. Mark E. Rosheim describes it as follows: "Pulling a plug on the peacock's tail releases water out of the beak; as the dirty water from the basin fills the hollow base a float rises and actuates a linkage which makes a servant figure appear from behind a door under the peacock and offer soap.

    Read more →
  • Best AI Background Removers in 2026

    Best AI Background Removers in 2026

    Comparing the best AI background remover? An AI background remover is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI background remover slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Statistical machine translation

    Statistical machine translation

    Statistical machine translation (SMT) is a machine translation approach where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation, that superseded the previous rule-based approach that required explicit description of each and every linguistic rule, which was costly, and which often did not generalize to other languages. The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the ideas of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center. Before the introduction of neural machine translation, it was by far the most widely studied machine translation method. == Basis == The idea behind statistical machine translation comes from information theory. A document is translated according to the probability distribution p ( e | f ) {\displaystyle p(e|f)} that a string e {\displaystyle e} in the target language (for example, English) is the translation of a string f {\displaystyle f} in the source language (for example, French). The problem of modeling the probability distribution p ( e | f ) {\displaystyle p(e|f)} has been approached in a number of ways. One approach which lends itself well to computer implementation is to apply Bayes' theorem, that is p ( e | f ) ∝ p ( f | e ) p ( e ) {\displaystyle p(e|f)\propto p(f|e)p(e)} , where the translation model p ( f | e ) {\displaystyle p(f|e)} is the probability that the source string is the translation of the target string, and the language model p ( e ) {\displaystyle p(e)} is the probability of seeing that target language string. This decomposition is attractive as it splits the problem into two subproblems. Finding the best translation e ~ {\displaystyle {\tilde {e}}} is done by picking up the one that gives the highest probability: e ~ = a r g max e ∈ e ∗ p ( e | f ) = a r g max e ∈ e ∗ p ( f | e ) p ( e ) {\displaystyle {\tilde {e}}=arg\max _{e\in e^{}}p(e|f)=arg\max _{e\in e^{}}p(f|e)p(e)} . For a rigorous implementation of this one would have to perform an exhaustive search by going through all strings e ∗ {\displaystyle e^{}} in the native language. Performing the search efficiently is the work of a machine translation decoder that uses the foreign string, heuristics and other methods to limit the search space and at the same time keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition. As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence. Language models are typically approximated by smoothed n-gram models, and similar approaches have been applied to translation models, but this introduces additional complexity due to different sentence lengths and word orders in the languages. Statistical translation models were initially word based (Models 1-5 from IBM Hidden Markov model from Stephan Vogel and Model 6 from Franz-Joseph Och), but significant advances were made with the introduction of phrase based models. Later work incorporated syntax or quasi-syntactic structures. == Benefits == The most frequently cited benefits of statistical machine translation (SMT) over rule-based approach are: More efficient use of human and data resources There are many parallel corpora in machine-readable format and even more monolingual data. Generally, SMT systems are not tailored to any specific pair of languages. More fluent translations owing to use of a language model == Shortcomings == Corpus creation can be costly. Specific errors are hard to predict and fix. Results may have superficial fluency that masks translation problems. Statistical machine translation usually works less well for language pairs with significantly different word order. The benefits obtained for translation between Western European languages are not representative of results for other language pairs, owing to smaller training corpora and greater grammatical differences. == Word-based translation == In word-based translation, the fundamental unit of translation is a word in some natural language. Typically, the number of words in translated sentences are different, because of compound words, morphology and idioms. The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces. Necessarily it is assumed by information theory that each covers the same concept. In practice this is not really true. For example, the English word corner can be translated in Spanish by either rincón or esquina, depending on whether it is to mean its internal or external angle. Simple word-based translation cannot translate between languages with different fertility. Word-based translation systems can relatively simply be made to cope with high fertility, such that they could map a single word to multiple words, but not the other way about. For example, if we were translating from English to French, each word in English could produce any number of French words— sometimes none at all. But there is no way to group two English words producing a single French word. An example of a word-based translation system is the freely available GIZA++ package (GPLed), which includes the training program for IBM models and HMM model and Model 6. The word-based translation is not widely used today; phrase-based systems are more common. Most phrase-based systems are still using GIZA++ to align the corpus. The alignments are used to extract phrases or deduce syntax rules. And matching words in bi-text is still a problem actively discussed in the community. Because of the predominance of GIZA++, there are now several distributed implementations of it online. == Phrase-based translation == In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths may differ. The sequences of words are called blocks or phrases. These are typically not linguistic phrases, but phrasemes that were found using statistical methods from corpora. It has been shown that restricting the phrases to linguistic phrases (syntactically motivated groups of words, see syntactic categories) decreased the quality of translation. The chosen phrases are further mapped one-to-one based on a phrase translation table, and may be reordered. This table could be learnt based on word-alignment, or directly from a parallel corpus. The second model is trained using the expectation maximization algorithm, similarly to the word-based IBM model. == Syntax-based translation == Syntax-based translation is based on the idea of translating syntactic units, rather than single words or strings of words (as in phrase-based MT), i.e. (partial) parse trees of sentences/utterances. Until the 1990s, with advent of strong stochastic parsers, the statistical counterpart of the old idea of syntax-based translation did not take off. Examples of this approach include DOP-based MT and later synchronous context-free grammars. == Hierarchical phrase-based translation == Hierarchical phrase-based translation combines the phrase-based and syntax-based approaches to translation. It uses synchronous context-free grammar rules, but the grammars can be constructed by an extension of methods for phrase-based translation without reference to linguistically motivated syntactic constituents. This idea was first introduced in Chiang's Hiero system (2005). == Language models == A language model is an essential component of any statistical machine translation system, which aids in making the translation as fluent as possible. It is a function that takes a translated sentence and returns the probability of it being said by a native speaker. A good language model will for example assign a higher probability to the sentence "the house is small" than to "small the is house". Other than word order, language models may also help with word choice: if a foreign word has multiple possible translations, these functions may give better probabilities for certain translations in specific contexts in the target language. == Systems implementing statistical machine translation == Google Translate (started transition to neural machine translation in 2016) Microsoft Translator (started transition to neural machine translation in 2016) Yandex.Translate (switched to hybrid approach incorporating neural machine translation in 2017) == Challenges with statistical machine translation == Problems with statistical machine translation include: === Sentence alignment === Single sentences in one language can be found translated into several sentences in the o

    Read more →
  • Machine translation of sign languages

    Machine translation of sign languages

    The machine translation of sign languages has been possible, albeit in a limited fashion, since 1977. When a research project successfully matched English letters from a keyboard to ASL manual alphabet letters which were simulated on a robotic hand. These technologies translate signed languages into written or spoken language, and written or spoken language to sign language, without the use of a human interpreter. Sign languages possess different phonological features than spoken languages, which has created obstacles for developers. Developers use computer vision and machine learning to recognize specific phonological parameters and epentheses unique to sign languages, and speech recognition and natural language processing allow interactive communication between hearing and deaf people. == Limitations == Sign language translation technologies are limited in the same way as spoken language translation. None can translate with 100% accuracy. In fact, sign language translation technologies are far behind their spoken language counterparts. This is, in no trivial way, due to the fact that signed languages have multiple articulators. Where spoken languages are articulated through the vocal tract, signed languages are articulated through the hands, arms, head, shoulders, torso, and parts of the face. This multi-channel articulation makes translating sign languages very difficult. An additional challenge for sign language MT is the fact that there is no formal written format for signed languages. There are notations systems but no writing system has been adopted widely enough, by the international Deaf community, that it could be considered the 'written form' of a given sign language. Sign Languages then are recorded in various video formats. There is no gold standard parallel corpus that is large enough for SMT, for example. == History == The history of automatic sign language translation started with the development of hardware such as finger-spelling robotic hands. In 1977, a finger-spelling hand project called RALPH (short for "Robotic Alphabet") created a robotic hand that can translate alphabets into finger-spellings. Later, the use of gloves with motion sensors became the mainstream, and some projects such as the CyberGlove and VPL Data Glove were born. The wearable hardware made it possible to capture the signers' hand shapes and movements with the help of the computer software. However, with the development of computer vision, wearable devices were replaced by cameras due to their efficiency and fewer physical restrictions on signers. To process the data collected through the devices, researchers implemented neural networks such as the Stuttgart Neural Network Simulator for pattern recognition in projects such as the CyberGlove. Researchers also use many other approaches for sign recognition. For example, Hidden Markov Models are used to analyze data statistically, and GRASP and other machine learning programs use training sets to improve the accuracy of sign recognition. Fusion of non-wearable technologies such as cameras and Leap Motion controllers have shown to increase the ability of automatic sign language recognition and translation software. == Technologies == === VISICAST === http://www.visicast.cmp.uea.ac.uk/Visicast_index.html === eSIGN project === http://www.visicast.cmp.uea.ac.uk/eSIGN/index.html === The American Sign Language Avatar Project at DePaul University === http://asl.cs.depaul.edu/ === Spanish to LSE === López-Ludeña, Verónica; San-Segundo, Rubén; González, Carlos; López, Juan Carlos; Pardo, José M. (2012), Methodology for developing a Speech into Sign Language Translation System in a New Semantic Domain (PDF), CiteSeerX 10.1.1.1065.5265, S2CID 2724186 === SignAloud === SignAloud is a technology that incorporates a pair of gloves made by a group of students at University of Washington that transliterate American Sign Language (ASL) into English. In February 2015 Thomas Pryor, a hearing student from the University of Washington, created the first prototype for this device at Hack Arizona, a hackathon at the University of Arizona. Pryor continued to develop the invention and in October 2015, Pryor brought Navid Azodi onto the SignAloud project for marketing and help with public relations. Azodi has a rich background and involvement in business administration, while Pryor has a wealth of experience in engineering. In May 2016, the duo told NPR that they are working more closely with people who use ASL so that they can better understand their audience and tailor their product to the needs of these people rather than the assumed needs. However, no further versions have been released since then. The invention was one of seven to win the Lemelson-MIT Student Prize, which seeks to award and applaud young inventors. Their invention fell under the "Use it!" category of the award which includes technological advances to existing products. They were awarded $10,000. The gloves have sensors that track the users hand movements and then send the data to a computer system via Bluetooth. The computer system analyzes the data and matches it to English words, which are then spoken aloud by a digital voice. The gloves do not have capability for written English input to glove movement output or the ability to hear language and then sign it to a deaf person, which means they do not provide reciprocal communication. The device also does not incorporate facial expressions and other nonmanual markers of sign languages, which may alter the actual interpretation from ASL. === ProDeaf === ProDeaf (WebLibras) is a computer software that can translate both text and voice into Portuguese Libras (Portuguese Sign Language) "with the goal of improving communication between the deaf and hearing." There is currently a beta edition in production for American Sign Language as well. The original team began the project in 2010 with a combination of experts including linguists, designers, programmers, and translators, both hearing and deaf. The team originated at Federal University of Pernambuco (UFPE) from a group of students involved in a computer science project. The group had a deaf team member who had difficulty communicating with the rest of the group. In order to complete the project and help the teammate communicate, the group created Proativa Soluções and have been moving forward ever since. The current beta version in American Sign Language is very limited. For example, there is a dictionary section and the only word under the letter 'j' is 'jump'. If the device has not been programmed with the word, then the digital avatar must fingerspell the word. The last update of the app was in June 2016, but ProDeaf has been featured in over 400 stories across the country's most popular media outlets. The application cannot read sign language and turn it into word or text, so it only serves as a one-way communication. Additionally, the user cannot sign to the app and receive an English translation in any form, as English is still in the beta edition. === Kinect Sign Language Translator === Since 2012, researchers from the Chinese Academy of Sciences and specialists of deaf education from Beijing Union University in China have been collaborating with Microsoft Research Asian team to create Kinect Sign Language Translator. The translator consists of two modes: translator mode and communication mode. The translator mode is capable of translating single words from sign into written words and vice versa. The communication mode can translate full sentences and the conversation can be automatically translated with the use of the 3D avatar. The translator mode can also detect the postures and hand shapes of a signer as well as the movement trajectory using the technologies of machine learning, pattern recognition, and computer vision. The device also allows for reciprocal communication because the speech recognition technology allows the spoken language to be translated into the sign language and the 3D modeling avatar can sign back to the deaf people. The original project was started in China based on translating Chinese Sign Language. In 2013, the project was presented at Microsoft Research Faculty Summit and Microsoft company meeting. Currently, this project is also being worked by researchers in the United States to implement American Sign Language translation. As of now, the device is still a prototype, and the accuracy of translation in the communication mode is still not perfect. === SignAll === SignAll is an automatic sign language translation system provided by Dolphio Technologies in Hungary. The team is "pioneering the first automated sign language translation solution, based on computer vision and natural language processing (NLP), to enable everyday communication between individuals with hearing who use spoken English and deaf or hard of hearing individuals who use ASL." The system of SignAll uses Kinect from Microsoft and other web camera

    Read more →
  • Stop Motion Studio

    Stop Motion Studio

    Stop Motion Studio is a stop motion animation software developed by Cateater LLC. It is available as both an app for iOS and Android and as a software for Windows and Mac. Two versions of the software exist, the standard Stop Motion Studio for free, and the paid Stop Motion Studio Pro, which contains extra, more advanced features. The software is commonly used in brickfilming.

    Read more →
  • The Best Free AI Coding Assistant for Beginners

    The Best Free AI Coding Assistant for Beginners

    Trying to pick the best AI coding assistant? An AI coding assistant is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI coding assistant slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • How to Choose an AI Photo Editor

    How to Choose an AI Photo Editor

    In search of the best AI photo editor? An AI photo editor is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI photo editor slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →