AI Business Quiz

AI Business Quiz — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • SUPS

    SUPS

    In computational neuroscience, SUPS (for Synaptic Updates Per Second) or formerly CUPS (Connections Updates Per Second) is a measure of a neuronal network performance, useful in fields of neuroscience, cognitive science, artificial intelligence, and computer science. == Computing == For a processor or computer designed to simulate a neural network SUPS is measured as the product of simulated neurons N {\displaystyle N} and average connectivity c {\displaystyle c} (synapses) per neuron per second: S U P S = c × N {\displaystyle SUPS=c\times N} Depending on the type of simulation it is usually equal to the total number of synapses simulated. In an "asynchronous" dynamic simulation if a neuron spikes at υ {\displaystyle \upsilon } Hz, the average rate of synaptic updates provoked by the activity of that neuron is υ c N {\displaystyle \upsilon cN} . In a synchronous simulation with step Δ t {\displaystyle \Delta t} the number of synaptic updates per second would be c N Δ t {\displaystyle {\frac {cN}{\Delta t}}} . As Δ t {\displaystyle \Delta t} has to be chosen much smaller than the average interval between two successive afferent spikes, which implies Δ t < 1 υ N {\displaystyle \Delta t<{\frac {1}{\upsilon N}}} , giving an average of synaptic updates equal to υ c N 2 {\displaystyle \upsilon cN^{2}} . Therefore, spike-driven synaptic dynamics leads to a linear scaling of computational complexity O(N) per neuron, compared with the O(N2) in the "synchronous" case. == Records == Developed in the 1980s Adaptive Solutions' CNAPS-1064 Digital Parallel Processor chip is a full neural network (NNW). It was designed as a coprocessor to a host and has 64 sub-processors arranged in a 1D array and operating in a SIMD mode. Each sub-processor can emulate one or more neurons and multiple chips can be grouped together. At 25 MHz it is capable of 1.28 GMAC. After the presentation of the RN-100 (12 MHz) single neuron chip at Seattle 1991 Ricoh developed the multi-neuron chip RN-200. It had 16 neurons and 16 synapses per neuron. The chip has on-chip learning ability using a proprietary backdrop algorithm. It came in a 257-pin PGA encapsulation and drew 3.0 W at a maximum. It was capable of 3 GCPS (1 GCPS at 32 MHz). In 1991–97, Siemens developed the MA-16 chip, SYNAPSE-1 and SYNAPSE-3 Neurocomputer. The MA-16 was a fast matrix-matrix multiplier that can be combined to form systolic arrays. It could process 4 patterns of 16 elements each (16-bit), with 16 neuron values (16-bit) at a rate of 800 MMAC or 400 MCPS at 50 MHz. The SYNAPSE3-PC PCI card contained 2 MA-16 with a peak performance of 2560 MOPS (1.28 GMAC); 7160 MOPS (3.58 GMAC) when using three boards. In 2013, the K computer was used to simulate a neural network of 1.73 billion neurons with a total of 10.4 trillion synapses (1% of the human brain). The simulation ran for 40 minutes to simulate 1 s of brain activity at a normal activity level (4.4 on average). The simulation required 1 Petabyte of storage.

    Read more →
  • How to Choose an AI Copywriting Tool

    How to Choose an AI Copywriting Tool

    Trying to pick the best AI copywriting tool? An AI copywriting tool is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI copywriting tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Deep Learning Studio

    Deep Learning Studio

    Deep Learning Studio is a software tool that aims to simplify the creation of deep learning models used in artificial intelligence. It is compatible with a number of open-source programming frameworks popularly used in artificial neural networks, including MXNet and Google's TensorFlow. Prior to the release of Deep Learning Studio in January 2017, proficiency in Python, among other programming languages, was essential in developing effective deep learning models. Deep Learning Studio sought to simplify the model creation process through a visual, drag-and-drop interface and the application of pre-trained learning models on available data. Irving, Texas–based Deep Cognition Inc. is the developer behind Deep Learning Studio. In 2017, the software allowed Deep Cognition to become a finalist for Best Innovation in Deep Learning in the Alconics Awards, which are given annually to the best artificial intelligence software. Deep Cognition launched version 2.0 of Deep Learning Studio at NVIDIA's GTC 2018 Conference in San Jose, California. Fremont, California–based computing products supplier Exxact Corp provides desktop computers specifically built to handle Deep Learning Studio workloads. == Features == Source: Deep Learning Studio is available in two versions: Desktop and Cloud, both of which are free software. The Desktop version is available on Windows and Ubuntu. The Cloud version is available in single-user and multi-user configurations. A Deep Cognition account is needed to access the Cloud version. Account registration is free. Deep Learning Studio can import existing Keras models; it also takes a data set as an input. Deep Learning Studio's AutoML feature allows automatic generation of deep learning models. More advanced users may choose to generate their own models using various types of layers and neural networks. Deep Learning Studio also has a library of loss functions and optimizers for use in hyperparameter tuning, a traditionally complicated area in neural network programming. Generated models can be trained using either CPUs or GPUs. Trained models can then be used for predictive analytics.

    Read more →
  • Statistical machine translation

    Statistical machine translation

    Statistical machine translation (SMT) is a machine translation approach where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation, that superseded the previous rule-based approach that required explicit description of each and every linguistic rule, which was costly, and which often did not generalize to other languages. The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the ideas of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center. Before the introduction of neural machine translation, it was by far the most widely studied machine translation method. == Basis == The idea behind statistical machine translation comes from information theory. A document is translated according to the probability distribution p ( e | f ) {\displaystyle p(e|f)} that a string e {\displaystyle e} in the target language (for example, English) is the translation of a string f {\displaystyle f} in the source language (for example, French). The problem of modeling the probability distribution p ( e | f ) {\displaystyle p(e|f)} has been approached in a number of ways. One approach which lends itself well to computer implementation is to apply Bayes' theorem, that is p ( e | f ) ∝ p ( f | e ) p ( e ) {\displaystyle p(e|f)\propto p(f|e)p(e)} , where the translation model p ( f | e ) {\displaystyle p(f|e)} is the probability that the source string is the translation of the target string, and the language model p ( e ) {\displaystyle p(e)} is the probability of seeing that target language string. This decomposition is attractive as it splits the problem into two subproblems. Finding the best translation e ~ {\displaystyle {\tilde {e}}} is done by picking up the one that gives the highest probability: e ~ = a r g max e ∈ e ∗ p ( e | f ) = a r g max e ∈ e ∗ p ( f | e ) p ( e ) {\displaystyle {\tilde {e}}=arg\max _{e\in e^{}}p(e|f)=arg\max _{e\in e^{}}p(f|e)p(e)} . For a rigorous implementation of this one would have to perform an exhaustive search by going through all strings e ∗ {\displaystyle e^{}} in the native language. Performing the search efficiently is the work of a machine translation decoder that uses the foreign string, heuristics and other methods to limit the search space and at the same time keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition. As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence. Language models are typically approximated by smoothed n-gram models, and similar approaches have been applied to translation models, but this introduces additional complexity due to different sentence lengths and word orders in the languages. Statistical translation models were initially word based (Models 1-5 from IBM Hidden Markov model from Stephan Vogel and Model 6 from Franz-Joseph Och), but significant advances were made with the introduction of phrase based models. Later work incorporated syntax or quasi-syntactic structures. == Benefits == The most frequently cited benefits of statistical machine translation (SMT) over rule-based approach are: More efficient use of human and data resources There are many parallel corpora in machine-readable format and even more monolingual data. Generally, SMT systems are not tailored to any specific pair of languages. More fluent translations owing to use of a language model == Shortcomings == Corpus creation can be costly. Specific errors are hard to predict and fix. Results may have superficial fluency that masks translation problems. Statistical machine translation usually works less well for language pairs with significantly different word order. The benefits obtained for translation between Western European languages are not representative of results for other language pairs, owing to smaller training corpora and greater grammatical differences. == Word-based translation == In word-based translation, the fundamental unit of translation is a word in some natural language. Typically, the number of words in translated sentences are different, because of compound words, morphology and idioms. The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces. Necessarily it is assumed by information theory that each covers the same concept. In practice this is not really true. For example, the English word corner can be translated in Spanish by either rincón or esquina, depending on whether it is to mean its internal or external angle. Simple word-based translation cannot translate between languages with different fertility. Word-based translation systems can relatively simply be made to cope with high fertility, such that they could map a single word to multiple words, but not the other way about. For example, if we were translating from English to French, each word in English could produce any number of French words— sometimes none at all. But there is no way to group two English words producing a single French word. An example of a word-based translation system is the freely available GIZA++ package (GPLed), which includes the training program for IBM models and HMM model and Model 6. The word-based translation is not widely used today; phrase-based systems are more common. Most phrase-based systems are still using GIZA++ to align the corpus. The alignments are used to extract phrases or deduce syntax rules. And matching words in bi-text is still a problem actively discussed in the community. Because of the predominance of GIZA++, there are now several distributed implementations of it online. == Phrase-based translation == In phrase-based translation, the aim is to reduce the restrictions of word-based translation by translating whole sequences of words, where the lengths may differ. The sequences of words are called blocks or phrases. These are typically not linguistic phrases, but phrasemes that were found using statistical methods from corpora. It has been shown that restricting the phrases to linguistic phrases (syntactically motivated groups of words, see syntactic categories) decreased the quality of translation. The chosen phrases are further mapped one-to-one based on a phrase translation table, and may be reordered. This table could be learnt based on word-alignment, or directly from a parallel corpus. The second model is trained using the expectation maximization algorithm, similarly to the word-based IBM model. == Syntax-based translation == Syntax-based translation is based on the idea of translating syntactic units, rather than single words or strings of words (as in phrase-based MT), i.e. (partial) parse trees of sentences/utterances. Until the 1990s, with advent of strong stochastic parsers, the statistical counterpart of the old idea of syntax-based translation did not take off. Examples of this approach include DOP-based MT and later synchronous context-free grammars. == Hierarchical phrase-based translation == Hierarchical phrase-based translation combines the phrase-based and syntax-based approaches to translation. It uses synchronous context-free grammar rules, but the grammars can be constructed by an extension of methods for phrase-based translation without reference to linguistically motivated syntactic constituents. This idea was first introduced in Chiang's Hiero system (2005). == Language models == A language model is an essential component of any statistical machine translation system, which aids in making the translation as fluent as possible. It is a function that takes a translated sentence and returns the probability of it being said by a native speaker. A good language model will for example assign a higher probability to the sentence "the house is small" than to "small the is house". Other than word order, language models may also help with word choice: if a foreign word has multiple possible translations, these functions may give better probabilities for certain translations in specific contexts in the target language. == Systems implementing statistical machine translation == Google Translate (started transition to neural machine translation in 2016) Microsoft Translator (started transition to neural machine translation in 2016) Yandex.Translate (switched to hybrid approach incorporating neural machine translation in 2017) == Challenges with statistical machine translation == Problems with statistical machine translation include: === Sentence alignment === Single sentences in one language can be found translated into several sentences in the o

    Read more →
  • Level-set method

    Level-set method

    The Level-set method (LSM) is a conceptual framework for using level sets as a tool for numerical analysis of surfaces and shapes. LSM can perform numerical computations involving curves and surfaces on a fixed Cartesian grid without having to parameterize these objects. LSM makes it easier to perform computations on shapes with sharp corners and shapes that change topology (such as by splitting in two or developing holes). These characteristics make LSM effective for modeling objects that vary in time, such as an airbag inflating or a drop of oil floating in water. == Overview == The figure on the right illustrates several ideas about LSM. In the upper left corner is a bounded region with a well-behaved boundary. Below it, the red surface is the graph of a level set function φ {\displaystyle \varphi } determining this shape, and the flat blue region represents the X-Y plane. The boundary of the shape is then the zero-level set of φ {\displaystyle \varphi } , while the shape itself is the set of points in the plane for which φ {\displaystyle \varphi } is positive (interior of the shape) or zero (at the boundary). In the top row, the shape's topology changes as it is split in two. It is challenging to describe this transformation numerically by parameterizing the boundary of the shape and following its evolution. An algorithm can be used to detect the moment the shape splits in two and then construct parameterizations for the two newly obtained curves. On the bottom row, however, the plane at which the level set function is sampled is translated upwards, on which the shape's change in topology is described. It is less challenging to work with a shape through its level-set function rather than with itself directly, in which a method would need to consider all the possible deformations the shape might undergo. Thus, in two dimensions, the level-set method amounts to representing a closed curve Γ {\displaystyle \Gamma } (such as the shape boundary in our example) using an auxiliary function φ {\displaystyle \varphi } , called the level-set function. The curve Γ {\displaystyle \Gamma } is represented as the zero-level set of φ {\displaystyle \varphi } by Γ = { ( x , y ) ∣ φ ( x , y ) = 0 } , {\displaystyle \Gamma =\{(x,y)\mid \varphi (x,y)=0\},} and the level-set method manipulates Γ {\displaystyle \Gamma } implicitly through the function φ {\displaystyle \varphi } . This function φ {\displaystyle \varphi } is assumed to take positive values inside the region delimited by the curve Γ {\displaystyle \Gamma } and negative values outside. == The level-set equation == If the curve Γ {\displaystyle \Gamma } moves in the normal direction with a speed v {\displaystyle v} , then by chain rule and implicit differentiation, it can be determined that the level-set function φ {\displaystyle \varphi } satisfies the level-set equation ∂ φ ∂ t = v | ∇ φ | . {\displaystyle {\frac {\partial \varphi }{\partial t}}=v|\nabla \varphi |.} Here, | ⋅ | {\displaystyle |\cdot |} is the Euclidean norm (denoted customarily by single bars in partial differential equations), and t {\displaystyle t} is time. This is a partial differential equation, in particular a Hamilton–Jacobi equation, and can be solved numerically, for example, by using finite differences on a Cartesian grid. However, the numerical solution of the level set equation may require advanced techniques. Simple finite difference methods fail quickly. Upwinding methods such as the Godunov method are considered better; however, the level set method does not guarantee preservation of the volume and shape of the set level in an advection field that maintains shape and size, for example, a uniform or rotational velocity field. Instead, the shape of the level set may become distorted, and the level set may disappear over a few time steps. Therefore, high-order finite difference schemes, such as high-order essentially non-oscillatory (ENO) schemes, are often required, and even then, the feasibility of long-term simulations is questionable. More advanced methods have been developed to overcome this; for example, combinations of the leveling method with tracking marker particles suggested by the velocity field. == Example == Consider a unit circle in R 2 {\textstyle \mathbb {R} ^{2}} , shrinking in on itself at a constant rate, i.e. each point on the boundary of the circle moves along its inwards pointing normally at some fixed speed. The circle will shrink and eventually collapse down to a point. If an initial distance field is constructed (i.e. a function whose value is the signed Euclidean distance to the boundary, positive interior, negative exterior) on the initial circle, the normalized gradient of this field will be the circle normal. If the field has a constant value subtracted from it in time, the zero level (which was the initial boundary) of the new fields will also be circular and will similarly collapse to a point. This is due to this being effectively the temporal integration of the Eikonal equation with a fixed front velocity. == Applications == In mathematical modeling of combustion, LSM is used to describe the instantaneous flame surface, known as the G equation. Level-set data structures have been developed to facilitate the use of the level-set method in computer applications. Computational fluid dynamics Trajectory planning Optimization Image processing Computational biophysics Discrete complex dynamics (visualization of the parameter plane and the dynamic plane) == History == The level-set method was developed in 1979 by Alain Dervieux, and subsequently popularized by Stanley Osher and James Sethian. It has since become popular in many disciplines, such as image processing, computer graphics, computational geometry, optimization, computational fluid dynamics, and computational biology.

    Read more →
  • Tang Xiao'ou

    Tang Xiao'ou

    Tang Xiao'ou (汤晓鸥; 24 January 1968 – 15 December 2023) was a Chinese businessman and computer scientist. He was the founder and chairman of SenseTime, an AI company. He also served as professor of information engineering, associate dean of engineering, and outstanding fellow of engineering at the Chinese University of Hong Kong. Tang's research primarily focused on areas such as computer vision, pattern recognition, and video processing. Tang was honored with the Best Paper Award at the 2009 IEEE Conference on Computer Vision and Pattern Recognition. He served as the programme chair in 2009 and the general chair in 2019 for the IEEE International Conference on Computer Vision. His editorial contributions include roles as an Associate Editor for both the IEEE Transactions on Pattern Analysis and Machine Intelligence and the International Journal of Computer Vision. Additionally, Tang has been recognised as a Fellow of the IEEE. == Biography == Tang was born in Anshan, Liaoning, northeastern China in 1968. Tang received a Bachelor of Science with a major in computer science from the University of Science and Technology of China in 1990. He received a Master of Science from the University of Rochester in 1991 and a Doctor of Philosophy in ocean engineering from the Massachusetts Institute of Technology in 1996. He worked at MIT and Woods Hole Oceanographic Institution during his doctoral studies. Funders of his research included the Office of Naval Research of the United States Department of the Navy. After graduating from MIT, Tang taught in the Department of Information Engineering of the Chinese University of Hong Kong. In 2001, he founded the Multimedia Laboratory of the Chinese University of Hong Kong. From 2005 to 2008, he worked at Microsoft Research Asia. He served as Associate Dean of the Chinese University of Hong Kong. In 2014, he spearheaded the first facial recognition to beat human accuracy. Tang co-founded SenseTime with Xu Li in 2014. Upon SenseTime's IPO in December 2021, Tang was estimated to have a net worth of approximately $3.4 billion. Tang died on 15 December 2023, at the age of 55. SenseTime made the announcement the next day and changed the colour scheme of its website to black-and-white in mourning. The Chinese University of Hong Kong also changed his faculty page to a black-and-white theme.

    Read more →
  • How to Choose an AI Text-to-image Tool

    How to Choose an AI Text-to-image Tool

    Curious about the best AI text-to-image tool? An AI text-to-image tool is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI text-to-image tool slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Tamara Broderick

    Tamara Broderick

    Tamara Ann Broderick is an American computer scientist at the Massachusetts Institute of Technology. She works on machine learning and Bayesian inference. == Education and early career == Broderick is from Parma Heights, Ohio. She attended Laurel School and graduated in 2003. Whilst at high school she took part in the inaugural Massachusetts Institute of Technology Women's Technology Program. She studied mathematics at Princeton University, earning a bachelor's degree in 2007. She was a Marshall scholar, allowing her to pursue graduate research at the University of Cambridge. She was a runner-up in the Association for Women in Mathematics Alice T. Shafer Prize for Excellence in Mathematics. She was co-president of the Princeton Math Club and organised a competition for high school maths teams. She won the Phi Beta Kappa Prize for the highest academic average at Princeton University. During her undergraduate degree, Broderick worked on dark matter haloes with Rachel Mandelbaum. Broderick moved to the United Kingdom for her graduate studies, earning a Master of Advanced Studies for completing Part III of the Mathematical Tripos at the University of Cambridge in 2009. Her Master's thesis looked at the Nomon selection method, improving the efficiency of communications. She returned to America in 2009, joining University of California, Berkeley for her Master's and PhD. Her graduate research was supported by the Berkeley Fellowship and a National Science Foundation Fellowship. Her PhD thesis Clusters and features from combinatorial stochastic processes looked at clustering and speeding up the analysis of large, streaming data sets. In 2013 she was selected for the Berkeley EECS Rising Stars conference. == Research and career == Broderick joined Massachusetts Institute of Technology as an assistant professor in 2015. She is interested in Bayesian statistics and graphical models. She was the recipient of a Google Faculty Research Grant and International Society for Bayesian Analysis Lifetime Members Junior Researcher Award. She was awarded an Army Research Office young investigator program award to investigate machine-learning to quantify uncertainty in data analysis. Broderick is also Alfred P. Sloan Foundation scholar. === Academic service === In 2018, Broderick spoke at the Harvard University Institute for Applied Computational Science Women in Data Science conference. She spoke about Bayesian inference at the 2018 International Conference on Machine Learning. She led a three-day Masterclass on machine learning at University College London in June 2018. Broderick is a scientific advisor for AI.Reverie and WiML (Women in Machine Learning). She has developed a high-school level introduction to machine learning with the Women's Technology Program (WTP). Software she has developed is available on her website. === Awards and honors === Broderick was awarded the Evelyn Fix Memorial Medal and Citation and the International Society for Bayesian Analysis Savage Award for her doctoral thesis. She was awarded a National Science Foundation CAREER Award to scale her machine learning techniques. She was a 2021 Leadership Academy winner of the Committee of Presidents of Statistical Societies.

    Read more →
  • Learning to rank

    Learning to rank

    Learning to rank (LTR) or machine-learned ranking (MLR) is the application of machine learning, often supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval and recommender systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data. == Applications == === In information retrieval === Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure. Training data consists of queries and documents matching them together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine relevance of each result. It is not feasible to check the relevance of all documents, and so typically a technique called pooling is used — only the top few documents, retrieved by some existing ranking models are checked. This technique may introduce selection bias. Alternatively, training data may be derived automatically by analyzing clickthrough logs (i.e. search results which got clicks from users), query chains, or such search engines' features as Google's (since-replaced) SearchWiki. Clickthrough logs can be biased by the tendency of users to click on the top search results on the assumption that they are already well-ranked. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries. Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, and so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, Boolean model, weighted AND, or BM25. This phase is called top- k {\displaystyle k} document retrieval and many heuristics were proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents. === In other areas === Learning to rank algorithms have been applied in areas other than information retrieval: In machine translation for ranking a set of hypothesized translations; In computational biology for ranking candidate 3-D structures in protein structure prediction problems; In recommender systems for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article. == Feature vectors == For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called feature vectors. Such an approach is sometimes called bag of features and is analogous to the bag of words model and vector space model used in information retrieval for representation of documents. Components of such vectors are called features, factors or ranking signals. They may be divided into three groups (features from document retrieval are shown as examples): Query-independent or static features — those features, which depend only on the document, but not on the query. For example, PageRank or document's length. Such features can be precomputed in off-line mode during indexing. They may be used to compute document's static quality score (or static rank), which is often used to speed up search query evaluation. Query-dependent or dynamic features — those features, which depend both on the contents of the document and the query, such as TF-IDF score or other non-machine-learned ranking functions. Query-level features or query features, which depend only on the query. For example, the number of words in a query. Some examples of features, which were used in the well-known LETOR dataset: TF, TF-IDF, BM25, and language modeling scores of document's zones (title, body, anchors text, URL) for a given query; Lengths and IDF sums of document's zones; Document's PageRank, HITS ranks and their variants. Selecting and designing good features is an important area in machine learning, which is called feature engineering. == Evaluation measures == There are several measures (metrics) which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures: Mean average precision (MAP); DCG and NDCG; Precision@n, NDCG@n, where "@n" denotes that the metrics are evaluated only on top n documents; Mean reciprocal rank; Kendall's tau; Spearman's rho. DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics such as MAP, MRR and precision, are defined only for binary judgments. Recently, there have been proposed several new evaluation metrics which claim to model user's satisfaction with search results better than the DCG metric: Expected reciprocal rank (ERR); Yandex's pfound. Both of these metrics are based on the assumption that the user is more likely to stop looking at search results after examining a more relevant document, than after a less relevant document. == Approaches == Learning to Rank approaches are often categorized using one of three approaches: pointwise (where individual documents are ranked), pairwise (where pairs of documents are ranked into a relative order), and listwise (where an entire list of documents are ordered). Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning to rank problems in his book Learning to Rank for Information Retrieval. He categorized them into three groups by their input spaces, output spaces, hypothesis spaces (the core function of the model) and loss functions: the pointwise, pairwise, and listwise approach. In practice, listwise approaches often outperform pairwise approaches and pointwise approaches. This statement was further supported by a large scale experiment on the performance of different learning-to-rank methods on a large collection of benchmark data sets. In this section, without further notice, x {\displaystyle x} denotes an object to be evaluated, for example, a document or an image, f ( x ) {\displaystyle f(x)} denotes a single-value hypothesis, h ( ⋅ ) {\displaystyle h(\cdot )} denotes a bi-variate or multi-variate function and L ( ⋅ ) {\displaystyle L(\cdot )} denotes the loss function. === Pointwise approach === In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem — given a single query-document pair, predict its score. Formally speaking, the pointwise approach aims at learning a function f ( x ) {\displaystyle f(x)} predicting the real-value or ordinal score of a document x {\displaystyle x} using the loss function L ( f ; x j , y j ) {\displaystyle L(f;x_{j},y_{j})} . A number of existing supervised machine learning algorithms can be readily used for this purpose. Ordinal regression and classification algorithms can also be used in pointwise approach when they are used to predict the score of a single query-document pair, and it takes a small, finite number of values. === Pairwise approach === In this case, the learning-to-rank problem is approximated by a classification problem — learning a binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} that can tell which document is better in a given pair of documents. The classifier shall take two documents as its input and the goal is to minimize a loss function L ( h ; x u , x v , y u , v ) {\displaystyle L(h;x_{u},x_{v},y_{u,v})} . The loss function typically reflects the number and magnitude of inversions in the induced ranking. In many cases, the binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} is implemented with a scoring function f ( x ) {\displaystyle f(x)} . As an example, RankNet adapts a probability model and defines h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} as the estimated probability of the document x u {\displaystyle x_{u}} has higher quality than x v {\displaystyle x_{v}} : P u , v ( f ) = CDF ( f ( x u ) − f ( x v ) ) , {\displaystyle P_{u,v}(f)={\text{CDF}

    Read more →
  • Michael Kohlhase

    Michael Kohlhase

    Michael Kohlhase (born 13 September 1964, in Erlangen) is a German computer scientist and professor at University of Erlangen–Nuremberg, where he is head of the KWARC research group (Knowledge Adaptation and Reasoning for Content). == Academic Positions == Michael Kohlhase is president of the OpenMath Society and a trustee of the Interest Group for Mathematical Knowledge Management (MKM). He was a trustee of the Conference on Automated Deduction and the CALCULEMUS Interest Group. He has been Conference Chair of CADE-21 and Program Chair of the KI-2006, MKM-2005, and CALCULEMUS-2000 conferences and has served on the Programme Committees of more than three dozen international conferences. Kohlhase holds an adjunct associate professorship at Carnegie Mellon University and was (2006–2008) vice director of the Department of Safe and Secure Cognitive Systems at German Research Centre for Artificial Intelligence (DFKI) Lab Bremen. In 2014, he became a member of the Global Digital Mathematics Library Working Group of the IMU. == Academic career == Michael Kohlhase obtained a degree in Mathematics (1989) from University of Bonn, a doctorate (1994) and habilitation (1999) in Computer Science at Saarland University. He has pursued his doctoral and post-doctoral research in extended research visits at Carnegie Mellon University, University of Amsterdam, the University of Edinburgh, and SRI International. From 2000–2003, he has conducted research and taught at the School of Computer Science at Carnegie Mellon University, where he was appointed to an adjunct associate professor. In September 2003 he was appointed as Professor of Computer Science at Jacobs University Bremen (International University Bremen until 2007), and 2006–2008 he was vice director of the Department of Safe and Secure Cognitive Systems of the German Research Centre for Artificial Intelligence (DFKI) Bremen. Since September 2016 he holds the Professorship for Knowledge Representation and Processing at University of Erlangen–Nuremberg. He has authored or edited four books and published almost 100 peer-reviewed papers. == Awards and Scholarships == 2000 3-year Heisenberg-Stipend of the Deutsche Forschungsgemeinschaft (DFG). 1996 AKI-prize, dissertation prize of the "Arbeitsgemeinschaft deutscher KI-Institute (AKI)" 1991 dissertation stipend of the Studienstiftung (German National Academic Foundation) 1986 masters stipend of Studienstiftung == Research interests == Michael Kohlhase's current research interests include Automated theorem proving and knowledge representation for mathematics, inference-based techniques for natural language processing and semantics, and computer-supported education. Much of his concrete work is based on web-based content markup formats like MathML, OpenMath, and OMDoc and systems for managing this data, e.g. semantic search engines for mathematical formulae, semantic extensions to LaTeX, or converting legacy LaTeX documents from the arXiv.

    Read more →
  • Translation unit

    Translation unit

    In the field of translation, a translation unit is a segment of a text which the translator treats as a single cognitive unit for the purposes of establishing an equivalence. It may be a single word, a phrase, one or more sentences, or even a larger unit. When a translator segments a text into translation units, the larger these units are, the better chance there is of obtaining an idiomatic translation. This is true not only of human translation, but also where human translators use computer-assisted translation, such as translation memories, and when translations are performed by machine translation systems. == Perceptions on the concept of unit == Vinay and Darbelnet took to Saussure's original concepts of the linguistic sign when beginning to discuss the idea of a single word as a translation unit. According to Saussure, the sign is naturally arbitrary, so it can only derive meaning from contrast in other signs in that same system. However, Russian scholar Leonid Barkhudarov stated that, limiting it to poetry, for instance, a translation unit can take the form of a complete text. This seems to relate to his conception that a translation unit is the smallest unit in the source language with an equivalent in the target one, and when its parts are taken individually, they become untranslatable; these parts can be as small as phonemes or morphemes, or as large as entire texts. Susan Bassnett widened Barkhudarov's poetry perception to include prose, adding that in this type of translation text is the prime unit, including the idea that sentence-by-sentence translation could cause loss of important structural features. Swiss linguist Werner Koller connected Barkhudarov's idea of unit sizing to the difference between the two languages involved, by stating that the more different or unrelated these languages were, the larger the unit would be. One final perception on the idea of unit came from linguist Eugene Nida. To him, translation units have a tendency to be small groups of language building up into sentences, thus forming what he called meaningful mouthfuls of language. == Points of view towards translation units == === Process-oriented POV === According to this point of view, a translation unit is a stretch of text on which attention is focused to be represented as a whole in the target language. In this point of view we can consider the concept of the think-aloud protocol, supported by German linguist Wolfgang Lörscher: isolating units using self-reports by translating subjects. It also relates to how experienced the translator in question is: language learners take a word as a translation unit, whereas experienced translators isolate and translate units of meaning in the form of phrases, clauses or sentences. Since 1996 and 2005 keylogging and eyetracking technologies were introduced in Translation Process Research. These more advanced and non-invasive research methods made it possible to elaborate a finer-grained assessment of translation units as loops of (source or target text) reading and target text typing. Loops of translation units are thought to be the basic units by which translations are produced. Thus, Malmkjaer, for instance, defines process oriented translation units as a “stretch of the source text that the translator keeps in mind at any one time, in order to produce translation equivalents in the text he or she is creating” (p. 286). Records of keystrokes and eye movements allow to investigate these mental constructs through their physical (observable) behavioral traces in the translation process data. Empirical Translation Process Research has deployed numerous theories to explain and models the behavioral traces of these assumed mental units. === Product-oriented POV === Here, the target-text unit can be mapped into an equivalent source-text unit. A case study on this matter was reported by Gideon Toury, in which 27 English-Hebrew student-produced translations were mapped onto a source text. Those students that were less experienced had larger numbers of small units at word and morpheme level in their translations, while one student with translation experience had approximately half of those units, mostly at phrase or clause level.

    Read more →
  • Markov chain

    Markov chain

    In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally, this may be thought of as, "What happens next depends only on the state of affairs now." A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). Markov processes are named in honor of the Russian mathematician Andrey Markov. Markov chains have many applications as statistical models of real-world processes. They provide the basis for general stochastic simulation methods known as Markov chain Monte Carlo, which are used for simulating sampling from complex probability distributions, and have found application in areas including Bayesian statistics, biology, chemistry, economics, finance, information theory, physics, signal processing, and speech processing. The adjectives Markovian and Markov are used to describe something that is related to a Markov process. == Principles == === Definition === A Markov process is a stochastic process that satisfies the Markov property (sometimes characterized as "memorylessness"). In simpler terms, it is a process for which predictions can be made regarding future outcomes based solely on its present state and—most importantly—such predictions are just as good as the ones that could be made knowing the process's full history. In other words, conditional on the present state of the system, its future and past states are independent. A Markov chain is a type of Markov process that has either a discrete state space or a discrete index set (often representing time), but the precise definition of a Markov chain varies. For example, it is common to define a Markov chain as a Markov process in either discrete or continuous time with a countable state space (thus regardless of the nature of time), but it is also common to define a Markov chain as having discrete time in either countable or continuous state space (thus regardless of the state space). === Types of Markov chains === The system's state space and time parameter index need to be specified. The following table gives an overview of the different instances of Markov processes for different levels of state space generality for both discrete and continuous time: Note that there is no definitive agreement in the literature on the use of some of the terms that signify special cases of Markov processes. Usually the term "Markov chain" is reserved for a process with a discrete set of times, that is, a discrete-time Markov chain (DTMC), but a few authors use the term "Markov process" to refer to a continuous-time Markov chain (CTMC) without explicit mention. In addition, there are other extensions of Markov processes that are referred to as such but do not necessarily fall within any of these four categories (see Markov model). Moreover, the time index need not necessarily be real-valued; like with the state space, there are conceivable processes that move through index sets with other mathematical constructs. Notice that the general state space continuous-time Markov chain is general to such a degree that it has no designated term. While the time parameter is usually discrete, the state space of a Markov chain does not have any generally agreed-on restrictions: the term may refer to a process on an arbitrary state space. However, many applications of Markov chains employ finite or countably infinite state spaces, which have a more straightforward statistical analysis. Besides time-index and state-space parameters, there are many other variations, extensions and generalizations (see Variations). For simplicity, most of this article concentrates on the discrete-time, discrete state-space case, unless mentioned otherwise. === Transitions === The changes of state of the system are called transitions. The probabilities associated with various state changes are called transition probabilities. The process is characterized by a state space, a transition matrix describing the probabilities of particular transitions, and an initial state (or initial distribution) across the state space. By convention, we assume all possible states and transitions have been included in the definition of the process, so there is always a next state, and the process does not terminate. A discrete-time random process involves a system which is in a certain state at each step, with the state changing randomly between steps. The steps are often thought of as moments in time, but they can equally well refer to physical distance or any other discrete measurement. Formally, the steps are the integers or natural numbers, and the random process is a mapping of these to states. The Markov property states that the conditional probability distribution for the system at the next step (and in fact at all future steps) depends only on the current state of the system, and not additionally on the state of the system at previous steps. Since the system changes randomly, it is generally impossible to predict with certainty the state of a Markov chain at a given point in the future. However, the statistical properties of the system's future can be predicted. In many applications, it is these statistical properties that are important. == History == Andrey Markov studied Markov processes in the early 20th century, publishing his first paper on the topic in 1906. Markov processes in continuous time were discovered long before his work in the early 20th century in the form of the Poisson process. Markov was interested in studying an extension of independent random sequences, motivated by a disagreement with Pavel Nekrasov who claimed independence was necessary for the weak law of large numbers to hold. In his first paper on Markov chains, published in 1906, Markov showed that under certain conditions the average outcomes of the Markov chain would converge to a fixed vector of values, so proving a weak law of large numbers without the independence assumption, which had been commonly regarded as a requirement for such mathematical laws to hold. Markov later used Markov chains to study the distribution of vowels in Eugene Onegin, written by Alexander Pushkin, and proved a central limit theorem for such chains. In 1912 Henri Poincaré studied Markov chains on finite groups with an aim to study card shuffling. Other early uses of Markov chains include a diffusion model, introduced by Paul and Tatyana Ehrenfest in 1907, and a branching process, introduced by Francis Galton and Henry William Watson in 1873, preceding the work of Markov. After the work of Galton and Watson, it was later revealed that their branching process had been independently discovered and studied around three decades earlier by Irénée-Jules Bienaymé. Starting in 1928, Maurice Fréchet became interested in Markov chains, eventually resulting in him publishing in 1938 a detailed study on Markov chains. Andrey Kolmogorov developed in a 1931 paper a large part of the early theory of continuous-time Markov processes. Kolmogorov was partly inspired by Louis Bachelier's 1900 work on fluctuations in the stock market as well as Norbert Wiener's work on Einstein's model of Brownian movement. He introduced and studied a particular set of Markov processes known as diffusion processes, where he derived a set of differential equations describing the processes. Independent of Kolmogorov's work, Sydney Chapman derived in a 1928 paper an equation, now called the Chapman–Kolmogorov equation, in a less mathematically rigorous way than Kolmogorov, while studying Brownian movement. The differential equations are now called the Kolmogorov equations or the Kolmogorov–Chapman equations. Other mathematicians who contributed significantly to the foundations of Markov processes include William Feller, starting in 1930s, and then later Eugene Dynkin, starting in the 1950s. == Examples == Mark V. Shaney is a third-order Markov chain program, and a Markov text generator. It ingests the sample text (the Tao Te Ching, or the posts of a Usenet group) and creates a massive list of every sequence of three successive words (triplet) which occurs in the text. It then chooses two words at random, and looks for a word which follows those two in one of the triplets in its massive list. If there is more than one, it picks at random (identical triplets count separately, so a sequence which occurs twice is twice as likely to be picked as one which only occurs once). It then adds that word to the generated text. Then, in the same way, it picks a triplet that starts with the second and third words in the generated text, and that gives a fourth word. It adds the fourth word, then repeats with the third and fourth words, and so on. Random walks based on integers and the gambler's ruin problem are ex

    Read more →
  • Automated attendant

    Automated attendant

    In telephony, an automated attendant (also auto attendant, auto-attendant, autoattendant, automatic phone menus, AA, or virtual receptionist) allows callers to be automatically transferred to an extension without the intervention of an operator/receptionist. Many AAs will also offer a simple menu system ("for sales, press 1, for service, press 2," etc.). An auto attendant may also allow a caller to reach a live operator by dialing a number, usually "0". Typically the auto attendant is included in a business's phone system such as a PBX, but some services allow businesses to use an AA without such a system. Modern AA services (which now overlap with more complicated interactive voice response or IVR systems) can route calls to mobile phones, VoIP virtual phones, other AAs/IVRs, or other locations using traditional land-line phones or voice message machines. == Feature description == Telephone callers will recognize an automated attendant system as one that greets calls incoming to an organization with a recorded greeting of the form, "Thank you for calling .... If you know your party's extension, you may dial it any time during this message." Callers who have a touch-tone (DTMF) phone can dial an extension number or, in most cases, wait for operator ("attendant") assistance. Since the telephone network does not transmit the DC signals from rotary dial telephones (except for audible clicks), callers who have rotary dial phones have to wait for assistance. On a purely technical level it could be argued that an automated attendant is a very simple kind of IVR however, in the telecom industry the terms IVR and auto attendant are generally considered distinct. An automated attendant serves a very specific purpose (replace live operator and route calls), whereas an IVR can perform all sorts of functions (telephone banking, account inquiries, etc.). An AA will often include a directory which will allow a caller to dial by name in order to find a user on a system. There is no standard format to these directories, and they can use combinations of first name, last name, or both. The following lists common routing steps that are components of an automated attendant: Transfer to extension Transfer to voicemail Play message (i.e., "our address is ...") Go to a sub-menu Repeat choices In addition, an automated attendant would be expected to have values for the following: '0' – where to go when the caller dials '0' Timeout – what to do if the caller does nothing (usually go to the same place as '0') Default mailbox – where to send calls if '0' is not answered (or is not pointing to a live person) == Background == PBXs (private branch exchanges) or PABXs (private automatic branch exchanges) are telephone systems that serve an organization that has many telephone extensions but fewer telephone lines (sometimes called "trunks") that connect that organization to the rest of the global telecommunications network. While persons within an enterprise served by a PBX can call each other by dialing their extension numbers, incoming calls, i.e., calls originating from a telephone not served by the PBX but intended for a party served by the PBX, required assistance from a switchboard operator (also called a "switchboard attendant") or a telephone service called DID ("direct inward dialing"). Direct inward dialing has advantages such as rapid connection to the destination party and disadvantages including cost, lack of identification of the called organization and use of ten-digit telephone numbers. Automated attendants provide, among many other things, a way for an external caller to be directed to an extension or department served by a PBX system without using direct inward dialing or without switchboard attendant assistance. == History == Automated attendants are not part of voicemail systems. Voice messaging (or voicemail or VM) technology has existed since the late 1970s; in the early 1980s companies provided voice-prompting systems that allowed callers to reach (route the call) to an intended party, not necessarily to leave a message. Automated attendant systems are also referred to as automated menu systems and much early work in this field was done by Michael J. Freeman, Ph.D. == Time-based routing == Many auto attendants will have options to allow for time-of-day routing, as well as weekend and holiday routing. The specifics of these features will depend entirely on the particular automated attendant, but typically there would be a normal greeting and routing steps that would take place during normal business hours, and a different greeting and routing for non-business hours.

    Read more →
  • AI Resume Builders Reviews: What Actually Works in 2026

    AI Resume Builders Reviews: What Actually Works in 2026

    Shopping for the best AI resume builder? An AI resume builder is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI resume builder slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.

    Read more →
  • Is an AI Photo Editor Worth It in 2026?

    Is an AI Photo Editor Worth It in 2026?

    Shopping for the best AI photo editor? An AI photo editor is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI photo editor slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →