AI Grammar Rephrase Online Free

AI Grammar Rephrase Online Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Quantification (machine learning)

    In machine learning, quantification (variously called learning to quantify, supervised prevalence estimation, or class prior estimation) is the task of using supervised learning to train models (quantifiers) that estimate the relative frequencies (also known as prevalence values) of the classes of interest in a sample of unlabelled data items. For instance, in a sample of 100,000 unlabelled tweets known to express opinions about a certain political candidate, a quantifier may be used to estimate the percentage of these tweets that belong to class 'Positive' (i.e., that express a positive stance towards this candidate), and to do the same for classes 'Neutral' and 'Negative'. Quantification may also be viewed as the task of training predictors that estimate a (discrete) probability distribution, i.e., that generate a predicted distribution approximating the unknown true distribution of the items across the classes of interest.

    Quantification is different from classification, since the goal of classification is to predict the class labels of individual data items, while the goal of quantification is to predict the class prevalence values of sets of data items. Quantification is also different from regression, since in regression the training data items have real-valued labels, while in quantification the training data items have class labels. Multiple research works have shown that performing quantification by classifying all unlabelled instances and then counting the instances attributed to each class (the 'classify and count' method) usually leads to suboptimal quantification accuracy. This suboptimality may be seen as a direct consequence of 'Vapnik's principle', which states: If you possess a restricted amount of information for solving some problem, try to solve the problem directly and never solve a more general problem as an intermediate step.
    It is possible that the available information is sufficient for a direct solution but is insufficient for solving a more general intermediate problem. Here, the problem to be solved directly is quantification, while the more general intermediate problem is classification. As a result of the suboptimality of the 'classify and count' method, quantification has evolved as a task in its own right, different from classification in its goals, methods, techniques, and evaluation measures.

    == Quantification tasks ==

    === Quantification tasks according to the set of classes ===

    The main variants of quantification, according to the characteristics of the set of classes used, are: binary quantification, corresponding to the case in which there are only $n = 2$ classes and each data item belongs to exactly one of them; single-label multiclass quantification, corresponding to the case in which there are $n > 2$ classes and each data item belongs to exactly one of them; multi-label multiclass quantification, corresponding to the case in which there are $n \geq 2$ classes and each data item can belong to zero, one, or several classes at the same time; ordinal quantification, corresponding to the single-label multiclass case in which a total order is defined on the set of classes; and regression quantification, a task which stands to 'standard' quantification as regression stands to classification. Strictly speaking, regression quantification is not a quantification task as defined above (since the individual items are labelled by real values rather than by classes), but it has enough in common with the other quantification tasks to be considered one of them.

    Most known quantification methods address the binary case or the single-label multiclass case, and only a few of them address the multi-label, ordinal, and regression cases. Binary-only methods include the Mixture Model (MM) method, the HDy method, SVM(KLD), and SVM(Q).
    Methods that can deal with both the binary case and the single-label multiclass case include probabilistic classify and count (PCC), adjusted classify and count (ACC), probabilistic adjusted classify and count (PACC), the Saerens-Latinne-Decaestecker EM-based method (SLD), and KDEy. Methods for multi-label quantification include regression-based quantification (RQ) and label powerset-based quantification (LPQ). Methods for the ordinal case include ordinal versions of the above-mentioned ACC, PACC, SLD, and HDy methods. Methods for the regression case include Regress and splice and Adjusted regress and sum.

    === Quantification tasks according to the type of data ===

    Several subtasks of quantification may be identified according to the type of data involved. Examples of such tasks are quantification of networked data, which consists of performing quantification when the data points are members of a relation, i.e., are interlinked, and is therefore a close relative of collective classification; and quantification over time, which consists of performing quantification on sets that become available in a temporal sequence, i.e., as a data stream, and finds application in contexts in which class prevalence values must be monitored over time.

    == Evaluation measures for quantification ==

    Several evaluation measures can be used for evaluating the error of a quantification method. Since quantification consists of generating a predicted probability distribution that estimates a true probability distribution, these evaluation measures compare two probability distributions. Most evaluation measures for quantification belong to the class of divergences.
    Evaluation measures for binary, single-label multiclass, and multi-label quantification are Absolute Error, Squared Error, Relative Absolute Error, Kullback–Leibler divergence, and Pearson divergence. Evaluation measures for ordinal quantification are Normalized Match Distance (a particular case of the Earth Mover's Distance) and Root Normalized Order-Aware Distance.

    == Applications ==

    Quantification is of special interest in fields such as the social sciences, epidemiology, market research, resource allocation, and ecological modelling, since these fields are inherently concerned with aggregate data. However, quantification is also useful as a building block for solving other downstream tasks, such as improving the accuracy of classifiers on out-of-distribution data, measuring classifier bias and ranker bias, and estimating the accuracy of classifiers on out-of-distribution data.

    == Resources ==

    LQ 2021: the 1st International Workshop on Learning to Quantify
    LQ 2022: the 2nd International Workshop on Learning to Quantify
    LQ 2023: the 3rd International Workshop on Learning to Quantify
    LQ 2024: the 4th International Workshop on Learning to Quantify
    LQ 2025: the 5th International Workshop on Learning to Quantify
    LeQua 2022: the 1st Data Challenge on Learning to Quantify
    LeQua 2024: the 2nd Data Challenge on Learning to Quantify
    QuaPy: an open-source Python-based software library for quantification
    QuantificationLib: a Python library for quantification and prevalence estimation
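
    To make the contrast between 'classify and count' and an adjusted method concrete, here is a minimal Python sketch of binary CC and ACC, together with two of the evaluation measures listed above (Absolute Error and Kullback–Leibler divergence). It assumes that the classifier's true positive rate (tpr) and false positive rate (fpr) have already been estimated on held-out labelled data. The function names, the sample, and the smoothing constant in the KLD are illustrative choices, not the API of QuaPy or QuantificationLib.

```python
from math import log


def classify_and_count(labels, classes):
    """Fraction of items carrying each class label (CC when the labels are predictions)."""
    n = len(labels)
    return {c: sum(1 for y in labels if y == c) / n for c in classes}


def adjusted_classify_and_count(cc_positive, tpr, fpr):
    """Binary ACC: solve cc = tpr * p + fpr * (1 - p) for p, then clip to [0, 1]."""
    if tpr == fpr:
        raise ValueError("tpr and fpr must differ for the correction to be defined")
    p = (cc_positive - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


def absolute_error(p_true, p_pred):
    """Mean over classes of |true prevalence - predicted prevalence|."""
    return sum(abs(p_true[c] - p_pred[c]) for c in p_true) / len(p_true)


def kl_divergence(p_true, p_pred, eps=1e-6):
    """KL(p_true || p_pred), with additive smoothing so that no prevalence is zero."""
    n = len(p_true)

    def smooth(p, c):
        return (p[c] + eps) / (1 + eps * n)

    return sum(smooth(p_true, c) * log(smooth(p_true, c) / smooth(p_pred, c))
               for c in p_true)


# Hypothetical test sample: 300 positives and 700 negatives, scored by a classifier
# whose tpr = 0.8 and fpr = 0.1 were measured on separate validation data.
classes = ["pos", "neg"]
true_labels = ["pos"] * 300 + ["neg"] * 700
predicted = ["pos"] * 240 + ["neg"] * 60 + ["pos"] * 70 + ["neg"] * 630

p_true = classify_and_count(true_labels, classes)
p_cc = classify_and_count(predicted, classes)
p_acc_pos = adjusted_classify_and_count(p_cc["pos"], tpr=0.8, fpr=0.1)
p_acc = {"pos": p_acc_pos, "neg": 1 - p_acc_pos}

print("CC :", p_cc, absolute_error(p_true, p_cc), kl_divergence(p_true, p_cc))
print("ACC:", p_acc, absolute_error(p_true, p_acc), kl_divergence(p_true, p_acc))
```

    ACC inverts the relation p_CC = tpr * p + fpr * (1 - p), which holds in expectation when the classifier behaves on each class the same way in training and test data. Because tpr and fpr are estimated from finite samples, the corrected value can fall outside [0, 1], hence the clipping.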

  • Alberto Broggi

    Alberto Broggi is General Manager at VisLab srl (a spinoff of the University of Parma acquired by the Silicon Valley company Ambarella Inc. in June 2015) and a professor of Computer Engineering at the University of Parma in Italy.

    == Research in computer vision, hardware, and AV ==

    Broggi's research activities began in 1991–1994, when his group, together with the Dipartimento di Elettronica, Politecnico di Torino, Italy, built its own hardware architecture (named PAPRICA, for PArallel PRocessor for Image Checking and Analysis, based on 256 single-bit processing elements working in SIMD fashion) and installed it on board a mobile laboratory (Mob-Lab) to develop and test some initial concepts in the field of intelligent vehicles. In 1996, Broggi's group developed a real vehicle prototype (named ARGO, a Lancia Thema passenger car equipped with vision sensors, processing systems, and vehicle actuators), together with the software and hardware that enabled it to drive autonomously on standard roads. Broggi's research group (called VisLab from then on) gathered its findings in a book, which was later also translated into Chinese. While Broggi was at the University of Pavia, his research was extended and applied to extreme conditions (automatic driving on snow and ice): in 2001, VisLab led the research effort to provide a vehicle (RAS, Robot Antartico di Superficie) with sensing capabilities so that it could automatically follow the vehicle in front. In 2010 Broggi's group embarked on driving four vehicles autonomously from Italy to China with no human intervention, a challenge called VIAC (VisLab Intercontinental Autonomous Challenge). Soon after this, Broggi was awarded a second ERC grant (Proof of Concept) to industrialize some of the results obtained and successfully tested on the VIAC vehicles.
    On July 12, 2013, VisLab tested the BRAiVE vehicle in downtown Parma, negotiating two-way narrow rural roads, pedestrian crossings, traffic lights, artificial bumps, pedestrian areas, and tight roundabouts. The vehicle traveled from the Parma University Campus to Piazza della Pilotta in downtown Parma: a 20-minute run in a real environment, in real traffic at 11 a.m. on a working day, that required no human intervention. Part of this test was driven with nobody in the driver's seat, for the first time ever on public roads.

  • Markov information source

    In mathematics, a Markov information source, or simply a Markov source, is an information source whose underlying dynamics are given by a stationary finite Markov chain.

    == Formal definition ==

    An information source is a sequence of random variables ranging over a finite alphabet $\Gamma$, having a stationary distribution. A Markov information source is then a (stationary) Markov chain $M$, together with a function $f : S \to \Gamma$ that maps states $S$ in the Markov chain to letters in the alphabet $\Gamma$. A unifilar Markov source is a Markov source for which the values $f(s_k)$ are distinct whenever each of the states $s_k$ is reachable, in one step, from a common prior state. Unifilar sources are notable in that many of their properties are far easier to analyze than in the general case.

    == Applications ==

    Markov sources are commonly used in communication theory, as a model of a transmitter. Markov sources also occur in natural language processing, where they are used to represent hidden meaning in a text. Given the output of a Markov source whose underlying Markov chain is unknown, the task of solving for the underlying chain is undertaken by the techniques of hidden Markov models, such as the Viterbi algorithm.
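
    As an illustration of the formal definition, the following Python sketch represents a small hypothetical Markov chain as a transition table, attaches an output function f that maps states to letters, generates output, and checks the unifilar condition (from each state, the states reachable in one step must carry distinct letters). For simplicity it starts from a fixed state rather than sampling from the stationary distribution; the chain and both output functions are made up for the example.

```python
import random

# Hypothetical chain over states S = {A, B, C}; P[s][t] is the probability of moving s -> t.
P = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 0.3, "C": 0.7},
}

# Two output functions f: S -> Gamma.
f_distinct = {"A": "x", "B": "y", "C": "z"}  # injective, hence unifilar
f_merged = {"A": "x", "B": "y", "C": "x"}    # C can move to A or C, and both emit "x"


def emit(P, f, start, length, rng):
    """Walk the chain from `start` and return the letters f(s_0) ... f(s_{length-1})."""
    state, letters = start, []
    for _ in range(length):
        letters.append(f[state])
        successors = list(P[state])
        weights = [P[state][t] for t in successors]
        state = rng.choices(successors, weights=weights)[0]
    return "".join(letters)


def is_unifilar(P, f):
    """True if, for every state, its one-step successors map to distinct letters."""
    for row in P.values():
        letters = [f[t] for t, p in row.items() if p > 0]
        if len(letters) != len(set(letters)):
            return False
    return True


rng = random.Random(0)
output = emit(P, f_distinct, "A", 20, rng)
print(output, is_unifilar(P, f_distinct), is_unifilar(P, f_merged))
```

    With a unifilar source, the current state and the next letter together determine the next state, which is one reason such sources are easier to analyze. With f_merged, seeing "x" after state C leaves it ambiguous whether the chain moved to A or stayed in C.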

  • AI Avatar Generators Reviews: What Actually Works in 2026

    Shopping for the best AI avatar generator? An AI avatar generator is software that uses machine learning to create digital avatars, and its output tends to improve as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are among the factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI avatar generator fits into your workflow and can quickly repay its cost. Below we compare features, pricing, and real output so you can choose with confidence.

  • Feature (machine learning)

    In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a data set. Choosing informative, discriminating, and independent features is crucial to producing effective algorithms for pattern recognition, classification, and regression tasks. Features are usually numeric, but other types such as strings and graphs are used in syntactic pattern recognition, after some pre-processing step such as one-hot encoding. The concept of "features" is related to that of explanatory variables used in statistical techniques such as linear regression.

    == Feature types ==

    In feature engineering, two types of features are commonly used: numerical and categorical. Numerical features are continuous values that can be measured on a scale; examples include age, height, weight, and income. Numerical features can be used in machine learning algorithms directly. Categorical features are discrete values that can be grouped into categories; examples include gender, color, and zip code. Categorical features typically need to be converted to numerical features before they can be used in machine learning algorithms, using techniques such as one-hot encoding, label encoding, and ordinal encoding. The type of feature used in feature engineering depends on the specific machine learning algorithm being used. Some machine learning algorithms, such as decision trees, can handle both numerical and categorical features. Other machine learning algorithms, such as linear regression, can only handle numerical features.

    == Classification ==

    A numeric feature can be conveniently described by a feature vector. One way to achieve binary classification is using a linear predictor function (related to the perceptron) with a feature vector as input.
    The method consists of calculating the scalar product between the feature vector and a vector of weights, and assigning to the positive class those observations whose result exceeds a threshold. Algorithms for classification from a feature vector include nearest neighbor classification, neural networks, and statistical techniques such as Bayesian approaches.

    == Examples ==

    In character recognition, features may include histograms counting the number of black pixels along horizontal and vertical directions, the number of internal holes, stroke detection, and many others. In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches, logarithmic Mel-scale spectral vectors, and Mel-frequency cepstral coefficients, which represent the frequency characteristics of audio signals. In spam detection algorithms, features may include the presence or absence of certain email headers, the email structure, the language, the frequency of specific terms, and the grammatical correctness of the text. In computer vision, there are a large number of possible features, such as edges and objects.

    == Feature vectors ==

    In pattern recognition and machine learning, a feature vector is an n-dimensional vector of numerical features that represent some object. Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis. When representing images, the feature values might correspond to the pixels of an image, while when representing texts the features might be the frequencies of occurrence of textual terms. Feature vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear regression. Feature vectors are often combined with weights using a dot product in order to construct a linear predictor function that is used to determine a score for making a prediction.
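
    The linear predictor described in this article can be sketched in a few lines of Python: the score is the dot product of a feature vector with a weight vector, and an observation is assigned to the positive class when its score exceeds a threshold. The feature names (a hypothetical spam-detection example), the weights, and the threshold are made-up values for illustration; in practice the weights would be learned, for example by a perceptron.

```python
def linear_score(features, weights, bias=0.0):
    """Dot product of a feature vector with a weight vector, plus an optional bias."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(x * w for x, w in zip(features, weights)) + bias


def classify(features, weights, threshold):
    """Binary decision: positive if the linear score exceeds the threshold."""
    return linear_score(features, weights) > threshold


# Hypothetical spam features:
# [number of links, suspicious header present (0/1), fraction of uppercase letters]
weights = [0.5, 2.0, 1.0]
suspicious = [3, 1, 0.2]
ordinary = [0, 0, 0.1]

print(linear_score(suspicious, weights), classify(suspicious, weights, threshold=2.0))
print(linear_score(ordinary, weights), classify(ordinary, weights, threshold=2.0))
```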
    The vector space associated with these vectors is often called the feature space. To reduce the dimensionality of the feature space, a number of dimensionality reduction techniques can be employed.

    Higher-level features can be obtained from already available features and added to the feature vector; for example, for the study of diseases the feature 'Age' is useful and is defined as Age = 'Year of death' minus 'Year of birth'. This process is referred to as feature construction. Feature construction is the application of a set of constructive operators to a set of existing features, resulting in the construction of new features. Examples of such constructive operators include checking for the equality conditions {=, ≠}, the arithmetic operators {+, −, ×, /}, and the array operators {max(S), min(S), average(S)}, as well as other more sophisticated operators, for example count(S, C), which counts the number of features in the feature vector S satisfying some condition C, or distances to other recognition classes generalized by some accepting device. Feature construction has long been considered a powerful tool for increasing both accuracy and understanding of structure, particularly in high-dimensional problems. Applications include studies of disease and emotion recognition from speech.

    == Selection and extraction ==

    The initial set of raw features can be redundant and large enough that estimation and optimization are made difficult or ineffective. Therefore, a preliminary step in many applications of machine learning and pattern recognition consists of selecting a subset of features, or constructing a new and reduced set of features, to facilitate learning and to improve generalization and interpretability. Extracting or selecting features is a combination of art and science; developing systems to do so is known as feature engineering.
    It requires experimenting with multiple possibilities and combining automated techniques with the intuition and knowledge of the domain expert. Automating this process is feature learning, where a machine not only uses features for learning, but learns the features themselves.
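
    The encoding and construction steps discussed in this article can be illustrated with a short Python sketch. It one-hot encodes a categorical feature, constructs the 'Age' feature from 'Year of birth' and 'Year of death', and applies the array operators max, min, and average together with a count(S, C) operator. The record, its field names, the category list, and the condition used with count are all hypothetical.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed list of categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def construct_age(record):
    """Constructed feature: Age = 'Year of death' - 'Year of birth'."""
    return record["year_of_death"] - record["year_of_birth"]


def count_satisfying(values, condition):
    """count(S, C): how many entries of S satisfy condition C."""
    return sum(1 for x in values if condition(x))


# Hypothetical raw record; field names and the category list are illustrative only.
COLORS = ["red", "green", "blue"]
record = {
    "year_of_birth": 1931,
    "year_of_death": 2004,
    "color": "green",
    "readings": [0.2, 1.5, 3.1, 0.9],
}
readings = record["readings"]

feature_vector = (
    [construct_age(record)]                                           # arithmetic operator
    + one_hot(record["color"], COLORS)                                # categorical -> numerical
    + [max(readings), min(readings), sum(readings) / len(readings)]   # array operators
    + [count_satisfying(readings, lambda x: x > 1.0)]                 # count(S, C)
)
print(feature_vector)
```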

  • Michael Kohlhase

    Michael Kohlhase (born 13 September 1964, in Erlangen) is a German computer scientist and professor at the University of Erlangen–Nuremberg, where he is head of the KWARC research group (Knowledge Adaptation and Reasoning for Content).

    == Academic Positions ==

    Michael Kohlhase is president of the OpenMath Society and a trustee of the Interest Group for Mathematical Knowledge Management (MKM). He was a trustee of the Conference on Automated Deduction and the CALCULEMUS Interest Group. He has been Conference Chair of CADE-21 and Program Chair of the KI-2006, MKM-2005, and CALCULEMUS-2000 conferences, and has served on the Programme Committees of more than three dozen international conferences. Kohlhase holds an adjunct associate professorship at Carnegie Mellon University and from 2006 to 2008 was vice director of the Department of Safe and Secure Cognitive Systems at the German Research Centre for Artificial Intelligence (DFKI) Lab Bremen. In 2014, he became a member of the Global Digital Mathematics Library Working Group of the IMU.

    == Academic career ==

    Michael Kohlhase obtained a degree in Mathematics (1989) from the University of Bonn, and a doctorate (1994) and habilitation (1999) in Computer Science from Saarland University. He pursued his doctoral and post-doctoral research in extended research visits at Carnegie Mellon University, the University of Amsterdam, the University of Edinburgh, and SRI International. From 2000 to 2003, he conducted research and taught at the School of Computer Science at Carnegie Mellon University, where he was appointed adjunct associate professor. In September 2003 he was appointed Professor of Computer Science at Jacobs University Bremen (International University Bremen until 2007), and from 2006 to 2008 he was vice director of the Department of Safe and Secure Cognitive Systems of the German Research Centre for Artificial Intelligence (DFKI) Bremen.
    Since September 2016 he has held the Professorship for Knowledge Representation and Processing at the University of Erlangen–Nuremberg. He has authored or edited four books and published almost 100 peer-reviewed papers.

    == Awards and Scholarships ==

    2000: 3-year Heisenberg Stipend of the Deutsche Forschungsgemeinschaft (DFG)
    1996: AKI Prize, dissertation prize of the "Arbeitsgemeinschaft deutscher KI-Institute (AKI)"
    1991: dissertation stipend of the Studienstiftung (German National Academic Foundation)
    1986: master's stipend of the Studienstiftung

    == Research interests ==

    Michael Kohlhase's current research interests include automated theorem proving and knowledge representation for mathematics, inference-based techniques for natural language processing and semantics, and computer-supported education. Much of his concrete work is based on web-based content markup formats such as MathML, OpenMath, and OMDoc, and on systems for managing this data, e.g., semantic search engines for mathematical formulae, semantic extensions to LaTeX, or converting legacy LaTeX documents from the arXiv.

  • Cognitive Technologies

    Cognitive Technologies is a Russian software corporation that develops corporate business applications and AI-based advanced driver assistance systems. Founded in 1993 in Moscow, Russia, the company has offices in Eastern Europe, with R&D centers in Russia.

    == History ==

    Cognitive Technologies was founded in 1993 by Olga Uskova and Vladimir Arlazarov. The first employees had previously worked on the team that developed "Kaissa", the first world computer chess champion. The first programs developed by Cognitive Technologies were the optical image and character recognition packages Tiger and CuneiForm. In February 2015 Cognitive Technologies and Kamaz, the Russian Dakar Rally-winning truck manufacturer, started working on the self-driving Kamaz truck project. The first field tests took place in June 2015. In 2015 Andrey Chernogorov was appointed CEO of the company.

    == Products ==

    Cognitive Technologies develops business application software and artificial intelligence for self-driving vehicles. The main products are: C-Pilot, an AI-based ADAS; E1 Evfrat, an electronic workflow system; and CognitiveLot, an e-purchasing system.

    == Cooperation with global companies ==

    Under the contract signed between Cognitive Technologies and Hewlett-Packard, all scanners sold in Russia had text recognition software developed by Cognitive Technologies. It was HP's first such contract with an Eastern European company. Afterwards, Cognitive Technologies signed OEM contracts and business agreements with several global IT companies, including IBM, Canon, Corel, Samsung, Xerox, Brother, Epson, and Olivetti. In 1998 Cognitive Technologies became the first company in Eastern Europe to receive Oracle Complementary Software Provider status. In 2001 Cognitive Technologies sold its Russian-language speech corpus to Intel. In 2010 Cognitive Technologies sold its text parsing module to Yandex. The company also signed an agreement with NVIDIA to join efforts in developing intelligent document recognition technologies.
    == Self-driving car project ==

    The system developed by Cognitive Technologies does not require building smart cities and smart roads equipped with multiple sensors; instead, it tries to understand the situation on the road the way humans do. The system uses a video camera the way a driver uses their eyes, analyzing the information and focusing on the relevant data. For this purpose the system uses a special type of computer vision, foveal computer vision. Only 5–7% of the data gathered by the video cameras and sensors is processed by the system as relevant. The prototype is being tested in Russia on rough roads and on roads without markings, with the goal of preparing the system to work in difficult situations and on bad roads around the world.

    == C-Pilot ADAS project ==

    In August 2016 Cognitive Technologies started its own ADAS development project, C-Pilot, for ground transport control automation.

    == Self-driving tractors and harvesters project ==

    Experts from Cognitive Technologies claim that the system will track stones, poles, and other obstacles that might be dangerous for the vehicles. This data will enable engineers to develop an interactive field map with GPS coordinates for stones and other obstacles. Eventually, this will result in an alteration of the harvester's movement pattern, preventing it from running into stones or other objects that may inflict damage. Harvesters will work autonomously in the field, within an area bounded by radio beacons.

    == Present international activities ==

    In 2016 Cognitive Technologies joined the OpenPOWER Foundation, an international consortium that offers developers open-source solutions based on IBM's POWER technology and whose members include leading IT companies such as Google, NVIDIA, and Mellanox. Within the consortium, Cognitive Technologies initiated the formation of an international working group to develop a single software standard for self-driving vehicle control.
    == Awards ==

    In 2016, the leading Russian business newspaper Kommersant named Cognitive Technologies the TOP-2 Russian software company.
    TOP-6 Russian software company in 2015, according to Russoft
    TOP-500 biggest Russian companies, according to RBC
    TOP-2 company of the Russian EDMS market in 2014, according to IDC
    TOP-20 biggest Russian IT companies in 2013, according to Cnews Analytics

  • AI Video Generators Reviews: What Actually Works in 2026

    Comparing AI video generators? An AI video generator is software that uses machine learning to generate video, lowering the barrier to producing professional-looking output. Privacy matters too: check whether your data is used to train the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI video generator fits into your workflow and can quickly repay its cost. We tested the leading options and ranked them by quality, value, and ease of use.

  • PCVC Speech Dataset

    The PCVC (Persian Consonant Vowel Combination) Speech Dataset is a Modern Persian speech corpus for speech recognition and speaker recognition. The dataset contains sound samples of Modern Persian consonant-vowel phoneme combinations from different speakers. Every sound sample contains exactly one consonant and one vowel, so the data is effectively labeled at the phoneme level. The dataset covers 23 Persian consonants and 6 vowels. The sound samples are all possible combinations of vowels and consonants (138 samples for each speaker). The sample rate of all speech samples is 48,000 Hz, i.e., there are 48,000 samples per second of audio. Every sound sample starts with the consonant and continues with the vowel. On average, 0.5 seconds of each sample is speech and the rest is silence; each sound sample ends with silence. All sound samples are denoised with an "Adaptive noise reduction" algorithm. Compared to the Farsdat speech dataset and the Persian speech corpus, it is easier to use because it is prepared in .mat data files. It is also organized around phoneme-based separation, and all samples are denoised.

    == Contents ==

    The corpus is downloadable from its Kaggle web page and contains the following: .mat data files of sound samples in a 23 × 6 × 30000 matrix, in which 23 is the number of consonants, 6 is the number of vowels, and 30000 is the length of each sound sample.
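
    The numbers above fit together as follows: 23 consonants × 6 vowels gives the 138 combinations per speaker, and a 30,000-sample clip at 48,000 Hz lasts 0.625 seconds, of which about 0.5 seconds is speech on average. The Python sketch below checks that arithmetic and shows one simple way to cut off the trailing silence of a clip using an amplitude threshold. The threshold value and the synthetic clip are assumptions for illustration; loading the real .mat files (for example with scipy.io.loadmat) would require knowing the variable names stored in them, which the description does not give.

```python
SAMPLE_RATE_HZ = 48_000
N_CONSONANTS, N_VOWELS, CLIP_LENGTH = 23, 6, 30_000


def combinations_per_speaker():
    """Every consonant paired with every vowel."""
    return N_CONSONANTS * N_VOWELS


def duration_seconds(n_samples, rate=SAMPLE_RATE_HZ):
    """Length in seconds of a clip with n_samples samples."""
    return n_samples / rate


def trim_trailing_silence(clip, threshold=0.01):
    """Drop samples after the last one whose magnitude exceeds `threshold` (an assumed value)."""
    end = len(clip)
    while end > 0 and abs(clip[end - 1]) <= threshold:
        end -= 1
    return clip[:end]


# Synthetic stand-in for one clip: 0.5 s of alternating-sign "speech", then silence.
speech = [0.5 if i % 2 == 0 else -0.5 for i in range(24_000)]
clip = speech + [0.0] * (CLIP_LENGTH - len(speech))
trimmed = trim_trailing_silence(clip)

print(combinations_per_speaker())
print(duration_seconds(len(clip)))
print(duration_seconds(len(trimmed)))
```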

  • Trigram tagger

    In computational linguistics, a trigram tagger is a statistical method for automatically identifying words as being nouns, verbs, adjectives, adverbs, etc., based on second-order Markov models, in which the probability of each tag depends on the two preceding tags, so that triples of consecutive tags are considered. It is trained on a tagged text corpus, from which it estimates the probability of each tag given the two preceding tags; the probability of a tagged sentence is then a product of such transition probabilities and word emission probabilities, with the trigram estimates smoothed using unigram and bigram probabilities. In speech recognition, algorithms using a trigram tagger have been reported to score better than those using an IIMM tagger but less well than those using a Net tagger. The trigram tagger is described by Brants (2000).
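
    The sketch below shows, in Python, the core of a second-order (trigram) tagger as described above: it counts tag unigrams, bigrams, and trigrams and word emissions on a tiny hypothetical tagged corpus, smooths the trigram transition probability by linear interpolation with the unigram and bigram estimates, and scores a tag sequence as a product of transition and emission probabilities. The interpolation weights are fixed, assumed values (Brants's TnT tagger estimates them from the training data), and unknown-word handling and Viterbi decoding are omitted.

```python
from collections import Counter

START = "<s>"

# Tiny hypothetical tagged corpus: each sentence is a list of (word, tag) pairs.
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train(corpus):
    """Collect tag n-gram counts, their context counts, and word emission counts."""
    c = {name: Counter() for name in
         ("uni", "bi", "bi_ctx", "tri", "tri_ctx", "emit", "tag")}
    for sentence in corpus:
        for word, tag in sentence:
            c["emit"][(tag, word)] += 1
            c["tag"][tag] += 1
        tags = [START, START] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            t1, t2, t3 = tags[i - 2], tags[i - 1], tags[i]
            c["uni"][t3] += 1
            c["bi"][(t2, t3)] += 1
            c["bi_ctx"][t2] += 1
            c["tri"][(t1, t2, t3)] += 1
            c["tri_ctx"][(t1, t2)] += 1
    return c


def ratio(num, den):
    return num / den if den else 0.0


def transition(c, t1, t2, t3, lambdas=(0.1, 0.3, 0.6)):
    """P(t3 | t1, t2) as a linear interpolation of unigram, bigram and trigram estimates."""
    l1, l2, l3 = lambdas  # assumed fixed weights summing to 1
    total = sum(c["uni"].values())
    return (l1 * ratio(c["uni"][t3], total)
            + l2 * ratio(c["bi"][(t2, t3)], c["bi_ctx"][t2])
            + l3 * ratio(c["tri"][(t1, t2, t3)], c["tri_ctx"][(t1, t2)]))


def emission(c, tag, word):
    """P(word | tag), unsmoothed."""
    return ratio(c["emit"][(tag, word)], c["tag"][tag])


def sequence_score(c, words, tags):
    """Joint probability of words and tags: product of transitions and emissions."""
    score, t1, t2 = 1.0, START, START
    for word, t3 in zip(words, tags):
        score *= transition(c, t1, t2, t3) * emission(c, t3, word)
        t1, t2 = t2, t3
    return score


counts = train(CORPUS)
words = ["the", "cat", "barks"]
good = sequence_score(counts, words, ["DET", "NOUN", "VERB"])
bad = sequence_score(counts, words, ["NOUN", "DET", "VERB"])
print(good, bad)
```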

  • The Best Free AI Humanizer for Beginners

    Comparing AI humanizers? An AI humanizer is software that uses machine learning to rewrite text so that it reads more naturally, lowering the barrier to producing polished output. Privacy matters too: check whether your data is used to train the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI humanizer fits into your workflow and can quickly repay its cost. Below we compare features, pricing, and real output so you can choose with confidence.

  • Kunihiko Fukushima

    Kunihiko Fukushima (Japanese: 福島 邦彦, born 16 March 1936) is a Japanese computer scientist, most noted for his work on artificial neural networks and deep learning. He currently works part-time as a senior research scientist at the Fuzzy Logic Systems Institute in Fukuoka, Japan.

    == Notable scientific achievements ==

    In 1980, Fukushima published the neocognitron, the original deep convolutional neural network (CNN) architecture. Fukushima proposed several supervised and unsupervised learning algorithms to train the parameters of a deep neocognitron so that it could learn internal representations of incoming data. Today, however, the CNN architecture is usually trained through backpropagation. This approach is now heavily used in computer vision.

    In 1969 Fukushima introduced the ReLU (rectified linear unit) activation function, which he called an "analog threshold element", in the context of visual feature extraction in hierarchical neural networks. (The ReLU had, however, been used earlier by Alston Householder in 1941 as a mathematical abstraction of biological neural networks.) As of 2017, it was the most popular activation function for deep neural networks.

    == Education and career ==

    In 1958, Fukushima received his Bachelor of Engineering in electronics from Kyoto University. He became a senior research scientist at the NHK Science & Technology Research Laboratories. In 1989, he joined the faculty of Osaka University. In 1999, he joined the faculty of the University of Electro-Communications. In 2001, he joined the faculty of Tokyo University of Technology. From 2006 to 2010, he was a visiting professor at Kansai University. Fukushima was the founding president of the Japanese Neural Network Society (JNNS). He was also a founding member of the board of governors of the International Neural Network Society (INNS), and president of the Asia-Pacific Neural Network Assembly (APNNA).
    He served on the board of governors of the International Neural Network Society (INNS) in 1989–1990 and 1993–2005.

    == Awards ==

    In 2020, Fukushima received the Bower Award and Prize for Achievement in Science. In 2022, Fukushima was named to the Asian Scientist 100 by Asian Scientist. He also received the IEICE Achievement Award and Excellent Paper Awards, the IEEE Neural Networks Pioneer Award, the APNNA Outstanding Achievement Award, the JNNS Excellent Paper Award, and the INNS Helmholtz Award.

  • Learning to rank

    Learning to rank (LTR) or machine-learned ranking (MLR) is the application of machine learning, often supervised, semi-supervised, or reinforcement learning, in the construction of ranking models for information retrieval and recommender systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g., "relevant" or "not relevant") for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

    == Applications ==

    === In information retrieval ===

    Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure. Training data consists of queries and documents matching them, together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine the relevance of each result. It is not feasible to check the relevance of all documents, so a technique called pooling is typically used: only the top few documents retrieved by some existing ranking models are checked. This technique may introduce selection bias. Alternatively, training data may be derived automatically by analyzing clickthrough logs (i.e., search results that received clicks from users), query chains, or search engine features such as Google's (since-replaced) SearchWiki. Clickthrough logs can be biased by the tendency of users to click on the top search results on the assumption that they are already well ranked. Training data is used by a learning algorithm to produce a ranking model that computes the relevance of documents for actual queries.
    Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models that permit fast query evaluation, such as the vector space model, the Boolean model, weighted AND, or BM25. This phase is called top-$k$ document retrieval, and many heuristics have been proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents.

    === In other areas ===

    Learning to rank algorithms have been applied in areas other than information retrieval: in machine translation, for ranking a set of hypothesized translations; in computational biology, for ranking candidate 3-D structures in protein structure prediction problems; and in recommender systems, for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article.

    == Feature vectors ==

    For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called feature vectors. Such an approach is sometimes called bag of features and is analogous to the bag of words model and vector space model used in information retrieval for representing documents.

    Components of such vectors are called features, factors or ranking signals. They may be divided into three groups (features from document retrieval are shown as examples). Query-independent or static features depend only on the document, not on the query; examples are PageRank or the document's length. Such features can be precomputed offline during indexing.
    They may be used to compute the document's static quality score (or static rank), which is often used to speed up search query evaluation. Query-dependent or dynamic features depend both on the contents of the document and on the query; examples are the TF-IDF score or other non-machine-learned ranking functions. Query-level features, or query features, depend only on the query; for example, the number of words in a query.

    Some examples of features used in the well-known LETOR dataset: TF, TF-IDF, BM25, and language-modeling scores of the document's zones (title, body, anchor text, URL) for a given query; lengths and IDF sums of the document's zones; and the document's PageRank, HITS ranks and their variants.

    Selecting and designing good features is an important area in machine learning, called feature engineering.

    == Evaluation measures ==

    Several measures (metrics) are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. A learning-to-rank problem is often reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures include mean average precision (MAP); DCG and NDCG; Precision@n and NDCG@n, where "@n" denotes that the metric is evaluated only on the top n documents; mean reciprocal rank (MRR); Kendall's tau; and Spearman's rho.

    DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics, such as MAP, MRR and precision, are defined only for binary judgments.

    Several newer evaluation metrics have been proposed that claim to model user satisfaction with search results better than DCG: expected reciprocal rank (ERR) and Yandex's pfound. Both metrics assume that a user is more likely to stop examining search results after seeing a more relevant document than after a less relevant one.
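The metrics listed above can be made concrete with a short sketch. It assumes the widely used exponential-gain form of DCG, $\mathrm{DCG}@k = \sum_{i=1}^{k} (2^{g_i} - 1)/\log_2(i + 1)$, where $g_i$ is the relevance grade at rank $i$ (other gain and discount variants exist); it treats any grade above zero as "relevant" for reciprocal rank; and it uses the cascade form of ERR, in which the document at rank $i$ satisfies the user with probability $(2^{g_i} - 1)/2^{g_{\max}}$. The function names and the grades in the example are hypothetical.

```python
from __future__ import annotations

import math


def dcg_at_k(grades: list[int], k: int) -> float:
    """DCG@k with exponential gain 2^g - 1 and a log2(rank + 1) discount."""
    return sum(
        (2 ** g - 1) / math.log2(rank + 1)
        for rank, g in enumerate(grades[:k], start=1)
    )


def ndcg_at_k(grades: list[int], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (best-first) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(grades: list[int]) -> float:
    """1 / rank of the first document with grade > 0, or 0 if there is none.

    MRR is the mean of this value over a set of queries.
    """
    for rank, g in enumerate(grades, start=1):
        if g > 0:
            return 1.0 / rank
    return 0.0


def err(grades: list[int], max_grade: int) -> float:
    """Expected reciprocal rank under a cascade model of browsing.

    The user stops at rank r with probability R_r * prod_{i<r} (1 - R_i),
    where R_i = (2^g_i - 1) / 2^max_grade, so a highly relevant document
    makes the user more likely to stop early.
    """
    p_continue = 1.0
    total = 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade
        total += p_continue * r / rank
        p_continue *= 1.0 - r
    return total


if __name__ == "__main__":
    # Hypothetical graded judgments (0-3) of one ranked result list.
    grades = [3, 2, 0, 1]
    print(round(ndcg_at_k(grades, 4), 4))
    print(reciprocal_rank(grades))  # 1.0
    print(round(err(grades, max_grade=3), 4))  # 0.9009
```

Note how ERR discounts the grade-1 document at rank 4 heavily, because the highly relevant documents above it have probably already satisfied the user.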
    == Approaches ==

    Learning-to-rank approaches are often grouped into three categories: pointwise (where individual documents are ranked), pairwise (where pairs of documents are ranked into a relative order), and listwise (where an entire list of documents is ordered).

    Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning to rank problems in his book Learning to Rank for Information Retrieval. He categorized them into three groups by their input spaces, output spaces, hypothesis spaces (the core function of the model) and loss functions: the pointwise, pairwise, and listwise approaches. In practice, listwise approaches often outperform pairwise and pointwise approaches. This statement was further supported by a large-scale experiment comparing different learning-to-rank methods on a large collection of benchmark data sets.

    In this section, unless noted otherwise, $x$ denotes an object to be evaluated, for example a document or an image; $f(x)$ denotes a single-value hypothesis; $h(\cdot)$ denotes a bivariate or multivariate function; and $L(\cdot)$ denotes the loss function.

    === Pointwise approach ===

    In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem: given a single query-document pair, predict its score. Formally, the pointwise approach aims at learning a function $f(x)$ that predicts the real-valued or ordinal score of a document $x$ using the loss function $L(f; x_j, y_j)$. A number of existing supervised machine learning algorithms can be readily used for this purpose.
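As one instance of reusing an existing supervised learner, the pointwise formulation can be sketched with plain least-squares regression. This is a minimal sketch, assuming a linear hypothesis $f(x) = w \cdot x + b$ trained by batch gradient descent on the squared loss $(f(x_j) - y_j)^2$; the function names, feature values, labels, learning rate and epoch count are hypothetical. Each query-document pair is treated independently, and ranking a query's documents then amounts to sorting them by predicted score.

```python
from __future__ import annotations


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Single-value hypothesis f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_pointwise(
    features: list[list[float]],
    scores: list[float],
    lr: float = 0.05,
    epochs: int = 2000,
) -> tuple[list[float], float]:
    """Fit f by batch gradient descent on the mean squared loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, scores):
            residual = predict(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2.0 * residual * xi / n
            grad_b += 2.0 * residual / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank(w: list[float], b: float, docs: dict[str, list[float]]) -> list[str]:
    """Order one query's documents by predicted score, best first."""
    return sorted(docs, key=lambda doc_id: predict(w, b, docs[doc_id]), reverse=True)


if __name__ == "__main__":
    # Hypothetical query-document feature vectors with graded labels (0-3).
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    y = [2.0, 1.0, 3.0, 0.0]
    w, b = train_pointwise(X, y)
    print(rank(w, b, {"a": [0.2, 0.1], "b": [0.9, 0.5], "c": [0.5, 0.0]}))
    # ['b', 'c', 'a']
```

The loss never compares two documents, which is exactly what distinguishes this approach from the pairwise and listwise ones.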
    Ordinal regression and classification algorithms can also be used in the pointwise approach to predict the score of a single query-document pair when that score takes a small, finite number of values.

    === Pairwise approach ===

    In this case, the learning-to-rank problem is approximated by a classification problem: learning a binary classifier $h(x_u, x_v)$ that can tell which document is better in a given pair of documents. The classifier takes two documents as its input, and the goal is to minimize a loss function $L(h; x_u, x_v, y_{u,v})$. The loss function typically reflects the number and magnitude of inversions in the induced ranking.

    In many cases, the binary classifier $h(x_u, x_v)$ is implemented with a scoring function $f(x)$. As an example, RankNet adopts a probability model and defines $h(x_u, x_v)$ as the estimated probability that document $x_u$ has higher quality than $x_v$: $P_{u,v}(f) = \text{CDF}(f(x_u) - f(x_v))$, …
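The RankNet probability model above can be sketched in a few lines. This sketch uses the logistic function as the CDF, as RankNet does, and minimizes the cross-entropy $-\log P_{u,v}(f)$ over pairs in which $x_u$ is judged better; for brevity it replaces RankNet's neural network with a linear scoring function $f(x) = w \cdot x$, so it is a simplified illustration rather than RankNet itself. The function names, feature values and preference pairs are hypothetical.

```python
from __future__ import annotations

import math


def logistic_cdf(z: float) -> float:
    """Logistic CDF, computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], x: list[float]) -> float:
    """Linear scoring function f(x) = w . x (stand-in for RankNet's network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pair_probability(w: list[float], x_u: list[float], x_v: list[float]) -> float:
    """P_{u,v}(f) = CDF(f(x_u) - f(x_v)): estimated probability x_u beats x_v."""
    return logistic_cdf(score(w, x_u) - score(w, x_v))


def pairwise_loss(w: list[float], pairs) -> float:
    """Mean cross-entropy -log P_{u,v}(f) over pairs where x_u is preferred."""
    total = 0.0
    for x_u, x_v in pairs:
        z = score(w, x_u) - score(w, x_v)
        # -log(logistic(z)) = log(1 + exp(-z)), in a numerically stable form.
        total += math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))
    return total / len(pairs)


def train_pairwise(pairs, n_features: int, lr: float = 0.1, epochs: int = 200) -> list[float]:
    """Stochastic gradient descent on the pairwise cross-entropy loss."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x_u, x_v in pairs:
            p = pair_probability(w, x_u, x_v)
            # The gradient of -log p with respect to w is -(1 - p) * (x_u - x_v).
            w = [wi + lr * (1.0 - p) * (a - b) for wi, a, b in zip(w, x_u, x_v)]
    return w


if __name__ == "__main__":
    # Hypothetical feature vectors for three documents judged d1 > d2 > d3.
    docs = {"d1": [1.0, 0.2], "d2": [0.6, 0.9], "d3": [0.1, 0.5]}
    pairs = [(docs["d1"], docs["d2"]), (docs["d1"], docs["d3"]), (docs["d2"], docs["d3"])]
    w = train_pairwise(pairs, n_features=2)
    print(pairwise_loss([0.0, 0.0], pairs), pairwise_loss(w, pairs))
    print(sorted(docs, key=lambda d: score(w, docs[d]), reverse=True))
```

Because $h$ is built from a single scoring function, the trained model can still rank any list by sorting on $f(x)$, even though it was only ever trained on pairs.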

  • Radford M. Neal

    Radford M. Neal

    Radford M. Neal (born September 12, 1956) is a professor emeritus in the Department of Statistics and the Department of Computer Science at the University of Toronto, where he held a Canada Research Chair in statistics and machine learning.

    == Education and career ==

    Neal studied computer science at the University of Calgary, where he received his B.Sc. in 1977 and M.Sc. in 1980, with thesis work supervised by David Hill. He worked for several years as a sessional instructor at the University of Calgary and as a statistical consultant in industry before returning to academia. Neal continued his studies at the University of Toronto, where he received his Ph.D. in 1995 under the supervision of Geoffrey Hinton. Neal became an assistant professor at the University of Toronto in 1995, an associate professor in 1999, and a full professor in 2001. He was the Canada Research Chair in Statistics and Machine Learning from 2003 to 2016 and retired in 2017. Neal's research spans machine learning and statistics; he is particularly well known for his work on Markov chain Monte Carlo, error-correcting codes and Bayesian learning for neural networks. He is also known for his blog and as the developer of pqR, a new version of the R interpreter.

  • Generalized nondeterministic finite automaton

    Generalized nondeterministic finite automaton

    In the theory of computation, a generalized nondeterministic finite automaton (GNFA), also known as an expression automaton or a generalized nondeterministic finite state machine, is a variation of a nondeterministic finite automaton (NFA) in which each transition is labeled with an arbitrary regular expression. The GNFA reads blocks of symbols from the input, each block being a string matched by the regular expression on the transition taken. There are several differences between a standard finite state machine and a generalized nondeterministic finite state machine. A GNFA must have exactly one start state and one accept state, and these cannot be the same state, whereas an NFA or DFA may have several accept states, and its start state can be an accept state. A GNFA must have only one transition between any two states, whereas an NFA or DFA allows numerous transitions between the same pair of states. In a GNFA, every state except the accept state has exactly one transition to every state except the start state, although by convention transitions labeled with the empty set are usually omitted when drawing generalized nondeterministic finite state machines.

    == Formal definition ==

    A GNFA can be defined as a 5-tuple, (S, Σ, T, s, a), consisting of a finite set of states (S); a finite set called the alphabet (Σ); a transition function (T : (S ∖ {a}) × (S ∖ {s}) → R); a start state (s ∈ S); and an accept state (a ∈ S); where R is the collection of all regular expressions over the alphabet Σ. The transition function takes as its argument a pair of states and outputs a regular expression (the label of the transition). This differs from other finite state machines, whose transition function takes as input a single state and a symbol from the alphabet (or the empty string, in the case of nondeterministic finite state machines) and outputs the next state (or the set of possible states, in the case of nondeterministic finite state machines).
    A DFA or NFA can easily be converted into a GNFA, and the GNFA can then be converted into a regular expression by repeatedly collapsing parts of it into single edges until S = {s, a}. Similarly, GNFAs can be reduced to NFAs by changing regular expression operators into new edges until each edge is labeled with a regular expression matching a single string of length at most 1. NFAs, in turn, can be reduced to DFAs using the powerset construction. This shows that GNFAs recognize the same set of formal languages as DFAs and NFAs, namely the regular languages.
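The DFA-to-GNFA conversion and the collapsing of states into single edges can be sketched as follows. This is a minimal sketch under stated assumptions: labels are written in Python `re` syntax, `None` stands for a transition labeled with the empty set and `""` for the empty string, and the helper names and the `__start__`/`__accept__` state names are hypothetical. Each interior state q_rip is removed using the usual state-elimination rule: every edge p → q is relabeled R1 (R2)* R3 | R4, where R1, R2, R3 and R4 label p → q_rip, q_rip → q_rip, q_rip → q and p → q respectively.

```python
import re
from itertools import product
from typing import Optional

# A transition label: a regex in Python `re` syntax, "" for the empty string
# (epsilon), or None for the empty set (no transition).
Regex = Optional[str]


def union(r1: Regex, r2: Regex) -> Regex:
    if r1 is None:
        return r2
    if r2 is None:
        return r1
    return f"(?:{r1}|{r2})"


def concat(*parts: Regex) -> Regex:
    # Concatenation with the empty set yields the empty set.
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts)


def star(r: Regex) -> str:
    # The star of the empty set, or of epsilon, is just epsilon.
    return "" if not r else f"(?:{r})*"


def dfa_to_gnfa(states, delta, start, accepts):
    """Wrap a DFA as a GNFA (S, T, s, a) with a fresh start and accept state.

    `delta` maps (state, symbol) to the next state. In the returned dict T,
    a missing (p, q) key means the transition is labeled with the empty set.
    """
    s, a = "__start__", "__accept__"  # assumed not to clash with DFA state names
    T = {(s, start): ""}  # epsilon edge into the old start state
    for q in accepts:
        T[(q, a)] = ""  # epsilon edge out of each old accept state
    for (p, symbol), q in delta.items():
        # Merge parallel edges so there is one transition per pair of states.
        T[(p, q)] = union(T.get((p, q)), re.escape(symbol))
    return [s, *states, a], T, s, a


def gnfa_to_regex(states, T, s, a) -> Regex:
    """Rip out interior states one at a time until only s and a remain."""
    states = list(states)
    for q_rip in [q for q in states if q not in (s, a)]:
        states.remove(q_rip)
        loop = star(T.get((q_rip, q_rip)))
        new_T = {}
        for p, q in product(states, states):
            if p == a or q == s:  # no edges leave a or enter s
                continue
            # R1 (R2)* R3 | R4
            via_rip = concat(T.get((p, q_rip)), loop, T.get((q_rip, q)))
            label = union(via_rip, T.get((p, q)))
            if label is not None:
                new_T[(p, q)] = label
        T = new_T
    return T.get((s, a))  # None means the automaton accepts nothing


if __name__ == "__main__":
    # Hypothetical DFA over {a, b} accepting strings with an even number of a's.
    delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
             ("q1", "a"): "q0", ("q1", "b"): "q1"}
    pattern = gnfa_to_regex(*dfa_to_gnfa(["q0", "q1"], delta, "q0", ["q0"]))
    for word in ["", "aa", "ab", "bab", "aba"]:
        print(repr(word), re.fullmatch(pattern, word) is not None)
```

The resulting pattern is correct but verbose; a practical converter would also simplify labels as it goes, which does not change the language recognized.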
