AI Data Bias

AI Data Bias — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Conditional random field

    Conditional random field

    Conditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. The kind of graph used depends on the application. For example, in natural language processing, "linear chain" CRFs are popular, for which each prediction is dependent only on its immediate neighbours. In image processing, the graph typically connects locations to nearby and/or similar locations to enforce that they receive similar predictions. Other examples where CRFs are used are: labeling or parsing of sequential data for natural language processing or biological sequences, part-of-speech tagging, shallow parsing, named entity recognition, gene finding, peptide critical functional region finding, and object recognition and image segmentation in computer vision. == Description == CRFs are a type of discriminative undirected probabilistic graphical model. Lafferty, McCallum and Pereira define a CRF on observations X {\displaystyle {\boldsymbol {X}}} and random variables Y {\displaystyle {\boldsymbol {Y}}} as follows: Let G = ( V , E ) {\displaystyle G=(V,E)} be a graph such that Y = ( Y v ) v ∈ V {\displaystyle {\boldsymbol {Y}}=({\boldsymbol {Y}}_{v})_{v\in V}} , so that Y {\displaystyle {\boldsymbol {Y}}} is indexed by the vertices of G {\displaystyle G} . Then ( X , Y ) {\displaystyle ({\boldsymbol {X}},{\boldsymbol {Y}})} is a conditional random field when each random variable Y v {\displaystyle {\boldsymbol {Y}}_{v}} , conditioned on X {\displaystyle {\boldsymbol {X}}} , obeys the Markov property with respect to the graph; that is, its probability is dependent only on its neighbours in G and not its past states: P ( Y v | X , { Y w : w ≠ v } ) = P ( Y v | X , { Y w : w ∼ v } ) {\displaystyle P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\neq v\})=P({\boldsymbol {Y}}_{v}|{\boldsymbol {X}},\{{\boldsymbol {Y}}_{w}:w\sim v\})} , where w ∼ v {\displaystyle {\mathit {w}}\sim v} means that w {\displaystyle w} and v {\displaystyle v} are neighbors in G {\displaystyle G} . What this means is that a CRF is an undirected graphical model whose nodes can be divided into exactly two disjoint sets X {\displaystyle {\boldsymbol {X}}} and Y {\displaystyle {\boldsymbol {Y}}} , the observed and output variables, respectively; the conditional distribution p ( Y | X ) {\displaystyle p({\boldsymbol {Y}}|{\boldsymbol {X}})} is then modeled. === Inference === For general graphs, the problem of exact inference in CRFs is intractable. The inference problem for a CRF is basically the same as for an MRF and the same arguments hold. However, there exist special cases for which exact inference is feasible: If the graph is a chain or a tree, message passing algorithms yield exact solutions. The algorithms used in these cases are analogous to the forward-backward and Viterbi algorithm for the case of HMMs. If the CRF only contains pair-wise potentials and the energy is submodular, combinatorial min cut/max flow algorithms yield exact solutions. If exact inference is impossible, several algorithms can be used to obtain approximate solutions. These include: Loopy belief propagation Alpha expansion Mean field inference Linear programming relaxations === Parameter learning === Learning the parameters θ {\displaystyle \theta } is usually done by maximum likelihood learning for p ( Y i | X i ; θ ) {\displaystyle p(Y_{i}|X_{i};\theta )} . If all nodes have exponential family distributions and all nodes are observed during training, this optimization is convex. It can be solved for example using gradient descent algorithms, or Quasi-Newton methods such as the L-BFGS algorithm. On the other hand, if some variables are unobserved, the inference problem has to be solved for these variables. Exact inference is intractable in general graphs, so approximations have to be used. === Examples === In sequence modeling, the graph of interest is usually a chain graph. An input sequence of observed variables X {\displaystyle X} represents a sequence of observations and Y {\displaystyle Y} represents a hidden (or unknown) state variable that needs to be inferred given the observations. The Y i {\displaystyle Y_{i}} are structured to form a chain, with an edge between each Y i − 1 {\displaystyle Y_{i-1}} and Y i {\displaystyle Y_{i}} . As well as having a simple interpretation of the Y i {\displaystyle Y_{i}} as "labels" for each element in the input sequence, this layout admits efficient algorithms for: model training, learning the conditional distributions between the Y i {\displaystyle Y_{i}} and feature functions from some corpus of training data. decoding, determining the probability of a given label sequence Y {\displaystyle Y} given X {\displaystyle X} . inference, determining the most likely label sequence Y {\displaystyle Y} given X {\displaystyle X} . The conditional dependency of each Y i {\displaystyle Y_{i}} on X {\displaystyle X} is defined through a fixed set of feature functions of the form f ( i , Y i − 1 , Y i , X ) {\displaystyle f(i,Y_{i-1},Y_{i},X)} , which can be thought of as measurements on the input sequence that partially determine the likelihood of each possible value for Y i {\displaystyle Y_{i}} . The model assigns each feature a numerical weight and combines them to determine the probability of a certain value for Y i {\displaystyle Y_{i}} . Linear-chain CRFs have many of the same applications as conceptually simpler hidden Markov models (HMMs), but relax certain assumptions about the input and output sequence distributions. An HMM can loosely be understood as a CRF with very specific feature functions that use constant probabilities to model state transitions and emissions. Conversely, a CRF can loosely be understood as a generalization of an HMM that makes the constant transition probabilities into arbitrary functions that vary across the positions in the sequence of hidden states, depending on the input sequence. Notably, in contrast to HMMs, CRFs can contain any number of feature functions, the feature functions can inspect the entire input sequence X {\displaystyle X} at any point during inference, and the range of the feature functions need not have a probabilistic interpretation. == Variants == === Higher-order CRFs and semi-Markov CRFs === CRFs can be extended into higher order models by making each Y i {\displaystyle Y_{i}} dependent on a fixed number k {\displaystyle k} of previous variables Y i − k , . . . , Y i − 1 {\displaystyle Y_{i-k},...,Y_{i-1}} . In conventional formulations of higher order CRFs, training and inference are only practical for small values of k {\displaystyle k} (such as k ≤ 5), since their computational cost increases exponentially with k {\displaystyle k} . However, another recent advance has managed to ameliorate these issues by leveraging concepts and tools from the field of Bayesian nonparametrics. Specifically, the CRF-infinity approach constitutes a CRF-type model that is capable of learning infinitely-long temporal dynamics in a scalable fashion. This is effected by introducing a novel potential function for CRFs that is based on the Sequence Memoizer (SM), a nonparametric Bayesian model for learning infinitely-long dynamics in sequential observations. To render such a model computationally tractable, CRF-infinity employs a mean-field approximation of the postulated novel potential functions (which are driven by an SM). This allows for devising efficient approximate training and inference algorithms for the model, without undermining its capability to capture and model temporal dependencies of arbitrary length. There exists another generalization of CRFs, the semi-Markov conditional random field (semi-CRF), which models variable-length segmentations of the label sequence Y {\displaystyle Y} . This provides much of the power of higher-order CRFs to model long-range dependencies of the Y i {\displaystyle Y_{i}} , at a reasonable computational cost. Finally, large-margin models for structured prediction, such as the structured Support Vector Machine can be seen as an alternative training procedure to CRFs. === Latent-dynamic conditional random field === Latent-dynamic conditional random fields (LDCRF) or discriminative probabilistic latent variable models (DPLVM) are a type of CRFs for sequence tagging tasks. They are latent variable models that are trained discriminatively. In an LDCRF, like in any sequence tagging task, given a sequence of observations x = x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} , the main problem the model must solve is how to assign a sequence of labels y = y 1 , … , y n {\displaystyle y_{1},\dots ,y_{n}} from one finite set

    Read more →
  • World Programming System

    World Programming System

    The World Programming System, also known as WPS Analytics or WPS, is a software product developed by a company called World Programming (acquired by Altair Engineering). WPS Analytics supports users of mixed ability to access and process data and to perform data science tasks. It has interactive visual programming tools using data workflows, and it has coding tools supporting the use of the SAS language mixed with Python, R and SQL. == About == WPS can use programs written in the language of SAS without the need for translating them into any other language. In this regard WPS is compatible with the SAS system. WPS has a built-in language interpreter able to process the language of SAS and produce similar results. WPS is available to run on z/OS, Windows, macOS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX. On all supported platforms, programs written in the language of SAS can be executed from a WPS command line interface, often referred to as running in batch mode. WPS can also be used from a graphical user interface known as the WPS Workbench for managing, editing and running programs written in the language of SAS. The WPS Workbench user interface is based on Eclipse. WPS version 4 (released in March 2018) introduced a drag-and-drop workflow canvas providing interactive blocks for data retrieval, blending and preparation, data discovery and profiling, predictive modelling powered by machine learning algorithms, model performance validation and scorecards. WPS version 3 (released in February 2012) provided a new client/server architecture that allows the WPS Workbench GUI to execute SAS programs on remote server installations of WPS in a network or cloud. The resulting output, data sets, logs, etc., can then all be viewed and manipulated from inside the Workbench as if the workloads had been executed locally. SAS programs do not require any special language statements to use this feature. == Summary of main features == Runs on Windows, macOS, z/OS, Linux (x86, Armv8 64-bit, IBM Power LE, IBM Z), and AIX An integrated development environment based on Eclipse for Linux, macOS and Windows. Support for language of SAS elements. Support for the language of SAS Macros. Matrix Programming support using PROC IML. Support for generating band plots, bar charts, box plots, bubble plots, contour plots, dendrogram plots, ellipse plots, fringe plots, heat maps, high-low plots, histograms, loess plots, needle plots, pie charts, penalised b-spline, radar charts, reference lines, scatter plots, series plots, step plots, regression plots and vector plots. Support for statistical procedures ACECLUS, ASSOCRULES, ANOVA, BIN, BOXPLOT, CANCORR, CANDISC, CLUSTER, CORRESP, DISCRIM, DISTANCE, FACTOR, FASTCLUS, FREQ, GAM, GANNO, GENMOD, GLIMMIX, GLM, GLMMOD, GLMSELECT, ICLIFETEST, KDE, LIFEREG, LIFETEST, LOESS, LOGISTIC, MDS, MEANS, MI, MIANALYSE, MIXED, MODECLUS, NESTED, NLIN, NPAR1WAY, PHREG, PLAN, PLS, POWER, PRINCOMP, PROBIT, QUANTREG, RBF, REG, ROBUSTREG, RSREG, SCORE, SEGMENT, SIMNORMAL, STANDARD, STDSIZE, STDRATE, STEPDISC, SUMMARY, SURVEYMEANS, SURVEYSELECT, TPSPLINE, TRANSREG, TREE, TTEST, UNIVARIATE, VARCLUS, VARCOMP Support for time series procedures ARIMA, AUTOREG, ESM, EXPAND, FORECAST, LOAN, SEVERITY, SPECTRA, TIMESERIES, X12 Support for machine learning procedures DECISIONFOREST, DECISIONTREE, GMM, MLP, OPTIMALBIN, SEGMENT, SVM Support for ODS. Reads and writes SAS datasets (compressed or uncompressed). Access: Actian Matrix (previously known as ParAccel), DASD, DB2, Excel, Greenplum, Hadoop, Informix, Kognitio Archived 2012-08-24 at the Wayback Machine, MariaDB, MySQL, Netezza, ODBC, OLEDB, Oracle, PostgreSQL, SAND, Snowflake, SPSS/PSPP, SQL Server, Sybase, Sybase IQ, Teradata, VSAM, Vertica and XML. Support for SAS Tape Format. Direct output of reports to CSV, PDF and HTML. Support to connect WPS systems programmatically, remote submit parts of a program to execute on connected remote servers, upload and download data between the connected systems. Support for Hadoop Support for R Support for Python == Industry recognition == Gartner recognized World Programming in their Cool Vendors in Data Science, 2014 Report. == Lawsuit == In 2010 World Programming defended its use of the language of SAS in the High Court of England and Wales in SAS Institute Inc. v World Programming Ltd. The software was the subject of a lawsuit by SAS Institute. The EU Court of Justice ruled in favor of World Programming, stating that the copyright protection does not extend to the software functionality, the programming language used and the format of the data files used by the program. It stated that there is no copyright infringement when a company which does not have access to the source code of a program studies, observes and tests that program to create another program with the same functionality.

    Read more →
  • International Conference on Computer Vision

    International Conference on Computer Vision

    The International Conference on Computer Vision (ICCV) is a research conference sponsored by the Institute of Electrical and Electronics Engineers (IEEE) held every other year. It is considered to be one of the top conferences in computer vision, alongside CVPR and ECCV, and it is held on years in which ECCV is not. The conference is usually spread over four to five days. Typically, experts in the focus areas give tutorial talks on the first day, then the technical sessions (and poster sessions in parallel) follow. Recent conferences have also had an increasing number of focused workshops and a commercial exhibition. == Awards == === Azriel Rosenfeld Lifetime Achievement Award === The Azriel Rosenfeld Award, or Azriel Rosenfeld Lifetime Achievement Award, recognizes researchers who have made significant contributions to the field of computer vision over their careers. It is named in memory of computer scientist and mathematician Azriel Rosenfeld. The following people have received this award: === Helmholtz Prize === The ICCV Helmholtz Prize, known as the Test of Time Award before 2013, is awarded every other year at the ICCV, recognizing ICCV papers from ten or more years earlier that had a significant impact on computer vision research. Winners are selected by the IEEE Computer Society's Technical Committee on Pattern Analysis and Machine Intelligence. The award is named after the 19th century physician and physicist Hermann von Helmholtz, and the ICCV's award is not related to the various Helmholtz Prizes in physics, or the Hermann von Helmholtz Prize in neuroscience. === Marr Prize === The ICCV best-paper award is the Marr Prize, named after British neuroscientist David Marr. === Mark Everingham Prize === The Mark Everingham Prize is an award given yearly by the Technical Committee on Pattern Analysis and Machine Intelligence of the IEEE Computer Society at the IEEE International Conference on Computer Vision or the European Conference on Computer Vision to commemorate the late Mark Everingham, "one of the rising stars of computer vision", and to encourage others to follow in his footsteps by acting to further progress in the computer vision community as a whole. The prize is given to a researcher, or a team of researchers, who have made a selfless contribution of significant benefit to other members of the computer vision community. The Mark Everingham Prize for Rigorous Evaluation was an award given in 2012 at the British Machine Vision Conference. === PAMI Distinguished Researcher Award === The PAMI Distinguished Researcher Award (until 2013 called Significant Researcher Award) is awarded to candidates whose research projects have significantly contributed to the progress of computer vision. Awards are made based on major research contributions, as well as the role of those contributions in influencing and inspiring other research. Candidates are nominated by the community. The following people have received this award: == Conference list == The conference is usually held in the Spring in various international locations.

    Read more →
  • European Conference on Computer Vision

    European Conference on Computer Vision

    The European Conference on Computer Vision (ECCV) is a biennial research conference with the proceedings published by Springer Science+Business Media. Similar to ICCV in scope and quality, it is held those years which ICCV is not. It is considered to be one of the top conferences in computer vision, alongside CVPR and ICCV, with an 'A' rating from the Australian Ranking of ICT Conferences and an 'A1' rating from the Brazilian ministry of education. The acceptance rate for ECCV 2010 was 24.4% for posters and 3.3% for oral presentations. Like other top computer vision conferences, ECCV has tutorial talks, technical sessions, and poster sessions. The conference is usually spread over five to six days with the main technical program occupying three days in the middle, and tutorial and workshops, focused on specific topics, being held in the beginning and at the end. The ECCV presents the Koenderink Prize annually to recognize fundamental contributions in computer vision. == Location == The conference is usually held in autumn in Europe.

    Read more →
  • List of artificial intelligence journals

    List of artificial intelligence journals

    This is a list of notable peer-reviewed academic journals that publish research in the field of artificial intelligence (AI), including areas such as machine learning, computer vision, natural language processing, robotics, and intelligent systems. == General artificial intelligence == Artificial Intelligence (journal) – Elsevier Journal of Artificial Intelligence Research (JAIR) – AI Access Foundation Knowledge-Based Systems – Elsevier == Machine learning == Data Mining and Knowledge Discovery – Springer Machine Learning (journal) – Springer Journal of Machine Learning Research – Microtome Pattern Recognition (journal) – Elsevier Neural Networks (journal) – Elsevier Neural Computation (journal) – MIT Press Neurocomputing (journal) - Elsevier == Deep learning and neural computation == IEEE Transactions on Evolutionary Computation – IEEE IEEE Transactions on Neural Networks and Learning Systems – IEEE Nature Machine Intelligence – Springer Nature == Computer vision == International Journal of Computer Vision – Springer IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE Machine Vision and Applications – Springer == Natural language processing == Computational Linguistics (journal) – MIT Press Natural Language Processing Transactions of the Association for Computational Linguistics – ACL == Robotics and intelligent systems == IEEE Transactions on Robotics – IEEE Autonomous Robots – Springer Journal of Intelligent & Robotic Systems – Springer == Interdisciplinary and ethics in AI == AI & Society – Springer Artificial Life – MIT Press Philosophy & Technology – Springer Minds and Machines – Springer

    Read more →
  • Ordinal regression

    Ordinal regression

    In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem between regression and classification. Examples of ordinal regression are ordered logit and ordered probit. Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference (on a scale from, say, 1–5 for "very poor" through "excellent"), as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning. == Linear models for ordinal regression == Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of thresholds to a dataset. Suppose one has a set of observations, represented by length-p vectors x1 through xn, with associated responses y1 through yn, where each yi is an ordinal variable on a scale 1, ..., K. For simplicity, and without loss of generality, we assume y is a non-decreasing vector, that is, yi ≤ {\displaystyle \leq } yi+1. To this data, one fits a length-p coefficient vector w and a set of thresholds θ1, ..., θK−1 with the property that θ1 < θ2 < ... < θK−1. This set of thresholds divides the real number line into K disjoint segments, corresponding to the K response levels. The model can now be formulated as Pr ( y ≤ i ∣ x ) = σ ( θ i − w ⋅ x ) {\displaystyle \Pr(y\leq i\mid \mathbf {x} )=\sigma (\theta _{i}-\mathbf {w} \cdot \mathbf {x} )} or, the cumulative probability of the response y being at most i is given by a function σ (the inverse link function) applied to a linear function of x. Several choices exist for σ; the logistic function σ ( θ i − w ⋅ x ) = 1 1 + e − ( θ i − w ⋅ x ) {\displaystyle \sigma (\theta _{i}-\mathbf {w} \cdot \mathbf {x} )={\frac {1}{1+e^{-(\theta _{i}-\mathbf {w} \cdot \mathbf {x} )}}}} gives the ordered logit model, while using the CDF of the standard normal distribution gives the ordered probit model. A third option is to use an exponential function σ ( θ i − w ⋅ x ) = 1 − exp ⁡ ( − exp ⁡ ( θ i − w ⋅ x ) ) {\displaystyle \sigma (\theta _{i}-\mathbf {w} \cdot \mathbf {x} )=1-\exp(-\exp(\theta _{i}-\mathbf {w} \cdot \mathbf {x} ))} which gives the proportional hazards model. === Latent variable model === The probit version of the above model can be justified by assuming the existence of a real-valued latent variable (unobserved quantity) y, determined by y ∗ = w ⋅ x + ε {\displaystyle y^{}=\mathbf {w} \cdot \mathbf {x} +\varepsilon } where ε is normally distributed with zero mean and unit variance, conditioned on x. The response variable y results from an "incomplete measurement" of y, where one only determines the interval into which y falls: y = { 1 if y ∗ ≤ θ 1 , 2 if θ 1 < y ∗ ≤ θ 2 , 3 if θ 2 < y ∗ ≤ θ 3 ⋮ K if θ K − 1 < y ∗ . {\displaystyle y={\begin{cases}1&{\text{if}}~~y^{}\leq \theta _{1},\\2&{\text{if}}~~\theta _{1} Read more →

  • Spatial Analysis of Principal Components

    Spatial Analysis of Principal Components

    Spatial Principal Component Analysis (sPCA) is a multivariate statistical technique that complements the traditional Principal Component Analysis (PCA) by incorporating spatial information into the analysis of genetic variation. While traditional PCA can be used to find spatial patterns, it focuses on reducing data dimensionality by identifying uncorrelated principal components that capture maximum variance, thus often lacking power to identify non-trivial spatial genetic patterns. By accounting for spatial autocorrelation, sPCA is able to uncover spatial patterns in the data and find the spatial structure of datasets where observations are either geographically or topologically linked. This statistical power improvement allows the investigation of cryptic spatial patterns of genetic variability otherwise overlooked. sPCA has been applied in various fields, including geography, ecology and genetics. == History == sPCA was introduced in 2008 by Thibaut Jombart, Sébastien Devillard, Anne-Béatrice Dufour, and D. Pontier as a spatially explicit method to investigate the spatial pattern of genetic variation among individuals or populations. In 2017, Valeria Montano and Thibaut Jombart published an alternative non-parametric test to evaluate the significance of global and local spatial genetic patterns with improved statistical power. == Details == sPCA modifies the PCA framework by integrating spatial weights, typically in the form of connectivity matrices or spatial adjacency graphs. It identifies principal components (PCs) that maximize both genentic variance and spatial autocorreation, as measured by Moran's I. These weights represent relationships between observations based on geographic distance or other spatial criteria. The method decomposes variance into two components: Global structures, correspond to positive autocorrelation, that is, reflect broad-scale spatial patterns where similar values cluster over large regions. Local structures, correspond to negative autocorrelation, that is, capture fine-scale spatial variations or localized patterns. The core of sPCA relies on the eigenanalysis of a spatially weighted covariance or correlation matrix. The spatial weight matrix can be constructed using techniques such as Delaunay triangulation, nearest-neighbor graphs, or distance-based criteria. Applications of sPCA should be used only as an explorative tool. == Applications == sPCA has been widely used in many fields, including: Ecology: To find spatial patterns in species distributions and environmental gradients. Genetics: Population structure and gene flow analysis while allowing for spatial autocorrelation considerations. Biogeography: To identify historical dispersal routes, and barriers to gene flow, providing insights into species distribution patterns and evolutionary history. == Software/Source Code == sPCA implementations are available in R in adegenet and ntbox . These tools facilitate the application of sPCA by providing functions for constructing spatial weight matrices, performing eigenanalysis, and obtaining spatial principal components in an easy-to-read form.

    Read more →
  • Kernel method

    Kernel method

    In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing. Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick". Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. Algorithms capable of operating with kernels include the kernel perceptron, support-vector machines (SVM), Gaussian processes, principal components analysis (PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters and many others. Most kernel algorithms are based on convex optimization or eigenproblems and are statistically well-founded. Typically, their statistical properties are analyzed using statistical learning theory (for example, using Rademacher complexity). == Motivation and informal explanation == Kernel methods can be thought of as instance-based learners: rather than learning some fixed set of parameters corresponding to the features of their inputs, they instead "remember" the i {\displaystyle i} -th training example ( x i , y i ) {\displaystyle (\mathbf {x} _{i},y_{i})} and learn for it a corresponding weight w i {\displaystyle w_{i}} . Prediction for unlabeled inputs, i.e., those not in the training set, are treated by the application of a similarity function k {\displaystyle k} , called a kernel, between the unlabeled input x ′ {\displaystyle \mathbf {x'} } and each of the training inputs x i {\displaystyle \mathbf {x} _{i}} . For instance, a kernelized binary classifier typically computes a weighted sum of similarities y ^ = sgn ⁡ ∑ i = 1 n w i y i k ( x i , x ′ ) , {\displaystyle {\hat {y}}=\operatorname {sgn} \sum _{i=1}^{n}w_{i}y_{i}k(\mathbf {x} _{i},\mathbf {x'} ),} where y ^ ∈ { − 1 , + 1 } {\displaystyle {\hat {y}}\in \{-1,+1\}} is the kernelized binary classifier's predicted label for the unlabeled input x ′ {\displaystyle \mathbf {x'} } whose hidden true label y {\displaystyle y} is of interest; k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is the kernel function that measures similarity between any pair of inputs x , x ′ ∈ X {\displaystyle \mathbf {x} ,\mathbf {x'} \in {\mathcal {X}}} ; the sum ranges over the n labeled examples { ( x i , y i ) } i = 1 n {\displaystyle \{(\mathbf {x} _{i},y_{i})\}_{i=1}^{n}} in the classifier's training set, with y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} ; the w i ∈ R {\displaystyle w_{i}\in \mathbb {R} } are the weights for the training examples, as determined by the learning algorithm; the sign function sgn {\displaystyle \operatorname {sgn} } determines whether the predicted classification y ^ {\displaystyle {\hat {y}}} comes out positive or negative. Kernel classifiers were described as early as the 1960s, with the invention of the kernel perceptron. They rose to great prominence with the popularity of the support-vector machine (SVM) in the 1990s, when the SVM was found to be competitive with neural networks on tasks such as handwriting recognition. == Mathematics: the kernel trick == The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all x {\displaystyle \mathbf {x} } and x ′ {\displaystyle \mathbf {x'} } in the input space X {\displaystyle {\mathcal {X}}} , certain functions k ( x , x ′ ) {\displaystyle k(\mathbf {x} ,\mathbf {x'} )} can be expressed as an inner product in another space V {\displaystyle {\mathcal {V}}} . The function k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is often referred to as a kernel or a kernel function. The word "kernel" is used in mathematics to denote a weighting function for a weighted sum or integral. Certain problems in machine learning have more structure than an arbitrary weighting function k {\displaystyle k} . The computation is made much simpler if the kernel can be written in the form of a "feature map" φ : X → V {\displaystyle \varphi \colon {\mathcal {X}}\to {\mathcal {V}}} which satisfies k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ V . {\displaystyle k(\mathbf {x} ,\mathbf {x'} )=\langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle _{\mathcal {V}}.} The key restriction is that ⟨ ⋅ , ⋅ ⟩ V {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {V}}} must be a proper inner product. On the other hand, an explicit representation for φ {\displaystyle \varphi } is not necessary, as long as V {\displaystyle {\mathcal {V}}} is an inner product space. The alternative follows from Mercer's theorem: an implicitly defined function φ {\displaystyle \varphi } exists whenever the space X {\displaystyle {\mathcal {X}}} can be equipped with a suitable measure ensuring the function k {\displaystyle k} satisfies Mercer's condition. Mercer's theorem is similar to a generalization of the result from linear algebra that associates an inner product to any positive-definite matrix. In fact, Mercer's condition can be reduced to this simpler case. If we choose as our measure the counting measure μ ( T ) = | T | {\displaystyle \mu (T)=|T|} for all T ⊂ X {\displaystyle T\subset X} , which counts the number of points inside the set T {\displaystyle T} , then the integral in Mercer's theorem reduces to a summation ∑ i = 1 n ∑ j = 1 n k ( x i , x j ) c i c j ≥ 0. {\displaystyle \sum _{i=1}^{n}\sum _{j=1}^{n}k(\mathbf {x} _{i},\mathbf {x} _{j})c_{i}c_{j}\geq 0.} If this summation holds for all finite sequences of points ( x 1 , … , x n ) {\displaystyle (\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n})} in X {\displaystyle {\mathcal {X}}} and all choices of n {\displaystyle n} real-valued coefficients ( c 1 , … , c n ) {\displaystyle (c_{1},\dots ,c_{n})} (cf. positive definite kernel), then the function k {\displaystyle k} satisfies Mercer's condition. Some algorithms that depend on arbitrary relationships in the native space X {\displaystyle {\mathcal {X}}} would, in fact, have a linear interpretation in a different setting: the range space of φ {\displaystyle \varphi } . The linear interpretation gives us insight about the algorithm. Furthermore, there is often no need to compute φ {\displaystyle \varphi } directly during computation, as is the case with support-vector machines. Some cite this running time shortcut as the primary benefit. Researchers also use it to justify the meanings and properties of existing algorithms. Theoretically, a Gram matrix K ∈ R n × n {\displaystyle \mathbf {K} \in \mathbb {R} ^{n\times n}} with respect to { x 1 , … , x n } {\displaystyle \{\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n}\}} (sometimes also called a "kernel matrix"), where K i j = k ( x i , x j ) {\displaystyle K_{ij}=k(\mathbf {x} _{i},\mathbf {x} _{j})} , must be positive semi-definite (PSD). Empirically, for machine learning heuristics, choices of a function k {\displaystyle k} that do not satisfy Mercer's condition may still perform reasonably if k {\displaystyle k} at least approximates the intuitive idea of similarity. Regardless of whether k {\displaystyle k} is a Mercer kernel, k {\displaystyle k} may still be referred to as a "kernel". If the kernel function k {\displaystyle k} is also a covariance function as used in Gaussian processes, then the Gram matrix K {\displaystyle \mathbf {K} } can also be called a covariance matrix. == Applications == Application areas of kernel methods are diverse and include geostatistics, kriging, inverse distance weighting, 3D reconstruction, bioinformatics, cheminformatics, information extraction and handwriting recognition. == Popular kernels == Fisher kernel Graph kernels Kernel smoother Polynomial kernel Radial basis function kern

    Read more →
  • WeChat

    WeChat

    WeChat or Weixin in Chinese (Chinese: 微信; pinyin: Wēixìn ; lit. 'micro-message') is an instant messaging, social media, and mobile payment app developed by Tencent. First released in 2011, it became the world's largest standalone mobile app in 2018 with over 1 billion monthly active users. The Chinese version of WeChat, Weixin, has been described as China's "app for everything" and a super-app because of its wide range of functions. WeChat provides text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video conferencing, video games, mobile payment, sharing of photographs and videos and location sharing. It has been described as having "an almost indispensable part of life in China". Accounts registered using Chinese phone numbers are managed under the Weixin brand, and their data is stored in mainland China and subject to Weixin's terms of service and privacy policy. Non-Chinese numbers are registered under WeChat, and WeChat users are subject to a more liberal terms of service and better privacy policy, and their data is stored in the Netherlands for users in the European Union, and in Singapore for other users. User activity on Weixin, the Chinese version of the app, is analyzed, tracked and shared with Chinese authorities upon request as part of the mass surveillance network in China. Chinese-registered Weixin accounts censor politically sensitive topics, and the software license agreement for Weixin (but not WeChat) explicitly forbids content which "[en]danger[s] national security, divulge[s] state secrets, subvert[s] state power and undermine[s] national unity", as well as other types of content such as content that "[u]ndermine[s] national religious policies" and content that is "[i]nciting illegal assembly, association, procession, demonstrations and gatherings disrupting the social order". Due to its central part of Chinese life, a Chinese person having their WeChat account banned can cause a significant disruption to their life. Any interactions between Weixin and WeChat users are subject to the terms of service and privacy policies of both services. == History == By 2010, Tencent had already attained a massive user base with their desktop messenger app QQ. Recognizing smart phones were likely to disrupt this status quo, CEO Pony Ma sought to proactively invest in alternatives to their own QQ messenger app. WeChat began as a project at Tencent Guangzhou Research and Project center in October 2010. The original version of the app was created by Allen Zhang, named "Weixin" (微信) by Pony Ma, and launched in 2011. The user adoption of WeChat was initially very slow, with users wondering why key features were missing; however, after the release of the Walkie-talkie-like voice messaging feature in May of that year, growth surged. By 2012, when the number of users reached 100 million, Weixin was re-branded "WeChat" by President Martin Lau for the international market. During a period of government support of e-commerce development—for example in the 12th five-year plan (2011–2015)—WeChat also saw new features enabling payments and commerce in 2013, which saw massive adoption after their virtual Red envelope promotion for Chinese New Year 2014. WeChat had over 889 million monthly active users by 2016, and as of 2019 WeChat's monthly active users had risen to an estimate of one billion. As of January 2022, it was reported that WeChat has more than 1.2 billion users. After the launch of WeChat payment in 2013, its users reached 400 million the next year, 90 percent of whom were in China. By comparison, Facebook Messenger and WhatsApp had about one billion monthly active users in 2016 but did not offer most of the other services available on WeChat. For example, in Q2 2017, WeChat's revenues from social media advertising were about US$0.9 billion (RMB6 billion) compared with Facebook's total revenues of US$9.3 billion, 98% of which were from social media advertising. WeChat's revenues from its value-added services were US$5.5 billion. By 2018, WeChat had been used by 93.5% of Chinese internet users. In that year, it became the world's largest standalone mobile app in 2018 with over 1 billion monthly active users. In response to a border dispute between India and China, WeChat was banned in India in June 2020 along with several other Chinese apps, including TikTok. U.S. president Donald Trump sought to ban U.S. "transactions" with WeChat through an executive order but was blocked by a preliminary injunction issued in the United States District Court for the Northern District of California in September 2020. Joe Biden officially dropped Trump's efforts to ban WeChat in the U.S. in June 2021. == Features == WeChat, has been described as China's "app for everything" and a super-app because of its wide range of functions. WeChat provides text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video conferencing, video games, mobile payment, sharing of photographs and videos and location sharing. It has been described as having "an almost indispensable part of life in China". Due to its central part of Chinese life, a Chinese person having their WeChat account banned can cause a significant disruption to their life. === Messaging === WeChat provides a variety of features including text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, video calls and conferencing, video games, photograph and video sharing, as well as location sharing. WeChat also allows users to exchange contacts with people nearby via Bluetooth, as well as providing various features for contacting people at random if desired (if people are open to it). It can also integrate with other social networking services such as Facebook and Tencent QQ. Photographs may also be embellished with filters and captions, and automatic translation service is available and could also translate the conversation during messaging. WeChat supports different instant messaging methods, including text messages, voice messages, walkie talkie, and stickers. Users can send previously saved or live pictures and videos, profiles of other users, coupons, lucky money packages, or current GPS locations with friends either individually or in a group chat. WeChat also provides a message recall feature to allow users to recall and withdraw information (e.g. images, documents) that are sent within 2 minutes in a conversation. WeChat also provides a voice-to-text feature that brings convenience when it is not convenient to listen to voice messages, as well as the basic ability to recognize emojis based on different tones of voice. A distance sensing feature is implemented in WeChat. It has the ability to activate the receivers' hold-to-talk function when the phone was brought in close proximity to the ear. After the receiver was held at a certain distance from the ear, the sensor would then proceed to automatically disable the phone speakers. This feature eliminates the risk of the user's voice messages being inadvertently broadcast to the general public. === Public accounts === WeChat users can register as a public account (公众号), which enables them to push feeds to subscribers, interact with subscribers, and provide subscribers with services. Users can also create an official account, which fall under service, subscription, or enterprise accounts. Once users as individuals or organizations set up a type of account, they cannot change it to another type. By the end of 2014, the number of WeChat official accounts had reached 8 million. Official accounts of organizations can apply to be verified (cost 300 RMB or about US$45). Official accounts can be used as a platform for services such as hospital pre-registrations, or credit card service. To create an official account, the applicant must register with Chinese authorities, which discourages "foreign companies". In April 2022, WeChat announced that it will start displaying the location of users in China every time they post on a public account. Meanwhile, overseas users on public accounts will also display the country based on their IP address. === Moments === "Moments" (朋友圈) is WeChat's brand name for its social feed of friends' updates. "Moments" is an interactive platform that allows users to post images, text, and short videos taken by users. It also allows users to share articles and music (associated with QQ Music or other web-based music services). Friends in the contact list can like the content and leave comments, functioning similarly to a private social network. In 2017 WeChat had a policy of a maximum of two advertisements per day per Moments user. Privacy in WeChat works by groups of friends: only the friends from the user's contact are able to view their Moments' contents and comments. The friends of the user will only be able to see the likes and comments from other users only if they are in a mutual friend group. For example, friends from high school are not able to

    Read more →
  • Error tolerance (PAC learning)

    Error tolerance (PAC learning)

    In PAC learning, error tolerance refers to the ability of an algorithm to learn when the examples received have been corrupted in some way. In fact, this is a very common and important issue since in many applications it is not possible to access noise-free data. Noise can interfere with the learning process at different levels: the algorithm may receive data that have been occasionally mislabeled, or the inputs may have some false information, or the classification of the examples may have been maliciously adulterated. == Notation and the Valiant learning model == In the following, let X {\displaystyle X} be our n {\displaystyle n} -dimensional input space. Let H {\displaystyle {\mathcal {H}}} be a class of functions that we wish to use in order to learn a { 0 , 1 } {\displaystyle \{0,1\}} -valued target function f {\displaystyle f} defined over X {\displaystyle X} . Let D {\displaystyle {\mathcal {D}}} be the distribution of the inputs over X {\displaystyle X} . The goal of a learning algorithm A {\displaystyle {\mathcal {A}}} is to choose the best function h ∈ H {\displaystyle h\in {\mathcal {H}}} such that it minimizes e r r o r ( h ) = P x ∼ D ( h ( x ) ≠ f ( x ) ) {\displaystyle error(h)=P_{x\sim {\mathcal {D}}}(h(x)\neq f(x))} . Let us suppose we have a function s i z e ( f ) {\displaystyle size(f)} that can measure the complexity of f {\displaystyle f} . Let Oracle ( x ) {\displaystyle {\text{Oracle}}(x)} be an oracle that, whenever called, returns an example x {\displaystyle x} and its correct label f ( x ) {\displaystyle f(x)} . When no noise corrupts the data, we can define learning in the Valiant setting: Definition: We say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the Valiant setting if there exists a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x ) {\displaystyle {\text{Oracle}}(x)} and a polynomial p ( ⋅ , ⋅ , ⋅ , ⋅ ) {\displaystyle p(\cdot ,\cdot ,\cdot ,\cdot )} such that for any 0 < ε ≤ 1 {\displaystyle 0<\varepsilon \leq 1} and 0 < δ ≤ 1 {\displaystyle 0<\delta \leq 1} it outputs, in a number of calls to the oracle bounded by p ( 1 ε , 1 δ , n , size ( f ) ) {\displaystyle p\left({\frac {1}{\varepsilon }},{\frac {1}{\delta }},n,{\text{size}}(f)\right)} , a function h ∈ H {\displaystyle h\in {\mathcal {H}}} that satisfies with probability at least 1 − δ {\displaystyle 1-\delta } the condition error ( h ) ≤ ε {\displaystyle {\text{error}}(h)\leq \varepsilon } . In the following we will define learnability of f {\displaystyle f} when data have suffered some modification. == Classification noise == In the classification noise model a noise rate 0 ≤ η < 1 2 {\displaystyle 0\leq \eta <{\frac {1}{2}}} is introduced. Then, instead of Oracle ( x ) {\displaystyle {\text{Oracle}}(x)} that returns always the correct label of example x {\displaystyle x} , algorithm A {\displaystyle {\mathcal {A}}} can only call a faulty oracle Oracle ( x , η ) {\displaystyle {\text{Oracle}}(x,\eta )} that will flip the label of x {\displaystyle x} with probability η {\displaystyle \eta } . As in the Valiant case, the goal of a learning algorithm A {\displaystyle {\mathcal {A}}} is to choose the best function h ∈ H {\displaystyle h\in {\mathcal {H}}} such that it minimizes e r r o r ( h ) = P x ∼ D ( h ( x ) ≠ f ( x ) ) {\displaystyle error(h)=P_{x\sim {\mathcal {D}}}(h(x)\neq f(x))} . In applications it is difficult to have access to the real value of η {\displaystyle \eta } , but we assume we have access to its upperbound η B {\displaystyle \eta _{B}} . Note that if we allow the noise rate to be 1 / 2 {\displaystyle 1/2} , then learning becomes impossible in any amount of computation time, because every label conveys no information about the target function. Definition: We say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the classification noise model if there exists a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x , η ) {\displaystyle {\text{Oracle}}(x,\eta )} and a polynomial p ( ⋅ , ⋅ , ⋅ , ⋅ ) {\displaystyle p(\cdot ,\cdot ,\cdot ,\cdot )} such that for any 0 ≤ η ≤ 1 2 {\displaystyle 0\leq \eta \leq {\frac {1}{2}}} , 0 ≤ ε ≤ 1 {\displaystyle 0\leq \varepsilon \leq 1} and 0 ≤ δ ≤ 1 {\displaystyle 0\leq \delta \leq 1} it outputs, in a number of calls to the oracle bounded by p ( 1 1 − 2 η B , 1 ε , 1 δ , n , s i z e ( f ) ) {\displaystyle p\left({\frac {1}{1-2\eta _{B}}},{\frac {1}{\varepsilon }},{\frac {1}{\delta }},n,size(f)\right)} , a function h ∈ H {\displaystyle h\in {\mathcal {H}}} that satisfies with probability at least 1 − δ {\displaystyle 1-\delta } the condition e r r o r ( h ) ≤ ε {\displaystyle error(h)\leq \varepsilon } . == Statistical query learning == Statistical Query Learning is a kind of active learning problem in which the learning algorithm A {\displaystyle {\mathcal {A}}} can decide if to request information about the likelihood P f ( x ) {\displaystyle P_{f(x)}} that a function f {\displaystyle f} correctly labels example x {\displaystyle x} , and receives an answer accurate within a tolerance α {\displaystyle \alpha } . Formally, whenever the learning algorithm A {\displaystyle {\mathcal {A}}} calls the oracle Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} , it receives as feedback probability Q f ( x ) {\displaystyle Q_{f(x)}} , such that Q f ( x ) − α ≤ P f ( x ) ≤ Q f ( x ) + α {\displaystyle Q_{f(x)}-\alpha \leq P_{f(x)}\leq Q_{f(x)}+\alpha } . Definition: We say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the statistical query learning model if there exists a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} and polynomials p ( ⋅ , ⋅ , ⋅ ) {\displaystyle p(\cdot ,\cdot ,\cdot )} , q ( ⋅ , ⋅ , ⋅ ) {\displaystyle q(\cdot ,\cdot ,\cdot )} , and r ( ⋅ , ⋅ , ⋅ ) {\displaystyle r(\cdot ,\cdot ,\cdot )} such that for any 0 < ε ≤ 1 {\displaystyle 0<\varepsilon \leq 1} the following hold: Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} can evaluate P f ( x ) {\displaystyle P_{f(x)}} in time q ( 1 ε , n , s i z e ( f ) ) {\displaystyle q\left({\frac {1}{\varepsilon }},n,size(f)\right)} ; 1 α {\displaystyle {\frac {1}{\alpha }}} is bounded by r ( 1 ε , n , s i z e ( f ) ) {\displaystyle r\left({\frac {1}{\varepsilon }},n,size(f)\right)} A {\displaystyle {\mathcal {A}}} outputs a model h {\displaystyle h} such that e r r ( h ) < ε {\displaystyle err(h)<\varepsilon } , in a number of calls to the oracle bounded by p ( 1 ε , n , s i z e ( f ) ) {\displaystyle p\left({\frac {1}{\varepsilon }},n,size(f)\right)} . Note that the confidence parameter δ {\displaystyle \delta } does not appear in the definition of learning. This is because the main purpose of δ {\displaystyle \delta } is to allow the learning algorithm a small probability of failure due to an unrepresentative sample. Since now Oracle ( x , α ) {\displaystyle {\text{Oracle}}(x,\alpha )} always guarantees to meet the approximation criterion Q f ( x ) − α ≤ P f ( x ) ≤ Q f ( x ) + α {\displaystyle Q_{f(x)}-\alpha \leq P_{f(x)}\leq Q_{f(x)}+\alpha } , the failure probability is no longer needed. The statistical query model is strictly weaker than the PAC model: any efficiently SQ-learnable class is efficiently PAC learnable in the presence of classification noise, but there exist efficient PAC-learnable problems such as parity that are not efficiently SQ-learnable. == Malicious classification == In the malicious classification model an adversary generates errors to foil the learning algorithm. This setting describes situations of error burst, which may occur when for a limited time transmission equipment malfunctions repeatedly. Formally, algorithm A {\displaystyle {\mathcal {A}}} calls an oracle Oracle ( x , β ) {\displaystyle {\text{Oracle}}(x,\beta )} that returns a correctly labeled example x {\displaystyle x} drawn, as usual, from distribution D {\displaystyle {\mathcal {D}}} over the input space with probability 1 − β {\displaystyle 1-\beta } , but it returns with probability β {\displaystyle \beta } an example drawn from a distribution that is not related to D {\displaystyle {\mathcal {D}}} . Moreover, this maliciously chosen example may strategically selected by an adversary who has knowledge of f {\displaystyle f} , β {\displaystyle \beta } , D {\displaystyle {\mathcal {D}}} , or the current progress of the learning algorithm. Definition: Given a bound β B < 1 2 {\displaystyle \beta _{B}<{\frac {1}{2}}} for 0 ≤ β < 1 2 {\displaystyle 0\leq \beta <{\frac {1}{2}}} , we say that f {\displaystyle f} is efficiently learnable using H {\displaystyle {\mathcal {H}}} in the malicious classification model, if there exist a learning algorithm A {\displaystyle {\mathcal {A}}} that has access to Oracle ( x , β ) {\displaystyle {\text{Oracle}}(x,\beta )}

    Read more →
  • Radial basis function kernel

    Radial basis function kernel

    In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification. The RBF kernel on two samples x , x ′ ∈ R k {\displaystyle \mathbf {x} ,\mathbf {x'} \in \mathbb {R} ^{k}} , represented as feature vectors in some input space, is defined as K ( x , x ′ ) = exp ⁡ ( − ‖ x − x ′ ‖ 2 2 σ 2 ) {\displaystyle K(\mathbf {x} ,\mathbf {x'} )=\exp \left(-{\frac {\|\mathbf {x} -\mathbf {x'} \|^{2}}{2\sigma ^{2}}}\right)} ‖ x − x ′ ‖ 2 {\displaystyle \textstyle \|\mathbf {x} -\mathbf {x'} \|^{2}} may be recognized as the squared Euclidean distance between the two feature vectors. σ {\displaystyle \sigma } is a free parameter. An equivalent definition involves a parameter γ = 1 2 σ 2 {\displaystyle \textstyle \gamma ={\tfrac {1}{2\sigma ^{2}}}} : K ( x , x ′ ) = exp ⁡ ( − γ ‖ x − x ′ ‖ 2 ) {\displaystyle K(\mathbf {x} ,\mathbf {x'} )=\exp(-\gamma \|\mathbf {x} -\mathbf {x'} \|^{2})} Since the value of the RBF kernel decreases with distance and ranges between zero (in the infinite-distance limit) and one (when x = x'), it has a ready interpretation as a similarity measure. The feature space of the kernel has an infinite number of dimensions; for σ = 1 {\displaystyle \sigma =1} , its expansion using the multinomial theorem is: exp ⁡ ( − 1 2 ‖ x − x ′ ‖ 2 ) = exp ⁡ ( 2 2 x ⊤ x ′ − 1 2 ‖ x ‖ 2 − 1 2 ‖ x ′ ‖ 2 ) = exp ⁡ ( x ⊤ x ′ ) exp ⁡ ( − 1 2 ‖ x ‖ 2 ) exp ⁡ ( − 1 2 ‖ x ′ ‖ 2 ) = ∑ j = 0 ∞ ( x ⊤ x ′ ) j j ! exp ⁡ ( − 1 2 ‖ x ‖ 2 ) exp ⁡ ( − 1 2 ‖ x ′ ‖ 2 ) = ∑ j = 0 ∞ ∑ n 1 + n 2 + ⋯ + n k = j exp ⁡ ( − 1 2 ‖ x ‖ 2 ) x 1 n 1 ⋯ x k n k n 1 ! ⋯ n k ! exp ⁡ ( − 1 2 ‖ x ′ ‖ 2 ) x ′ 1 n 1 ⋯ x ′ k n k n 1 ! ⋯ n k ! = ⟨ φ ( x ) , φ ( x ′ ) ⟩ {\displaystyle {\begin{alignedat}{2}\exp \left(-{\frac {1}{2}}\|\mathbf {x} -\mathbf {x'} \|^{2}\right)&=\exp \left({\frac {2}{2}}\mathbf {x} ^{\top }\mathbf {x'} -{\frac {1}{2}}\|\mathbf {x} \|^{2}-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right)\\[5pt]&=\exp \left(\mathbf {x} ^{\top }\mathbf {x'} \right)\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right)\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right)\\[5pt]&=\sum _{j=0}^{\infty }{\frac {(\mathbf {x} ^{\top }\mathbf {x'} )^{j}}{j!}}\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right)\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right)\\[5pt]&=\sum _{j=0}^{\infty }\quad \sum _{n_{1}+n_{2}+\dots +n_{k}=j}\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right){\frac {x_{1}^{n_{1}}\cdots x_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\exp \left(-{\frac {1}{2}}\|\mathbf {x'} \|^{2}\right){\frac {{x'}_{1}^{n_{1}}\cdots {x'}_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\\[5pt]&=\langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle \end{alignedat}}} φ ( x ) = exp ⁡ ( − 1 2 ‖ x ‖ 2 ) ( a ℓ 0 ( 0 ) , a 1 ( 1 ) , … , a ℓ 1 ( 1 ) , … , a 1 ( j ) , … , a ℓ j ( j ) , … ) {\displaystyle \varphi (\mathbf {x} )=\exp \left(-{\frac {1}{2}}\|\mathbf {x} \|^{2}\right)\left(a_{\ell _{0}}^{(0)},a_{1}^{(1)},\dots ,a_{\ell _{1}}^{(1)},\dots ,a_{1}^{(j)},\dots ,a_{\ell _{j}}^{(j)},\dots \right)} where ℓ j = ( k + j − 1 j ) {\displaystyle \ell _{j}={\tbinom {k+j-1}{j}}} , a ℓ ( j ) = x 1 n 1 ⋯ x k n k n 1 ! ⋯ n k ! | n 1 + n 2 + ⋯ + n k = j ∧ 1 ≤ ℓ ≤ ℓ j {\displaystyle a_{\ell }^{(j)}={\frac {x_{1}^{n_{1}}\cdots x_{k}^{n_{k}}}{\sqrt {n_{1}!\cdots n_{k}!}}}\quad |\quad n_{1}+n_{2}+\dots +n_{k}=j\wedge 1\leq \ell \leq \ell _{j}} == Approximations == Because support vector machines and other models employing the kernel trick do not scale well to large numbers of training samples or large numbers of features in the input space, several approximations to the RBF kernel (and similar kernels) have been introduced. Typically, these take the form of a function z that maps a single vector to a vector of higher dimensionality, approximating the kernel: ⟨ z ( x ) , z ( x ′ ) ⟩ ≈ ⟨ φ ( x ) , φ ( x ′ ) ⟩ = K ( x , x ′ ) {\displaystyle \langle z(\mathbf {x} ),z(\mathbf {x'} )\rangle \approx \langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle =K(\mathbf {x} ,\mathbf {x'} )} where φ {\displaystyle \textstyle \varphi } is the implicit mapping embedded in the RBF kernel. === Fourier random features === One way to construct such a z is to randomly sample from the Fourier transformation of the kernel φ ( x ) = 1 D [ cos ⁡ ⟨ w 1 , x ⟩ , sin ⁡ ⟨ w 1 , x ⟩ , … , cos ⁡ ⟨ w D , x ⟩ , sin ⁡ ⟨ w D , x ⟩ ] T {\displaystyle \varphi (x)={\frac {1}{\sqrt {D}}}[\cos \langle w_{1},x\rangle ,\sin \langle w_{1},x\rangle ,\ldots ,\cos \langle w_{D},x\rangle ,\sin \langle w_{D},x\rangle ]^{T}} where w 1 , . . . , w D {\displaystyle w_{1},...,w_{D}} are independent samples from the normal distribution N ( 0 , σ − 2 I ) {\displaystyle N(0,\sigma ^{-2}I)} . Theorem: E ⁡ [ ⟨ φ ( x ) , φ ( y ) ⟩ ] = e ‖ x − y ‖ 2 / ( 2 σ 2 ) . {\displaystyle \operatorname {E} [\langle \varphi (x),\varphi (y)\rangle ]=e^{\|x-y\|^{2}/(2\sigma ^{2})}.} Proof: It suffices to prove the case of D = 1 {\displaystyle D=1} . Use the trigonometric identity cos ⁡ ( a − b ) = cos ⁡ ( a ) cos ⁡ ( b ) + sin ⁡ ( a ) sin ⁡ ( b ) {\displaystyle \cos(a-b)=\cos(a)\cos(b)+\sin(a)\sin(b)} , the spherical symmetry of Gaussian distribution, then evaluate the integral ∫ − ∞ ∞ cos ⁡ ( k x ) e − x 2 / 2 2 π d x = e − k 2 / 2 . {\displaystyle \int _{-\infty }^{\infty }{\frac {\cos(kx)e^{-x^{2}/2}}{\sqrt {2\pi }}}dx=e^{-k^{2}/2}.} Theorem: Var ⁡ [ ⟨ φ ( x ) , φ ( y ) ⟩ ] = O ( D − 1 ) {\displaystyle \operatorname {Var} [\langle \varphi (x),\varphi (y)\rangle ]=O(D^{-1})} . (Appendix A.2). === Nyström method === Another approach uses the Nyström method to approximate the eigendecomposition of the Gram matrix K, using only a random sample of the training set.

    Read more →
  • Population model (evolutionary algorithm)

    Population model (evolutionary algorithm)

    The population model of an evolutionary algorithm (EA) describes the structural properties of its population to which its members are subject. A population is the set of all proposed solutions of an EA considered in one iteration, which are also called individuals according to the biological role model. The individuals of a population can generate further individuals as offspring with the help of the genetic operators of the procedure. The simplest and widely used population model in EAs is the global or panmictic model, which corresponds to an unstructured population. It allows each individual to choose any other individual of the population as a partner for the production of offspring by crossover, whereby the details of the selection are irrelevant as long as the fitness of the individuals plays a significant role. Due to global mate selection, the genetic information of even slightly better individuals can prevail in a population after a few generations (iteration of an EA), provided that no better other offspring have emerged in this phase. If the solution found in this way is not the optimum sought, that is called premature convergence. This effect can be observed more often in panmictic populations. In nature global mating pools are rarely found. What prevails is a certain and limited isolation due to spatial distance. The resulting local neighbourhoods initially evolve independently and mutants have a higher chance of persisting over several generations. As a result, genotypic diversity in the gene pool is preserved longer than in a panmictic population. It is therefore obvious to divide the previously global population by substructures. Two basic models were introduced for this purpose, the island models, which are based on a division of the population into fixed subpopulations that exchange individuals from time to time, and the neighbourhood models, which assign individuals to overlapping neighbourhoods, also known as cellular genetic or evolutionary algorithms (cGA or cEA). The associated division of the population also suggests a corresponding parallelization of the procedure. For this reason, the topic of population models is also frequently discussed in the literature in connection with the parallelization of EAs. == Island models == In the island model, also called the migration model or coarse grained model, evolution takes place in strictly divided subpopulations. These can be organised panmictically, but do not have to be. From time to time an exchange of individuals takes place, which is called migration. The time between an exchange is called an epoch and its end can be triggered by various criteria: E.g. after a given time or given number of completed generations, or after the occurrence of stagnation. Stagnation can be detected, for example, by the fact that no fitness improvement has occurred in the island for a given number of generations. Island models introduce a variety of new strategy parameters: Number of subpopulations Size of the subpopulations Neighbourhood relations between islands: they determine which islands are considered neighbouring and can thus exchange individuals, see picture of a simple unidirectional ring (black arrows) and its extension by additional bidirectional neighbourhood relations (additional green arrows) Criteria for the termination of an epoch, synchronous or asynchronous migration Migration rate: number or proportion of individuals involved in migration. Migrant selection: There are many alternatives for this. E.g. the best individuals can replace the worst or randomly selected ones. Depending on the migration rate, this can affect one or more individuals at a time. With these parameters, the selection pressure can be influenced to a considerable extent. For example, it increases with the interconnectedness of the islands and decreases with the number of subpopulations or the epoch length. == Neighbourhood models or cellular evolutionary algorithms == The neighbourhood model, also called diffusion model or fine grained model, defines a topological neighbouhood relation between the individuals of a population that is independent of their phenotypic properties. The fundamental idea of this model is to provide the EA population with a special structure defined as a connected graph, in which each vertex is an individual that communicates with its nearest neighbours. Particularly, individuals are conceptually set in a toroidal mesh, and are only allowed to recombine with close individuals. This leads to a kind of locality known as isolation by distance. The set of potential mates of an individual is called its neighbourhood or deme. The adjacent figure illustrates that by showing two slightly overlapping neighbourhoods of two individuals marked yellow, through which genetic information can spread between the two demes. It is known that in this kind of algorithm, similar individuals tend to cluster and create niches that are independent of the deme boundaries and, in particular, can be larger than a deme. There is no clear borderline between adjacent groups, and close niches could be easily colonized by competitive ones and maybe merge solution contents during this process. Simultaneously, farther niches can be affected more slowly. EAs with this type of population are also well known as cellular EAs (cEA) or cellular genetic algorithms (cGA). A commonly used structure for arranging the individuals of a population is a 2D toroidal grid, although the number of dimensions can be easily extended (to 3D) or reduced (to 1D, e.g. a ring, see the figure on the right). The neighbourhood of a particular individual in the grid is defined in terms of the Manhattan distance from it to others in the population. In the basic algorithm, all the neighbourhoods have the same size and identical shapes. The two most commonly used neighbourhoods for two-dimensional cEAs are L5 and C9, see the figure on the left. Here, L stands for Linear while C stands for Compact. Each deme represents a panmictic subpopulation within which mate selection and the acceptance of offspring takes place by replacing the parent. The rules for the acceptance of offspring are local in nature and based on the neighbourhood: for example, it can be specified that the best offspring must be better than the parent being replaced or, less strictly, only better than the worst individual in the deme. The first rule is elitist and creates a higher selective pressure than the second non-elitist rule. In elitist EAs, the best individual of a population always survives. In this respect, they deviate from the biological model. The overlap of the neighbourhoods causes a mostly slow spread of genetic information across the neighbourhood boundaries, hence the name diffusion model. A better offspring now needs more generations than in panmixy to spread in the population. This promotes the emergence of local niches and their local evolution, thus preserving genotypic diversity over a longer period of time. The result is a better and dynamic balance between breadth and depth search adapted to the search space during a run. Depth search takes place in the niches and breadth search in the niche boundaries and through the evolution of the different niches of the whole population. For the same neighbourhood size, the spread of genetic information is larger for elongated figures like L9 than for a block like C9, and again significantly larger than for a ring. This means that ring neighbourhoods are well suited for achieving high quality results, even if this requires comparatively long run times. On the other hand, if one is primarily interested in fast and good, but possibly suboptimal results, 2D topologies are more suitable. == Comparison == When applying both population models to genetic algorithms, evolutionary strategy and other EAs, the splitting of a total population into subpopulations usually reduces the risk of premature convergence and leads to better results overall more reliably and faster than would be expected with panmictic EAs. Island models have the disadvantage compared to neighbourhood models that they introduce a large number of new strategy parameters. Despite the existing studies on this topic in the literature, a certain risk of unfavourable settings remains for the user. With neighbourhood models, on the other hand, only the size of the neighbourhood has to be specified and, in the case of the two-dimensional model, the choice of the neighbourhood figure is added. == Parallelism == Since both population models imply population partitioning, they are well suited as a basis for parallelizing an EA. This applies even more to cellular EAs, since they rely only on locally available information about the members of their respective demes. Thus, in the extreme case, an independent execution thread can be assigned to each individual, so that the entire cEA can run on a parallel hardware platform. The island model also supports p

    Read more →
  • Rabbit r1

    Rabbit r1

    The Rabbit r1 is an artificial intelligence personal assistant device developed by the American technology startup Rabbit Inc and co-designed by Teenage Engineering. It was announced at the 2024 Consumer Electronics Show as a handheld device intended to perform digital tasks through voice commands, touch interaction, and web-based AI agents. The r1 was marketed around Rabbit's concept of a "large action model" (LAM), which the company described as software able to operate websites and services on behalf of users. The device runs rabbitOS, an operating system based on the Android Open Source Project. Its services have included AI search, image recognition, voice interaction, music playback, rideshare and food-ordering integrations, and later experimental web-agent features such as LAM Playground and teach mode. Initial reviews were largely negative, with reviewers criticizing the device's limited functionality, bugs, and unclear advantages over a smartphone. Critics also questioned Rabbit's claims after the r1 software was shown to run on an Android phone. Rabbit continued to issue software updates after launch, including rabbitOS 2 in September 2025, which introduced a redesigned card-based interface, gesture navigation, and a "creations" feature for generating small software tools and experiences on the device. Rabbit Inc was founded by Jesse Lyu Cheng. == Hardware == Display: A 2.88-inch touchscreen for interactive user input. Input: push-to-talk button to activate voice commands; scroll wheel; Gyroscope; Magnetometer; Accelerometer; GPS. Camera: 8 MP single camera, with a resolution of 3264x2448, allowing for the connected external AI to use computer vision. Audio: Equipped with a speaker and dual microphones for audio interaction. Connectivity: Supports Wi-Fi and cellular connections via a SIM card slot to access internet services. Processor: Runs on a 2.3GHz MediaTek Helio P35 processor. Memory: Contains 4GB of RAM for operational tasks. Storage: Offers 128GB of internal storage for data. Ports: Utilizes a USB-C port for charging and data connections. == Software == The Rabbit r1 runs rabbitOS, which is based on the Android Open Source Project (AOSP), specifically Android 13. Rabbit founder Jesse Lyu described rabbitOS as a "very bespoke AOSP" after reports that the r1's software could be run on a conventional Android phone. Rabbit described the r1 as using a large action model (LAM), a type of AI agent intended to perform tasks across software interfaces rather than only answer questions. At launch, the device supported a limited set of services, including AI search, vision features, music playback, and some third-party integrations. Perplexity.ai was one of the AI services used to answer user queries. In 2024, Rabbit released several software updates that added features and attempted to address early criticism of the device. In July 2024, the company launched "beta rabbit", an advanced search and conversation mode for more complex queries. In October 2024, it released LAM Playground, a web-based agent feature intended to let the r1 operate websites on behalf of users. Reviewers found the feature experimental; Android Authority reported that it could perform some navigation tasks but struggled with CAPTCHAs, loops, and unintended behavior. In November 2024, Rabbit introduced a beta "teach mode", which allowed users to demonstrate web-based tasks in the Rabbithole web portal and later ask the r1 to repeat them. The company described teach mode as experimental, and The Verge noted that Rabbit warned users that results could be unpredictable and that CAPTCHA-protected sites could cause problems. Rabbit released rabbitOS 2 in September 2025. The update redesigned the interface around a card-based layout, added additional touchscreen gestures, and introduced "creations", a feature that lets users generate simple software tools, games, and interfaces through natural-language prompts. Coverage of the update described it as a major software overhaul rather than new hardware. == Reception == === Funding === Rabbit raised $20 million in funding from Khosla Ventures, Synergis Capital and Kakao Investment in October 2023. The company announced an additional $10 million in funding in December 2023. === Sales === Following its announcement at the 2024 Consumer Electronics Show, 130,000 units were sold. On August 13, 2024, Rabbit announced that sales of r1 had expanded to the entire European Union (except Malta) and United Kingdom. On August 21, 2024, sales of r1 expanded to Singapore. === Reviews === The r1 was met with strong criticism immediately after Rabbit began shipping the device. Some reviews questioned what the device was able to do that a smartphone could not, while comparing it to the similar Humane Ai Pin. YouTuber Marques Brownlee called the device "barely reviewable". Android Authority's Mishaal Rahman managed to install Rabbit r1's software on a Pixel 6a smartphone, after a tipster shared an APK file. The Verge echoed the claims made by Rahman. In response, Lyu published statements confirming its use of Android, but denying that the r1 is an Android app. Mashable called its Vision features impressive, but said that "these praise-worthy features are overshadowed by buggy performance". Ars Technica wrote a blog post claiming "the company is blocking access from bootleg APKs". TechCrunch gave a slightly more positive review, calling the device a "fun peep at a possible future", but could not "advise anyone to buy one now." Shortly after the launch of r1, Rabbit began a weekly cadence of software updates to address much of the criticism from the early reviews, including "battery and GPS performance, time zone selection, and more". Digital Trends said the Magic Camera feature "takes the most mundane, ordinary, and badly composed photos and makes something fun and eye-catching from them." Mashable said the "beta rabbit" feature "makes Rabbit R1 more conversational and intelligent". Later coverage noted that Rabbit continued to update the r1 after its poorly received launch. The Verge reported in September 2024 that about 5,000 of roughly 100,000 purchasers were using the device at any given moment, citing Lyu, and described the product as having launched before it was ready. In 2025, coverage of rabbitOS 2 described the update as an attempt to reset the device's software experience after the criticism of its original release. == Controversies == === GAMA project === Rabbit Inc has garnered attention due to allegations surrounding its funding and the company's past projects. The company came under scrutiny when Stephen Findeisen, known as Coffeezilla on YouTube, published a video in May 2024, alleging that Rabbit Incorporation was "built on a scam". Rabbit Incorporation, initially named Cyber Manufacturing Co, rebranded just two months before launching the Rabbit R1. The company, under its former name, raised $6 million in November 2021 for a project called GAMA, described as a "Next Generation NFT Project." Jesse Lyu, the CEO of Rabbit Incorporation, referred to GAMA as a "fun little project." Coffeezilla, who investigates influencer scams, highlighted old Clubhouse recordings of Jesse Lyu discussing the GAMA project. In these recordings, Lyu emphasized the substantial funding behind GAMA and its potential to be a revolutionary, carbon-negative cryptocurrency. Coffeezilla questioned the whereabouts of the funds raised for GAMA, estimating that approximately $1 million in refunds to investors remained unresolved. He suggested that the rebranding to Rabbit Incorporation and the shift to developing the Rabbit R1 were attempts to divert from the GAMA project's issues. In response to Coffeezilla's inquiries, Rabbit Incorporation stated that the $6 million raised was used for the GAMA project. The company said that NFTs cannot be refunded unless the owner agrees to "burn" them on the blockchain. Rabbit Incorporation also said that the GAMA project was open-sourced and returned to the community, aligning with community feedback. They also mentioned that efforts to buy back NFTs were made to counteract malicious trading and maintain market stability. === Security === In June 2024, Engadget reported that the Rabbitude team, a community reverse engineering project, had gained access to the r1's codebase revealing that r1's software contained several hardcoded API keys in its code for ElevenLabs, Microsoft Azure, Yelp, and Google Maps, potentially allowing unauthorized access to r1 responses, including those containing the users' personal information. For a short time, Rabbit immediately began revoking and rotating those secrets and confirmed that the code was leaked by an employee who had "been terminated and remains under investigation". In July 2024, the company revealed that all user chats and device pairing data were logged on the r1 with no ability to delete them. This meant that lost or stolen devices could be used to extract user

    Read more →
  • Stress majorization

    Stress majorization

    Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n {\displaystyle n} m {\displaystyle m} -dimensional data items, a configuration X {\displaystyle X} of n {\displaystyle n} points in r {\displaystyle r} ( ≪ m ) {\displaystyle (\ll m)} -dimensional space is sought that minimizes the so-called stress function σ ( X ) {\displaystyle \sigma (X)} . Usually r {\displaystyle r} is 2 {\displaystyle 2} or 3 {\displaystyle 3} , i.e. the ( n × r ) {\displaystyle (n\times r)} matrix X {\displaystyle X} lists points in 2 − {\displaystyle 2-} or 3 − {\displaystyle 3-} dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function σ {\displaystyle \sigma } is a cost or loss function that measures the squared differences between ideal ( m {\displaystyle m} -dimensional) distances and actual distances in r-dimensional space. It is defined as: σ ( X ) = ∑ i < j ≤ n w i j ( d i j ( X ) − δ i j ) 2 {\displaystyle \sigma (X)=\sum _{i Read more →

  • Mating pool

    Mating pool

    Mating pool is a concept used in evolutionary algorithms and means a population of parents for the next population. The mating pool is formed by candidate solutions that the selection operators deem to have the highest fitness in the current population. Solutions that are included in the mating pool are referred to as parents. Individual solutions can be repeatedly included in the mating pool, with individuals of higher fitness values having a higher chance of being included multiple times. Crossover operators are then applied to the parents, resulting in recombination of genes recognized as superior. Lastly, random changes in the genes are introduced through mutation operators, increasing the genetic variation in the gene pool. Those two operators improve the chance of creating new, superior solutions. A new generation of solutions is thereby created, the children, who will constitute the next population. Depending on the selection method, the total number of parents in the mating pool can be different to the size of the initial population, resulting in a new population that’s smaller. To continue the algorithm with an equally sized population, random individuals from the old populations can be chosen and added to the new population. At this point, the fitness value of the new solutions is evaluated. If the termination conditions are fulfilled, processes come to an end. Otherwise, they are repeated. The repetition of the steps result in candidate solutions that evolve towards the most optimal solution over time. The genes will become increasingly uniform towards the most optimal gene, a process called convergence. If 95% of the population share the same version of a gene, the gene has converged. When all the individual fitness values have reached the value of the best individual, i.e. all the genes have converged, population convergence is achieved. == Mating pool creation == Several methods can be applied to create a mating pool. All of these processes involve the selective breeding of a particular number of individuals within a population. There are multiple criteria that can be employed to determine which individuals make it into the mating pool and which are left behind. The selection methods can be split into three general types: fitness proportionate selection, ordinal based selection and threshold based selection. === Fitness proportionate selection === In the case of fitness proportionate selection, random individuals are selected to enter the pool. However, the ones with a higher level of fitness are more likely to be picked and therefore have a greater chance of passing on their features to the next generation. One of the techniques used in this type of parental selection is the roulette wheel selection. This approach divides a hypothetical circular wheel into different slots, the size of which is equal to the fitness values of each potential candidate. Afterwards, the wheel is rotated and a fixed point determines which individual gets picked. The greater the fitness value of an individual, the higher the probability of being chosen as a parent by the random spin of the wheel. Alternatively, stochastic universal sampling can be implemented. This selection method is also based on the rotation of a spinning wheel. However, in this case there is more than one fixed point and as a result all of the mating pool members will be selected simultaneously. === Ordinal based selection === The ordinal based selection methods include the tournament and ranking selection. Tournament selection involves the random selection of individuals of a population and the subsequent comparison of their fitness levels. The winners of these “tournaments” are the ones with the highest values and will be put into the mating pool as parents. In ranking selection all the individuals are sorted based on their fitness values. Then, the selection of the parents is made according to the rank of the candidates. Every individual has a chance of being chosen, but higher ranked ones are favored === Threshold based selection === The last type of selection method is referred to as the threshold based method. This includes the truncation selection method, which sorts individuals based on their phenotypic values on a specific trait and later selects the proportion of them that are within a certain threshold as parents.

    Read more →