AI Code Ui

AI Code Ui — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • WriterDuet

    WriterDuet

    WriterDuet is a screenwriting software for writing and editing screenplays and other forms of mass media. == History == WriterDuet was founded in 2013 by Guy Goldstein. In April 2015, WriterDuet acquired the domain for Scripped.com after they closed, citing a serious technical failure. In August 2016, WriterDuet released a localized version of its software in China. In May 2018, WriterDuet included Bechdel test analysis functions to address issues of gender diversity in the screenwriting industry. In 2018, WriterDuet published WriterSolo, an offline version of their app that runs on the browser and opens/saves files on the computer, Dropbox, Google Drive, and iCloud. In July 2019, WriterDuet made the WriterSolo browser app and desktop app available as pay-what-you-want under the web address FreeScreenwriting.com. == Features == WriterDuet is primarily used to outline, write, and format screenplays to the standards recommended by the AMPAS. It also supports formats for theater, novels, and video games. The software is powered by Firebase allowing users to write together in real-time from multiple devices. WriterDuet's main competitors in the screenwriting industry are Final Draft, Celtx, and Movie Magic Screenwriter.

    Read more →
  • Structured kNN

    Structured kNN

    Structured k-nearest neighbours (SkNN) is a machine learning algorithm that generalizes k-nearest neighbors (k-NN). k-NN supports binary classification, multiclass classification, and regression, whereas SkNN allows training of a classifier for general structured output. For instance, a data sample might be a natural language sentence, and the output could be an annotated parse tree. Training a classifier consists of showing many instances of ground truth sample-output pairs. After training, the SkNN model is able to predict the corresponding output for new, unseen sample instances; that is, given a natural language sentence, the classifier can produce the most likely parse tree. == Training == As a training set, SkNN accepts sequences of elements with class labels. The type of element does not matter; the only requirement is a defined metric function that gives a distance between each pair of elements of a set. SkNN is based on idea of creating a graph, with each node representing a class label. There is an edge between a pair of nodes if there is a sequence of two elements in the training set with corresponding classes. The first step of SkNN training is the construction of such a graph from training sequences. There are two special nodes in the graph corresponding to sentence beginnings and ends: if a sequence starts with class C, the edge between node START and node C should be created. Like regular k-NN, the second part of SkNN training consists of storing the elements of a training sequence in a certain way. Each element of the training sequences is stored in the node related to the class of the previous element in the sequence. Every first element is stored in the START node. == Inference == Labelling input sequences by SkNN consists of finding the sequence of transitions in the graph, starting from node START. Each transition corresponds to a single element of the input sequence. As a result, the label of each element is determined as the target node label of the transition. The cost of the path is defined as the sum of all transitions, with the cost of transition from node A to node B being the distance from the current input sequence element to the nearest element of class B, stored in node A. Determining an optimal path may be performed using a modified Viterbi algorithm (where the sum of the distances is minimized, unlike the original algorithm which maximizes the product of probabilities).

    Read more →
  • Principal component analysis

    Principal component analysis

    Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing. The data are linearly transformed onto a new coordinate system such that the directions (principal components) capturing the largest variation in the data can be easily identified. The principal components of a collection of points in a real coordinate space are a sequence of p {\displaystyle p} unit vectors, where the i {\displaystyle i} -th vector is the direction of a line that best fits the data while being orthogonal to the first i − 1 {\displaystyle i-1} vectors. Here, a best-fitting line is defined as one that minimizes the average squared perpendicular distance from the points to the line. These directions (i.e., principal components) constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points. Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science. == Overview == When performing PCA, the first principal component of a set of p {\displaystyle p} variables is the derived variable formed as a linear combination of the original variables that explains the most variance. The second principal component explains the most variance in what is left once the effect of the first component is removed, and we may proceed through p {\displaystyle p} iterations until all the variance is explained. PCA is most commonly used when many of the variables are highly correlated with each other and it is desirable to reduce their number to an independent set. The first principal component can equivalently be defined as a direction that maximizes the variance of the projected data. The i {\displaystyle i} -th principal component can be taken as a direction orthogonal to the first i − 1 {\displaystyle i-1} principal components that maximizes the variance of the projected data. For either objective, it can be shown that the principal components are eigenvectors of the data's covariance matrix. Thus, the principal components are often computed by eigendecomposition of the data covariance matrix or singular value decomposition of the data matrix. PCA is the simplest of the true eigenvector-based multivariate analyses and is closely related to factor analysis. Factor analysis typically incorporates more domain-specific assumptions about the underlying structure and solves eigenvectors of a slightly different matrix. PCA is also related to canonical correlation analysis (CCA). CCA defines coordinate systems that optimally describe the cross-covariance between two datasets while PCA defines a new orthogonal coordinate system that optimally describes variance in a single dataset. Robust and L1-norm-based variants of standard PCA have also been proposed. == History == PCA was invented in 1901 by Karl Pearson, as an analogue of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT) in signal processing, the Hotelling transform in multivariate quality control, proper orthogonal decomposition (POD) in mechanical engineering, singular value decomposition (SVD) of X (invented in the last quarter of the 19th century), eigenvalue decomposition (EVD) of XTX in linear algebra, factor analysis (for a discussion of the differences between PCA and factor analysis see Ch. 7 of Jolliffe's Principal Component Analysis), Eckart–Young theorem (Harman, 1960), or empirical orthogonal functions (EOF) in meteorological science (Lorenz, 1956), empirical eigenfunction decomposition (Sirovich, 1987), quasiharmonic modes (Brooks et al., 1988), spectral decomposition in noise and vibration, and empirical modal analysis in structural dynamics. == Intuition == PCA can be thought of as fitting a p-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If some axis of the ellipsoid is small, then the variance along that axis is also small. To find the axes of the ellipsoid, we must first center the values of each variable in the dataset on 0 by subtracting the mean of the variable's observed values from each of those values. These transformed values are used instead of the original observed values for each of the variables. Then, we compute the covariance matrix of the data and calculate the eigenvalues and corresponding eigenvectors of this covariance matrix. Then we must normalize each of the orthogonal eigenvectors to turn them into unit vectors. Once this is done, each of the mutually-orthogonal unit eigenvectors can be interpreted as an axis of the ellipsoid fitted to the data. This choice of basis will transform the covariance matrix into a diagonalized form, in which the diagonal elements represent the variance of each axis. The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all eigenvalues. Biplots and scree plots (degree of explained variance) are used to interpret findings of the PCA. == Details == PCA is defined as an orthogonal linear transformation on a real inner product space that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. Consider an n × p {\displaystyle n\times p} data matrix, X, with column-wise zero empirical mean (the sample mean of each column has been shifted to zero), where each of the n rows represents a different repetition of the experiment, and each of the p columns gives a particular kind of feature (say, the results from a particular sensor). Mathematically, the transformation is defined by a set of size l {\displaystyle l} (where l {\displaystyle l} is usually selected to be strictly less than p {\displaystyle p} to reduce dimensionality) of p {\displaystyle p} -dimensional vectors of weights or coefficients w ( k ) = ( w 1 , … , w p ) ( k ) {\displaystyle \mathbf {w} _{(k)}=(w_{1},\dots ,w_{p})_{(k)}} that map each row vector x ( i ) = ( x 1 , … , x p ) ( i ) {\displaystyle \mathbf {x} _{(i)}=(x_{1},\dots ,x_{p})_{(i)}} of X to a new vector of principal component scores t ( i ) = ( t 1 , … , t l ) ( i ) {\displaystyle \mathbf {t} _{(i)}=(t_{1},\dots ,t_{l})_{(i)}} , given by t k ( i ) = x ( i ) ⋅ w ( k ) f o r i = 1 , … , n k = 1 , … , l {\displaystyle {t_{k}}_{(i)}=\mathbf {x} _{(i)}\cdot \mathbf {w} _{(k)}\qquad \mathrm {for} \qquad i=1,\dots ,n\qquad k=1,\dots ,l} in such a way that the individual variables t 1 , … , t l {\displaystyle t_{1},\dots ,t_{l}} of t considered over the data set successively inherit the maximum possible variance from X, with each coefficient vector w constrained to be a unit vector. The above may equivalently be written in matrix form as T = X W {\displaystyle \mathbf {T} =\mathbf {X} \mathbf {W} } where T i k = t k ( i ) {\displaystyle {\mathbf {T} }_{ik}={t_{k}}_{(i)}} , X i j = x j ( i ) {\displaystyle {\mathbf {X} }_{ij}={x_{j}}_{(i)}} , and W j k = w j ( k ) {\displaystyle {\mathbf {W} }_{jk}={w_{j}}_{(k)}} . === First component === In order to maximize variance, the first weight vector w(1) thus has to satisfy w ( 1 ) = arg ⁡ max ‖ w ‖ = 1 { ∑ i ( t 1 ) ( i ) 2 } = arg ⁡ max ‖ w ‖ = 1 { ∑ i ( x ( i ) ⋅ w ) 2 } {\displaystyle \mathbf {w} _{(1)}=\arg \max _{\Vert \mathbf {w} \Vert =1}\,\left\{\sum _{i}(t_{1})_{(i)}^{2}\right\}=\arg \max _{\Vert \mathbf {w} \Vert =1}\,\left\{\sum _{i}\left(\mathbf {x} _{(i)}\cdot \mathbf {w} \right)^{2}\right\}} Equivalently, writing this in matrix form gives w ( 1 ) = arg ⁡ max ‖ w ‖ = 1 { ‖ X w ‖ 2 } = arg ⁡ max ‖ w ‖ = 1 { w T X T X w } {\displaystyle \mathbf {w} _{(1)}=\arg \max _{\left\|\mathbf {w} \right\|=1}\left\{\left\|\mathbf {Xw} \right\|^{2}\right\}=\arg \max _{\left\|\mathbf {w} \right\|=1}\left\{\mathbf {w} ^{\mathsf {T}}\mathbf {X} ^{\mathsf {T}}\mathbf {Xw} \right\}} Since w(1) has been defined to be a unit vector, it equivalently also satisfies w ( 1 ) = arg ⁡ max { w T X T X w w T w } {\displaystyle \mathbf {w} _{(1)}=\arg \max \left\{{\frac {\mathbf {w} ^{\mathsf {T}}\mathbf {X} ^{\mathsf {T}}\mathbf {Xw} }{\mathbf {w} ^{\mathsf {T}}\mathbf {w} }}\right\}} The quantity to be maximised can be recognised as a Rayleigh quotient. A standard result for a positive semidefinite matrix such as XTX is that the quotient's maximum possible value is the largest eigenvalue of the matrix, which occurs when w is the corresponding eigenvector. With w(1) found, the first principal component of a data vector

    Read more →
  • Extremal Ensemble Learning

    Extremal Ensemble Learning

    Extremal Ensemble Learning (EEL) is a machine learning algorithmic paradigm for graph partitioning. EEL creates an ensemble of partitions and then uses information contained in the ensemble to find new and improved partitions. The ensemble evolves and learns how to form improved partitions through extremal updating procedure. The final solution is found by achieving consensus among its member partitions about what the optimal partition is. == Reduced-Network Extremal Ensemble Learning (RenEEL) == A particular implementation of the EEL paradigm is the Reduced-Network Extremal Ensemble Learning (RenEEL) scheme for partitioning a graph. RenEEL uses consensus across many partitions in an ensemble to create a reduced network that can be efficiently analyzed to find more accurate partitions. These better quality partitions are subsequently used to update the ensemble. An algorithm that utilizes the RenEEL scheme is currently the best algorithm for finding the graph partition with maximum modularity, which is an NP-hard problem.

    Read more →
  • Systems development life cycle

    Systems development life cycle

    The systems development life cycle (SDLC) describes the typical phases and progression between phases during the development of a computer-based system. These phases progress from inception to retirement. At base, there is just one life cycle, but the taxonomy used to describe it may vary; the cycle may be classified into different numbers of phases and various names may be used for those phases. The SDLC is analogous to the life cycle of a living organism from its birth to its death. In particular, the SDLC varies by system in much the same way that each living organism has a unique path through its life. The SDLC does not prescribe how engineers should go about their work to move the system through its life cycle. Prescriptive techniques are referred to using various terms such as methodology, model, framework, and formal process. Other terms are used for the same concept as SDLC, including software development life cycle (also SDLC), application development life cycle (ADLC), and system design life cycle (also SDLC). These other terms focus on a different scope of development and are associated with different prescriptive techniques, but are about the same essential life cycle. The term "life cycle" is often written without a space, as "lifecycle", with the former more popular in the past and in non-engineering contexts. The acronym SDLC was coined when the longer form was more popular and has remained associated with the expansion, even though the shorter form is popular in engineering. Also, SDLC is relatively unique as opposed to the TLA SDL, which is highly overloaded. == Phases == Depending on the source, the SDLC is described as having different phases and using different terms. Even so, there are common aspects. The following attempts to describe notable phases using notable terminology. The phases are somewhat ordered by the natural sequence of development, although they can be overlapping and iterative. === Conceptualization === During conceptualization (a.k.a. conceptual design, system investigation, feasibility), options and priorities are considered. A feasibility study can determine whether the development effort is worthwhile via activities such as understanding user needs, cost estimation, benefit analysis, and resource analysis. A study should address operational, financial, technical, human factors, and legal/political concerns. === Requirements analysis === Requirements analysis (a.k.a. preliminary design) involves understanding the problem and determining what is needed. Often this involves engaging users to define the requirements and recording them in a document known as a requirements specification. === Design === During the design phase (a.k.a. detail design), a solution is planned. The plan can include relatively high-level information such as describing the major components of the system. The plan can include relatively low-level information such as describing functions, screen layout, business rules, and process flow. The design phase is informed by the requirements of the system. The design must satisfy each requirement. The design may be recorded in textual documents as well as functional hierarchy diagrams, example screen images, business rules, process diagrams, pseudo-code, and data models. === Construction === During construction (a.k.a. implementation, production), the system is realized. Based on the design, hardware and software components are created and integrated. This phase includes testing sub-components, components and the integration of some components, but typically does not include testing at the complete system level. This phase may include the development of training materials, including user manuals and help files. === Acceptance === The acceptance phase (a.k.a. system testing) is about testing the complete system to ensure that it meets customer expectations (requirements). === Deployment === The deployment phase (a.k.a. implementation) involves the logistics of delivery to the customer. Some systems are deployed as a single instance (i.e. in the cloud), and deployment may be ad hoc and manual. Some systems are built in quantity and are associated with manufacturing process and commissioning. This phase may include training users to use the system. It may include transitioning future development to support staff. === Maintenance === During the maintenance phase (a.k.a. operation, utilization, support) development is largely inactive, although this phase does include customer support for resolving user issues and recording suggestions for improvement. Fixes and enhancements are handled by returning to the first phase, conceptualization. For minor changes, the cycle may be significantly abbreviated compared to initial development. === Decommission === Decommission (a.k.a. disposition, retirement, phase-out) is when the system is removed from use, i.e., when it reaches end-of-life. == Practices == === Management and control === SDLC phase objectives are described in this section with key deliverables, a description of recommended tasks, and a summary of related control objectives for effective management. It is critical for the project manager to establish and monitor control objectives while executing projects. Control objectives are clear statements of the desired result or purpose and should be defined and monitored throughout a project. Control objectives can be grouped into major categories (domains), and relate to the SDLC phases as shown in the figure. To manage and control a substantial SDLC initiative, a work breakdown structure (WBS) captures and schedules the work. The WBS and all programmatic material should be kept in the "project description" section of the project notebook. The project manager chooses a WBS format that best describes the project. The diagram shows that coverage spans numerous phases of the SDLC, but the associated MCD (Management Control Domains) shows mappings to SDLC phases. For example, Analysis and Design is primarily performed as part of the Acquisition and Implementation Domain, and System Build and Prototype is primarily performed as part of delivery and support. === Work breakdown structured organization === The upper section of the WBS provides an overview of the project scope and timeline. It should also summarize the major phases and milestones. The middle section is based on the SDLC phases. WBS elements consist of milestones and tasks to be completed rather than activities to be undertaken, and have a deadline. Each task has a measurable output (e.g., an analysis document). A WBS task may rely on one or more activities (e.g., coding). Parts of the project needing support from contractors should have a statement of work (SOW). The development of an SOW does not occur during a specific phase of SDLC but is developed to include the work from the SDLC process that may be conducted by contractors. === Baselines === Baselines are established after four of the five phases of the SDLC, and are critical to the iterative nature of the model. Baselines become milestones. functional baseline: established after the conceptual design phase. allocated baseline: established after the preliminary design phase. product baseline: established after the detailed design and development phase. updated product baseline: established after the production construction phase. In the following diagram, these stages are divided into ten steps, from definition to creation and modification of IT work products:

    Read more →
  • Natarajan dimension

    Natarajan dimension

    In the theory of Probably Approximately Correct Machine Learning, the Natarajan dimension characterizes the complexity of learning a set of functions, generalizing from the Vapnik–Chervonenkis dimension for boolean functions to multi-class functions. Originally introduced as the Generalized Dimension by Natarajan, it was subsequently renamed the Natarajan Dimension by Haussler and Long. == Definition == Let H {\displaystyle H} be a set of functions from a set X {\displaystyle X} to a set Y {\displaystyle Y} . H {\displaystyle H} shatters a set C ⊂ X {\displaystyle C\subset X} if there exist two functions f 0 , f 1 ∈ H {\displaystyle f_{0},f_{1}\in H} such that For every x ∈ C , f 0 ( x ) ≠ f 1 ( x ) {\displaystyle x\in C,f_{0}(x)\neq f_{1}(x)} . For every B ⊂ C {\displaystyle B\subset C} , there exists a function h ∈ H {\displaystyle h\in H} such that for all x ∈ B , h ( x ) = f 0 ( x ) {\displaystyle x\in B,h(x)=f_{0}(x)} and for all x ∈ C − B , h ( x ) = f 1 ( x ) {\displaystyle x\in C-B,h(x)=f_{1}(x)} . The Natarajan dimension of H is the maximal cardinality of a set shattered by H {\displaystyle H} . It is easy to see that if | Y | = 2 {\displaystyle |Y|=2} , the Natarajan dimension collapses to the Vapnik–Chervonenkis dimension. Shalev-Shwartz and Ben-David present comprehensive material on multi-class learning and the Natarajan dimension, including uniform convergence and learnability. Recently, Cohen et al showed that the Natarajan dimension is the dominant term governing agnostic multi-class PAC learnability.

    Read more →
  • Cartesian genetic programming

    Cartesian genetic programming

    Cartesian genetic programming is a form of genetic programming that uses a graph representation to encode computer programs. It grew from a method of evolving digital circuits developed by Julian F. Miller and Peter Thomson in 1997. The term ‘Cartesian genetic programming’ first appeared in 1999 and was proposed as a general form of genetic programming in 2000. It is called ‘Cartesian’ because it represents a program using a two-dimensional grid of nodes. Miller's keynote explains how CGP works. He edited a book entitled Cartesian Genetic Programming, published in 2011 by Springer. The open source project dCGP implements a differentiable version of CGP developed at the European Space Agency by Dario Izzo, Francesco Biscani and Alessio Mereta able to approach symbolic regression tasks, to find solution to differential equations, find prime integrals of dynamical systems, represent variable topology artificial neural networks and more.

    Read more →
  • LamaH

    LamaH

    LamaH (Large-Sample Data for Hydrology and Environmental Sciences) is a cross-state initiative for unified data preparation and collection in the field of catchment hydrology. Hydrological datasets, for example, are an integral component for creating flood forecasting models. == Features == LamaH datasets always consist of a combination of meteorological time series (e.g., precipitation, temperature) and hydrologically relevant catchment attributes (e.g., elevation, slope, forest area, soil, bedrock) aggregated over the respective catchment as well as associated hydrological time series at the catchment outlet (discharge). By evaluating the large and heterogeneous sample (large-sample) of catchments, it is possible to gain insights into the hydrological cycle that would probably not be achievable with local and small-scale studies. The structure of the dataset allows an evaluation based on machine learning methods (deep learning). The accompanying paper explains not only the data preparation but also any limitations, uncertainties and possible applications. == Difference to CAMELS == The LamaH datasets are quite similar to the CAMELS datasets, but additionally feature: Further basin delineations (based on intermediate catchments) and attributes (e.g. flow distance and altitude difference between two topologically adjacent discharge gauges), enabling the setup of an interconnected hydrological network Attributes for classifying catchments and runoff gauges according to the degree and type of (anthropogenic) influence == Availability == LamaH datasets are available for the following regions: Central Europe (Austria and its hydrological upstream areas in Germany, Czech Republic, Switzerland, Slovakia, Italy, Liechtenstein, Slovenia and Hungary) / 859 catchments CAMELS datasets are available for (ranked by publication date): Contiguous USA (exclusive Alaska and Hawaii) / 671 catchments Chile / 516 catchments Brazil / 897 catchments Great Britain / 671 catchments Australia / 222 catchments Both the CAMELS and LamaH datasets are licensed with Creative Commons and are therefore available barrier-free for the public.

    Read more →
  • Human Race Machine

    Human Race Machine

    The Human Race Machine (HRM) is a computerized console composed of four different programs. The Human Race Machine program allows participants to see themselves with the facial characteristics of six different races: Asian, White, African, Middle Eastern, and Indian, mapped onto their own face. The Age Machine allows viewers see an aged version of his or her face. A version of this methodology has been used for over twenty years by the FBI and the National Center for Missing and Exploited Children to help locate kidnap victims and missing children. The Couples Machine combines photographs of two people in different percentages to show the appearance of their child. The Anomaly Machine lets viewers see themselves with facial anomalies. The HRM was created by artist Nancy Burson and David Kramlich; it uses morphing technology. It was shown on Oprah on 2006-02-16.

    Read more →
  • Kernel method

    Kernel method

    In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines is infinite dimensional but only requires a finite dimensional matrix from user-input according to the representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing. Kernel methods owe their name to the use of kernel functions, which enable them to operate in a high-dimensional, implicit feature space without ever computing the coordinates of the data in that space, but rather by simply computing the inner products between the images of all pairs of data in the feature space. This operation is often computationally cheaper than the explicit computation of the coordinates. This approach is called the "kernel trick". Kernel functions have been introduced for sequence data, graphs, text, images, as well as vectors. Algorithms capable of operating with kernels include the kernel perceptron, support-vector machines (SVM), Gaussian processes, principal components analysis (PCA), canonical correlation analysis, ridge regression, spectral clustering, linear adaptive filters and many others. Most kernel algorithms are based on convex optimization or eigenproblems and are statistically well-founded. Typically, their statistical properties are analyzed using statistical learning theory (for example, using Rademacher complexity). == Motivation and informal explanation == Kernel methods can be thought of as instance-based learners: rather than learning some fixed set of parameters corresponding to the features of their inputs, they instead "remember" the i {\displaystyle i} -th training example ( x i , y i ) {\displaystyle (\mathbf {x} _{i},y_{i})} and learn for it a corresponding weight w i {\displaystyle w_{i}} . Prediction for unlabeled inputs, i.e., those not in the training set, are treated by the application of a similarity function k {\displaystyle k} , called a kernel, between the unlabeled input x ′ {\displaystyle \mathbf {x'} } and each of the training inputs x i {\displaystyle \mathbf {x} _{i}} . For instance, a kernelized binary classifier typically computes a weighted sum of similarities y ^ = sgn ⁡ ∑ i = 1 n w i y i k ( x i , x ′ ) , {\displaystyle {\hat {y}}=\operatorname {sgn} \sum _{i=1}^{n}w_{i}y_{i}k(\mathbf {x} _{i},\mathbf {x'} ),} where y ^ ∈ { − 1 , + 1 } {\displaystyle {\hat {y}}\in \{-1,+1\}} is the kernelized binary classifier's predicted label for the unlabeled input x ′ {\displaystyle \mathbf {x'} } whose hidden true label y {\displaystyle y} is of interest; k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is the kernel function that measures similarity between any pair of inputs x , x ′ ∈ X {\displaystyle \mathbf {x} ,\mathbf {x'} \in {\mathcal {X}}} ; the sum ranges over the n labeled examples { ( x i , y i ) } i = 1 n {\displaystyle \{(\mathbf {x} _{i},y_{i})\}_{i=1}^{n}} in the classifier's training set, with y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} ; the w i ∈ R {\displaystyle w_{i}\in \mathbb {R} } are the weights for the training examples, as determined by the learning algorithm; the sign function sgn {\displaystyle \operatorname {sgn} } determines whether the predicted classification y ^ {\displaystyle {\hat {y}}} comes out positive or negative. Kernel classifiers were described as early as the 1960s, with the invention of the kernel perceptron. They rose to great prominence with the popularity of the support-vector machine (SVM) in the 1990s, when the SVM was found to be competitive with neural networks on tasks such as handwriting recognition. == Mathematics: the kernel trick == The kernel trick avoids the explicit mapping that is needed to get linear learning algorithms to learn a nonlinear function or decision boundary. For all x {\displaystyle \mathbf {x} } and x ′ {\displaystyle \mathbf {x'} } in the input space X {\displaystyle {\mathcal {X}}} , certain functions k ( x , x ′ ) {\displaystyle k(\mathbf {x} ,\mathbf {x'} )} can be expressed as an inner product in another space V {\displaystyle {\mathcal {V}}} . The function k : X × X → R {\displaystyle k\colon {\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } is often referred to as a kernel or a kernel function. The word "kernel" is used in mathematics to denote a weighting function for a weighted sum or integral. Certain problems in machine learning have more structure than an arbitrary weighting function k {\displaystyle k} . The computation is made much simpler if the kernel can be written in the form of a "feature map" φ : X → V {\displaystyle \varphi \colon {\mathcal {X}}\to {\mathcal {V}}} which satisfies k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ V . {\displaystyle k(\mathbf {x} ,\mathbf {x'} )=\langle \varphi (\mathbf {x} ),\varphi (\mathbf {x'} )\rangle _{\mathcal {V}}.} The key restriction is that ⟨ ⋅ , ⋅ ⟩ V {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {V}}} must be a proper inner product. On the other hand, an explicit representation for φ {\displaystyle \varphi } is not necessary, as long as V {\displaystyle {\mathcal {V}}} is an inner product space. The alternative follows from Mercer's theorem: an implicitly defined function φ {\displaystyle \varphi } exists whenever the space X {\displaystyle {\mathcal {X}}} can be equipped with a suitable measure ensuring the function k {\displaystyle k} satisfies Mercer's condition. Mercer's theorem is similar to a generalization of the result from linear algebra that associates an inner product to any positive-definite matrix. In fact, Mercer's condition can be reduced to this simpler case. If we choose as our measure the counting measure μ ( T ) = | T | {\displaystyle \mu (T)=|T|} for all T ⊂ X {\displaystyle T\subset X} , which counts the number of points inside the set T {\displaystyle T} , then the integral in Mercer's theorem reduces to a summation ∑ i = 1 n ∑ j = 1 n k ( x i , x j ) c i c j ≥ 0. {\displaystyle \sum _{i=1}^{n}\sum _{j=1}^{n}k(\mathbf {x} _{i},\mathbf {x} _{j})c_{i}c_{j}\geq 0.} If this summation holds for all finite sequences of points ( x 1 , … , x n ) {\displaystyle (\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n})} in X {\displaystyle {\mathcal {X}}} and all choices of n {\displaystyle n} real-valued coefficients ( c 1 , … , c n ) {\displaystyle (c_{1},\dots ,c_{n})} (cf. positive definite kernel), then the function k {\displaystyle k} satisfies Mercer's condition. Some algorithms that depend on arbitrary relationships in the native space X {\displaystyle {\mathcal {X}}} would, in fact, have a linear interpretation in a different setting: the range space of φ {\displaystyle \varphi } . The linear interpretation gives us insight about the algorithm. Furthermore, there is often no need to compute φ {\displaystyle \varphi } directly during computation, as is the case with support-vector machines. Some cite this running time shortcut as the primary benefit. Researchers also use it to justify the meanings and properties of existing algorithms. Theoretically, a Gram matrix K ∈ R n × n {\displaystyle \mathbf {K} \in \mathbb {R} ^{n\times n}} with respect to { x 1 , … , x n } {\displaystyle \{\mathbf {x} _{1},\dotsc ,\mathbf {x} _{n}\}} (sometimes also called a "kernel matrix"), where K i j = k ( x i , x j ) {\displaystyle K_{ij}=k(\mathbf {x} _{i},\mathbf {x} _{j})} , must be positive semi-definite (PSD). Empirically, for machine learning heuristics, choices of a function k {\displaystyle k} that do not satisfy Mercer's condition may still perform reasonably if k {\displaystyle k} at least approximates the intuitive idea of similarity. Regardless of whether k {\displaystyle k} is a Mercer kernel, k {\displaystyle k} may still be referred to as a "kernel". If the kernel function k {\displaystyle k} is also a covariance function as used in Gaussian processes, then the Gram matrix K {\displaystyle \mathbf {K} } can also be called a covariance matrix. == Applications == Application areas of kernel methods are diverse and include geostatistics, kriging, inverse distance weighting, 3D reconstruction, bioinformatics, cheminformatics, information extraction and handwriting recognition. == Popular kernels == Fisher kernel Graph kernels Kernel smoother Polynomial kernel Radial basis function kern

    Read more →
  • Wolfram Mathematica

    Wolfram Mathematica

    Wolfram Mathematica (also known as Mathematica) is a software system with built-in libraries for several areas of technical computing that allows machine learning, statistics, symbolic computation, data manipulation, network analysis, time series analysis, NLP, optimization, plotting functions and various types of data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other programming languages. It was conceived by Stephen Wolfram, and is developed by Wolfram Research of Champaign, Illinois. The Wolfram Language is the programming language used in Mathematica. Mathematica 1.0 was released on June 23, 1988 in Champaign, Illinois and Santa Clara, California. Mathematica's Wolfram Language is fundamentally based on Lisp; for example, the Mathematica command Most is identically equal to the Lisp command butlast. == Notebook interface == Mathematica is split into two parts: the kernel and the front end. The kernel interprets expressions (Wolfram Language code) and returns result expressions, which can then be displayed by the front end. The original front end, designed by Theodore Gray in 1988, consists of a notebook interface and allows the creation and editing of notebook documents that can contain code, plaintext, images, and graphics. Code development is also supported through support in a range of standard integrated development environment (IDE) including Eclipse, IntelliJ IDEA, Atom, Vim, Visual Studio Code and Git. The Mathematica Kernel also includes a command line front end. Other interfaces include JMath, based on GNU Readline and WolframScript which runs self-contained Mathematica programs (with arguments) from the UNIX command line. == High-performance computing == Capabilities for high-performance computing were extended with the introduction of packed arrays in version 4 (1999) and sparse matrices (version 5, 2003), and by adopting the GNU Multiple Precision Arithmetic Library to evaluate high-precision arithmetic. Version 5.2 (2005) added automatic multi-threading when computations are performed on multi-core computers. This release included CPU-specific optimized libraries. In addition Mathematica is supported by third party specialist acceleration hardware such as ClearSpeed. In 2002, gridMathematica was introduced to allow user level parallel programming on heterogeneous clusters and multiprocessor systems and in 2008 parallel computing technology was included in all Mathematica licenses including support for grid technology such as Windows HPC Server 2008, Microsoft Compute Cluster Server and Sun Grid. Support for CUDA and OpenCL GPU hardware was added in 2010. == Extensions == As of Version 14, there are 6,602 built-in functions and symbols in the Wolfram Language. Stephen Wolfram announced the launch of the Wolfram Function Repository in June 2019 as a way for the public Wolfram community to contribute functionality to the Wolfram Language. There are currently more than 3000 functions contributed as Resource Functions. In addition to the Wolfram Function Repository, there is a Wolfram Data Repository with computable data and the Wolfram Neural Net Repository for machine learning. Wolfram Mathematica is the basis of the Combinatorica package, which adds discrete mathematics functionality in combinatorics and graph theory to the program. == Connections to other applications, programming languages, and services == Communication with other applications can be done using a protocol called Wolfram Symbolic Transfer Protocol (WSTP). It allows communication between the Wolfram Mathematica kernel and the front end and provides a general interface between the kernel and other applications. Wolfram Research freely distributes a developer kit for linking applications written in the programming language C to the Mathematica kernel through WSTP using J/Link., a Java program that can ask Mathematica to perform computations. Similar functionality is achieved with .NET /Link, but with .NET programs instead of Java programs. Other languages that connect to Mathematica include Haskell, AppleScript, Racket, Visual Basic, Python, and Clojure. Mathematica supports the generation and execution of Modelica models for systems modeling and connects with Wolfram System Modeler. Links are also available to many third-party software packages and APIs. Mathematica can also capture real-time data from a variety of sources and can read and write to public blockchains (Bitcoin, Ethereum, and ARK). It supports import and export of over 220 data, image, video, sound, computer-aided design (CAD), geographic information systems (GIS), document, and biomedical formats. In 2019, support was added for compiling Wolfram Language code to LLVM. Version 12.3 of the Wolfram Language added support for Arduino. == Computable data == Mathematica is also integrated with Wolfram Alpha, an online answer engine that provides additional data, some of which is kept updated in real time, for users who use Mathematica with an internet connection. Some of the data sets include astronomical, chemical, geopolitical, language, biomedical, airplane, and weather data, in addition to mathematical data (such as knots and polyhedra). == Reception == BYTE in 1989 listed Mathematica as among the "Distinction" winners of the BYTE Awards, stating that it "is another breakthrough Macintosh application ... it could enable you to absorb the algebra and calculus that seemed impossible to comprehend from a textbook". Mathematica has been criticized for being closed source. Wolfram Research claims keeping Mathematica closed source is central to its business model and the continuity of the software.

    Read more →
  • Wake-sleep algorithm

    Wake-sleep algorithm

    The wake-sleep algorithm is an unsupervised learning algorithm for deep generative models, especially Helmholtz Machines. The algorithm is similar to the expectation-maximization algorithm, and optimizes the model likelihood for observed data. The name of the algorithm derives from its use of two learning phases, the “wake” phase and the “sleep” phase, which are performed alternately. It can be conceived as a model for learning in the brain, but is also being applied for machine learning. == Description == The goal of the wake-sleep algorithm is to find a hierarchical representation of observed data. In a graphical representation of the algorithm, data is applied to the algorithm at the bottom, while higher layers form gradually more abstract representations. Between each pair of layers are two sets of weights: Recognition weights, which define how representations are inferred from data, and generative weights, which define how these representations relate to data. == Training == Training consists of two phases – the “wake” phase and the “sleep” phase. It has been proven that this learning algorithm is convergent. === The "wake" phase === Neurons are fired by recognition connections (from what would be input to what would be output). Generative connections (leading from outputs to inputs) are then modified to increase probability that they would recreate the correct activity in the layer below – closer to actual data from sensory input. === The "sleep" phase === The process is reversed in the “sleep” phase – neurons are fired by generative connections while recognition connections are being modified to increase probability that they would recreate the correct activity in the layer above – further to actual data from sensory input. == Extensions == Since the recognition network is limited in its flexibility, it might not be able to approximate the posterior distribution of latent variables well. To better approximate the posterior distribution, it is possible to employ importance sampling, with the recognition network as the proposal distribution. This improved approximation of the posterior distribution also improves the overall performance of the model.

    Read more →
  • Vujak

    Vujak

    VuJak is an early video sampler, a VJ remix and mashup tool created in 1992 by Brian Kane, Lisa Eisenpresser, and Jay Haynes. The original name of the project was Mideo, but it was later changed to VuJak. VuJak was based on MIDI control of video in real-time. It was created with MAX from Opcode Systems, and utilized the newly released QuickTime 1.0 movie object. The first working version of the program was built on a Mac IIfx with 8 megs of ram, and could jump in real-time across a 160 x 120 pixel QuickTime movie via a midi keyboard. Later versions could manipulate full screen video, included the first real-time video scratch feature, had looping, vari-speed, and random play features, and allowed for recording and editing of video sequences within the application. VuJak also had networking capabilities which allowed artists to "jam" in real time across standard phone lines. The first public exhibition of VuJak was at the Digital Hollywood conference in Beverly Hills in 1993, where it was promoted by Timothy Leary. VuJak was featured in Mondo 2000, CBS Evening News, Wired Magazine, Electronic Musician, Billboard Magazine, The Hollywood Reporter, and it was used to create promotional videos for MTV. In 1994, VuJak was a featured interactive exhibition at the Exploratorium in San Francisco. Development of VuJak ceased in 1995.

    Read more →
  • Random projection

    Random projection

    In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are sparse. They have been applied to many natural language tasks under the name random indexing. == Dimensionality reduction == Dimensionality reduction, as the name suggests, is reducing the number of random variables using various mathematical methods from statistics and machine learning. Dimensionality reduction is often used to reduce the problem of managing and manipulating large data sets. Dimensionality reduction techniques generally use linear transformations in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions. For this purpose there are various related techniques, including: principal component analysis, linear discriminant analysis, canonical correlation analysis, discrete cosine transform, random projection, etc. Random projection is a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of error for faster processing times and smaller model sizes. The dimensions and distribution of random projection matrices are controlled so as to approximately preserve the pairwise distances between any two samples of the dataset. == Method == The core idea behind random projection is given in the Johnson-Lindenstrauss lemma, which states that if points in a vector space are of sufficiently high dimension, then they may be projected into a suitable lower-dimensional space in a way which approximately preserves pairwise distances between the points with high probability. In random projection, the original d {\displaystyle d} -dimensional data is projected to a k {\displaystyle k} -dimensional subspace, by multiplying on the left by a random matrix R ∈ R k × d {\displaystyle R\in \mathbb {R} ^{k\times d}} . Using matrix notation: If X d × N {\displaystyle X_{d\times N}} is the original set of N d-dimensional observations, then X k × N R P = R k × d X d × N {\displaystyle X_{k\times N}^{RP}=R_{k\times d}X_{d\times N}} is the projection of the data onto a lower k-dimensional subspace. Random projection is computationally simple: form the random matrix "R" and project the d × N {\displaystyle d\times N} data matrix X onto K dimensions of order O ( d k N ) {\displaystyle O(dkN)} . If the data matrix X is sparse with about c nonzero entries per column, then the complexity of this operation is of order O ( c k N ) {\displaystyle O(ckN)} . === Orthogonal random projection === A unit vector can be orthogonally projected to a random subspace. Let u {\displaystyle u} be the original unit vector, and let v {\displaystyle v} be its projection. The norm-squared ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as projecting a random point, uniformly sampled on the unit sphere, to its first k {\displaystyle k} coordinates. This is equivalent to sampling a random point in the multivariate gaussian distribution x ∼ N ( 0 , I d × d ) {\displaystyle x\sim {\mathcal {N}}(0,I_{d\times d})} , then normalizing it. Therefore, ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as ∑ i = 1 k x i 2 ∑ i = 1 k x i 2 + ∑ i = k + 1 d x i 2 {\displaystyle {\frac {\sum _{i=1}^{k}x_{i}^{2}}{\sum _{i=1}^{k}x_{i}^{2}+\sum _{i=k+1}^{d}x_{i}^{2}}}} , which by the chi-squared construction of the Beta distribution, has distribution Beta ⁡ ( k / 2 , ( d − k ) / 2 ) {\displaystyle \operatorname {Beta} (k/2,(d-k)/2)} , with mean k / d {\displaystyle k/d} . We have a concentration inequality P r [ | ‖ v ‖ 2 − k d | ≥ ϵ k d ] ≤ 3 exp ⁡ ( − k ϵ 2 / 64 ) {\displaystyle Pr\left[\left|\|v\|_{2}-{\frac {k}{d}}\right|\geq \epsilon {\sqrt {\frac {k}{d}}}\right]\leq 3\exp \left(-k\epsilon ^{2}/64\right)} for any ϵ ∈ ( 0 , 1 ) {\displaystyle \epsilon \in (0,1)} . === Gaussian random projection === The random matrix R can be generated using a Gaussian distribution. The first row is a random unit vector uniformly chosen from S d − 1 {\displaystyle S^{d-1}} . The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing R, and the following properties are satisfied: Spherical symmetry: For any orthogonal matrix A ∈ O ( d ) {\displaystyle A\in O(d)} , RA and R have the same distribution. Orthogonality: The rows of R are orthogonal to each other. Normality: The rows of R are unit-length vectors. === More computationally efficient random projections === Achlioptas has shown that the random matrix can be sampled more efficiently. Either the full matrix can be sampled IID according to R i , j = 3 / k × { + 1 with probability 1 6 0 with probability 2 3 − 1 with probability 1 6 {\displaystyle R_{i,j}={\sqrt {3/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{6}}\\0&{\text{with probability }}{\frac {2}{3}}\\-1&{\text{with probability }}{\frac {1}{6}}\end{cases}}} or the full matrix can be sampled IID according to R i , j = 1 / k × { + 1 with probability 1 2 − 1 with probability 1 2 {\displaystyle R_{i,j}={\sqrt {1/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{2}}\\-1&{\text{with probability }}{\frac {1}{2}}\end{cases}}} Both are efficient for database applications because the computations can be performed using integer arithmetic. More related study is conducted in. It was later shown how to use integer arithmetic while making the distribution even sparser, having very few nonzeroes per column, in work on the Sparse JL Transform. This is advantageous since a sparse embedding matrix means being able to project the data to lower dimension even faster. === Random Projection with Quantization === Random projection can be further condensed by quantization (discretization), with 1-bit (sign random projection) or multi-bits. It is the building block of SimHash, RP tree, and other memory efficient estimation and learning methods. == Large quasiorthogonal bases == The Johnson-Lindenstrauss lemma states that large sets of vectors in a high-dimensional space can be linearly mapped in a space of much lower (but still high) dimension n with approximate preservation of distances. One of the explanations of this effect is the exponentially high quasiorthogonal dimension of n-dimensional Euclidean space. There are exponentially large (in dimension n) sets of almost orthogonal vectors (with small value of inner products) in n–dimensional Euclidean space. This observation is useful in indexing of high-dimensional data. Quasiorthogonality of large random sets is important for methods of random approximation in machine learning. In high dimensions, exponentially large numbers of randomly and independently chosen vectors from equidistribution on a sphere (and from many other distributions) are almost orthogonal with probability close to one. This implies that in order to represent an element of such a high-dimensional space by linear combinations of randomly and independently chosen vectors, it may often be necessary to generate samples of exponentially large length if we use bounded coefficients in linear combinations. On the other hand, if coefficients with arbitrarily large values are allowed, the number of randomly generated elements that are sufficient for approximation is even less than dimension of the data space. == Implementations == RandPro - An R package for random projection sklearn.random_projection - A module for random projection from the scikit-learn Python library Weka implementation [1]

    Read more →
  • Automated Pain Recognition

    Automated Pain Recognition

    Automated Pain Recognition (APR) is a method for objectively measuring pain and at the same time represents an interdisciplinary research area that comprises elements of medicine, psychology, psychobiology, and computer science. The focus is on computer-aided objective recognition of pain, implemented on the basis of machine learning. Automated pain recognition allows for the valid, reliable detection and monitoring of pain in people who are unable to communicate verbally. The underlying machine learning processes are trained and validated in advance by means of unimodal or multimodal body signals. Signals used to detect pain may include facial expressions or gestures and may also be of a (psycho-)physiological or paralinguistic nature. To date, the focus has been on identifying pain intensity, but visionary efforts are also being made to recognize the quality, site, and temporal course of pain. However, the clinical implementation of this approach is a controversial topic in the field of pain research. Critics of automated pain recognition argue that pain diagnosis can only be performed subjectively by humans. == Background == Pain diagnosis under conditions where verbal reporting is restricted - such as in verbally and/or cognitively impaired people or in patients who are sedated or mechanically ventilated - is based on behavioral observations by trained professionals. However, all known observation procedures (e.g., Zurich Observation Pain Assessment (ZOPA)); Pain Assessment in Advanced Dementia Scale (PAINAD) require a great deal of specialist expertise. These procedures can be made more difficult by perception- and interpretation-related misjudgments on the part of the observer. With regard to the differences in design, methodology, evaluation sample, and conceptualization of the phenomenon of pain, it is difficult to compare the quality criteria of the various tools. Even if trained personnel could theoretically record pain intensity several times a day using observation instruments, it would not be possible to measure it every minute or second. In this respect, the goal of automated pain recognition is to use valid, robust pain response patterns that can be recorded multimodally for a temporally dynamic, high-resolution, automated pain intensity recognition system. == Procedure == For automated pain recognition, pain-relevant parameters are usually recorded using non-invasive sensor technology, which captures data on the (physical) responses of the person in pain. This can be achieved with camera technology that captures facial expressions, gestures, or posture, while audio sensors record paralinguistic features. (Psycho-)physiological information such as muscle tone and heart rate can be collected via biopotential sensors (electrodes). Pain recognition requires the extraction of meaningful characteristics or patterns from the data collected. This is achieved using machine learning techniques that are able to provide an assessment of the pain after training (learning), e.g., "no pain," "mild pain," or "severe pain." == Parameters == Although the phenomenon of pain comprises different components (sensory discriminative, affective (emotional), cognitive, vegetative, and (psycho-)motor), automated pain recognition currently relies on the measurable parameters of pain responses. These can be divided roughly into the two main categories of "physiological responses" and "behavioral responses". === Physiological responses === In humans, pain almost always initiates autonomic nervous processes that are reflected measurably in various physiological signals. ==== Physiological signals ==== Measurements can include electrodermal activity (EDA, also skin conductance), electromyography (EMG), electrocardiogram (ECG), blood volume pulse (BVP), electroencephalogram (EEG), respiration, and body temperature, which are regulatory mechanisms of the sympathetic and parasympathetic systems. Physiological signals are mainly recorded using special non-invasive surface electrodes (for EDA, EMG, ECG, and EEG), a blood volume pulse sensor (BVP), a respiratory belt (respiration), and a thermal sensor (body temperature). Endocrinological and immunological parameters can also be recorded, but this requires measures that are somewhat invasive (e.g., blood sampling). === Behavioral responses === Behavioral responses to pain fulfil two functions: protection of the body (e.g., through protective reflexes) and external communication of the pain (e.g., as a cry for help). The responses are particularly evident in facial expressions, gestures, and paralinguistic features. ==== Facial expressions ==== Behavioral signals captured comprise facial expression patterns (expressive behavior), which are measured with the aid of video signals. Facial expression recognition is based on the everyday clinical observation that pain often manifests itself in the patient's facial expressions but that this is not necessarily always the case, since facial expressions can be inhibited through self-control. Despite the possibility that facial expressions may be influenced consciously, facial expression behavior represents an essential source of information for pain diagnosis and is thus also a source of information for automatic pain recognition. One advantage of video-based facial expression recognition is the contact-free measurement of the face, provided that it can be captured on video, which is not possible in every position (e.g., lying face down) or may be limited by bandages covering the face. Facial expression analysis relies on rapid, spontaneous, and temporary changes in neuromuscular activity that lead to visually detectable changes in the face. ==== Gestures ==== Gestures are also captured predominantly using non-contact camera technology. Motor pain responses vary and are strongly dependent on the type and cause of the pain. They range from abrupt protective reflexes (e.g., spontaneous retraction of extremities or doubling up) to agitation (pathological restlessness) and avoidance behavior (hesitant, cautious movements). ==== Paralinguistic features of language ==== Among other things, pain leads to nonverbal linguistic behavior that manifests itself in sounds such as sighing, gasping, moaning, whining, etc. Paralinguistic features are usually recorded using highly sensitive microphones. == Algorithms == After the recording, pre-processing (e.g., filtering), and extraction of relevant features, an optional information fusion can be performed. During this process, modalities from different signal sources are merged to generate new or more precise knowledge. The pain is classified using machine learning processes. The method chosen has a significant influence on the recognition rate and depends greatly on the quality and granularity of the underlying data. Similar to the field of affective computing, the following classifiers are currently being used: Support Vector Machine (SVM): The goal of an SVM is to find a clearly defined optimal hyperplane with the greatest minimal distance to two (or more) classes to be separated. The hyperplane acts as a decision function for classifying an unknown pattern. Random Forest (RF): RF is based on the composition of random, uncorrelated decision trees. An unknown pattern is judged individually by each tree and assigned to a class. The final classification of the patterns by the RF is then based on a majority decision. k-Nearest Neighbors (k-NN): The k-NN algorithm classifies an unknown object using the class label that most commonly classifies the k neighbors closest to it. Its neighbors are determined using a selected similarity measure (e.g., Euclidean distance, Jaccard coefficient, etc.). Artificial neural networks (ANNs): ANNs are inspired by biological neural networks and model their organizational principles and processes in a very simplified manner. Class patterns are learned by adjusting the weights of the individual neuronal connections. == Databases == In order to classify pain in a valid manner, it is necessary to create representative, reliable, and valid pain databases that are available to the machine learner for training. An ideal database would be sufficiently large and would consist of natural (not experimental), high-quality pain responses. However, natural responses are difficult to record and can only be obtained to a limited extent; in most cases they are characterized by suboptimal quality. The databases currently available therefore contain experimental or quasi-experimental pain responses, and each database is based on a different pain model. The following list shows a selection of the most relevant pain databases (last updated: April 2020): UNBC-McMaster Shoulder Pain BioVid Heat Pain EmoPain SenseEmotion X-ITE Pain

    Read more →