Trying to pick the best AI avatar generator? An AI avatar generator is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right AI avatar generator slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.
Image fusion
The image fusion process is defined as gathering all the important information from multiple images, and their inclusion into fewer images, usually a single one. This single image is more informative and accurate than any single source image, and it consists of all the necessary information. The purpose of image fusion is not only to reduce the amount of data but also to construct images that are more appropriate and understandable for the human and machine perception. In computer vision, multisensor image fusion is the process of combining relevant information from two or more images into a single image. The resulting image will be more informative than any of the input images. In remote sensing applications, the increasing availability of space borne sensors gives a motivation for different image fusion algorithms. Several situations in image processing require high spatial and high spectral resolution in a single image. Most of the available equipment is not capable of providing such data convincingly. Image fusion techniques allow the integration of different information sources. The fused image can have complementary spatial and spectral resolution characteristics. However, the standard image fusion techniques can distort the spectral information of the multispectral data while merging. In satellite imaging, two types of images are available. The panchromatic image acquired by satellites is transmitted with the maximum resolution available and the multispectral data are transmitted with coarser resolution. This will usually be two or four times lower. At the receiver station, the panchromatic image is merged with the multispectral data to convey more information. Many methods exist to perform image fusion. The very basic one is the high-pass filtering technique. Later techniques are based on Discrete Wavelet Transform, uniform rational filter bank, and Laplacian pyramid. == Motivation == Multi sensor data fusion has become a discipline which demands more general formal solutions to a number of application cases. Several situations in image processing require both high spatial and high spectral information in a single image. This is important in remote sensing. However, the instruments are not capable of providing such information either by design or because of observational constraints. One possible solution for this is data fusion. == Methods == Image fusion methods can be broadly classified into two groups – spatial domain fusion and transform domain fusion. The fusion methods such as averaging, Brovey method, principal component analysis (PCA) and IHS based methods fall under spatial domain approaches. Another important spatial domain fusion method is the high-pass filtering based technique. Here the high frequency details are injected into upsampled version of MS images. The disadvantage of spatial domain approaches is that they produce spatial distortion in the fused image. Spectral distortion becomes a negative factor while we go for further processing, such as classification problem. Spatial distortion can be very well handled by frequency-domain approaches on image fusion. The multiresolution analysis has become a very useful tool for analysing remote sensing images. The discrete wavelet transform has become a very useful tool for fusion. Some other fusion methods are also there, such as Laplacian pyramid based, curvelet transform based etc. These methods show a better performance in spatial and spectral quality of the fused image compared to other spatial methods of fusion. The images used in image fusion should already be registered. Misregistration is a major source of error in image fusion. Some well-known image fusion methods are: High-pass filtering technique IHS transform based image fusion PCA-based image fusion Wavelet transform image fusion Pair-wise spatial frequency matching Comparative analysis of image fusion methods demonstrates that different metrics support different user needs, sensitive to different image fusion methods, and need to be tailored to the application. Categories of image fusion metrics are based on information theory features, structural similarity, or human perception. === Multi-focus image fusion === Multi-focus image fusion is used to collect useful and necessary information from input images with different focus depths in order to create an output image that ideally has all information from input images. In visual sensor network (VSN), sensors are cameras which record images and video sequences. In many applications of VSN, a camera can’t give a perfect illustration including all details of the scene. This is because of the limited depth of focus exists in the optical lens of cameras. Therefore, just the object located in the focal length of camera is focused and cleared and the other parts of image are blurred. VSN has an ability to capture images with different depth of focuses in the scene using several cameras. Due to the large amount of data generated by camera compared to other sensors such as pressure and temperature sensors and some limitation such as limited band width, energy consumption and processing time, it is essential to process the local input images to decrease the amount of transmission data. The aforementioned reasons emphasize the necessary of multi-focus images fusion. Multi-focus image fusion is a process which combines the input multi-focus images into a single image including all important information of the input images and it’s more accurate explanation of the scene than every single input image. == Applications == === In remote sensing === Image fusion in remote sensing has several application domains. An important domain is the multi-resolution image fusion (commonly referred to pan-sharpening). In satellite imagery we can have two types of images: Panchromatic images – An image collected in the broad visual wavelength range but rendered in black and white. Multispectral images – Images optically acquired in more than one spectral or wavelength interval. Each individual image is usually of the same physical area and scale but of a different spectral band. The SPOT PAN satellite provides high resolution (10m pixel) panchromatic data. While the LANDSAT TM satellite provides low resolution (30m pixel) multispectral images. Image fusion attempts to merge these images and produce a single high resolution multispectral image. The standard merging methods of image fusion are based on Red–Green–Blue (RGB) to Intensity–Hue–Saturation (IHS) transformation. The usual steps involved in satellite image fusion are as follows: Resize the low resolution multispectral images to the same size as the panchromatic image. Transform the R, G and B bands of the multispectral image into IHS components. Modify the panchromatic image with respect to the multispectral image. This is usually performed by histogram matching of the panchromatic image with Intensity component of the multispectral images as reference. Replace the intensity component by the panchromatic image and perform inverse transformation to obtain a high resolution multispectral image. Pan-sharpening can be done with Photoshop. Other applications of image fusion in remote sensing are available. === In medical imaging === Image fusion has become a common term used within medical diagnostics and treatment. The term is used when multiple images of a patient are registered and overlaid or merged to provide additional information. Fused images may be created from multiple images from the same imaging modality, or by combining information from multiple modalities, such as magnetic resonance image (MRI), computed tomography (CT), positron emission tomography (PET), and single-photon emission computed tomography (SPECT). In radiology and radiation oncology, these images serve different purposes. For example, CT images are used more often to ascertain differences in tissue density while MRI images are typically used to diagnose brain tumors. For accurate diagnosis, radiologists must integrate information from multiple image formats. Fused, anatomically consistent images are especially beneficial in diagnosing and treating cancer. With the advent of these new technologies, radiation oncologists can take full advantage of intensity modulated radiation therapy (IMRT). Being able to overlay diagnostic images into radiation planning images results in more accurate IMRT target tumor volumes.
Plate notation
In Bayesian inference, plate notation is a method of representing variables that repeat in a graphical model. Instead of drawing each repeated variable individually, a plate or rectangle is used to group variables into a subgraph that repeat together, and a number is drawn on the plate to represent the number of repetitions of the subgraph in the plate. The assumptions are that the subgraph is duplicated that many times, the variables in the subgraph are indexed by the repetition number, and any links that cross a plate boundary are replicated once for each subgraph repetition. == Example == In this example, we consider Latent Dirichlet allocation, a Bayesian network that models how documents in a corpus are topically related. There are two variables not in any plate; α is the parameter of the uniform Dirichlet prior on the per-document topic distributions, and β is the parameter of the uniform Dirichlet prior on the per-topic word distribution. The outermost plate represents all the variables related to a specific document, including θ i {\displaystyle \theta _{i}} , the topic distribution for document i. The M in the corner of the plate indicates that the variables inside are repeated M times, once for each document. The inner plate represents the variables associated with each of the N i {\displaystyle N_{i}} words in document i: z i j {\displaystyle z_{ij}} is the topic distribution for the jth word in document i, and w i j {\displaystyle w_{ij}} is the actual word used. The N in the corner represents the repetition of the variables in the inner plate N j {\displaystyle N_{j}} times, once for each word in document i. The circle representing the individual words is shaded, indicating that each w i j {\displaystyle w_{ij}} is observable, and the other circles are empty, indicating that the other variables are latent variables. The directed edges between variables indicate dependencies between the variables: for example, each w i j {\displaystyle w_{ij}} depends on z i j {\displaystyle z_{ij}} and β. == Extensions == A number of extensions have been created by various authors to express more information than simply the conditional relationships. However, few of these have become standard. Perhaps the most commonly used extension is to use rectangles in place of circles to indicate non-random variables—either parameters to be computed, hyperparameters given a fixed value (or computed through empirical Bayes), or variables whose values are computed deterministically from a random variable. The diagram on the right shows a few more non-standard conventions used in some articles in Wikipedia (e.g. variational Bayes): Variables that are actually random vectors are indicated by putting the vector size in brackets in the middle of the node. Variables that are actually random matrices are similarly indicated by putting the matrix size in brackets in the middle of the node, with commas separating row size from column size. Categorical variables are indicated by placing their size (without a bracket) in the middle of the node. Categorical variables that act as "switches", and which pick one or more other random variables to condition on from a large set of such variables (e.g. mixture components), are indicated with a special type of arrow containing a squiggly line and ending in a T junction. Boldface is consistently used for vector or matrix nodes (but not categorical nodes). == Software implementation == Plate notation has been implemented in various TeX/LaTeX drawing packages, but also as part of graphical user interfaces to Bayesian statistics programs such as BUGS and BayesiaLab and PyMC.
Concept class
In computational learning theory in mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts. Concept classes are a subject of computational learning theory. Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning. In this setting, if one takes a set Y as a set of (classifier output) labels, and X is a set of examples, the map c : X → Y {\displaystyle c:X\to Y} , i.e. from examples to classifier labels (where Y = { 0 , 1 } {\displaystyle Y=\{0,1\}} and where c is a subset of X), c is then said to be a concept. A concept class C {\displaystyle C} is then a collection of such concepts. Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions to s. Not every subclass is reachable. == Background == A sample s {\displaystyle s} is a partial function from X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} . Identifying a concept with its characteristic function mapping X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} , it is a special case of a sample. Two samples are consistent if they agree on the intersection of their domains. A sample s ′ {\displaystyle s'} extends another sample s {\displaystyle s} if the two are consistent and the domain of s {\displaystyle s} is contained in the domain of s ′ {\displaystyle s'} . == Examples == Suppose that C = S + ( X ) {\displaystyle C=S^{+}(X)} . Then: the subclass { { x } } {\displaystyle \{\{x\}\}} is reachable with the sample s = { ( x , 1 ) } {\displaystyle s=\{(x,1)\}} ; the subclass S + ( Y ) {\displaystyle S^{+}(Y)} for Y ⊆ X {\displaystyle Y\subseteq X} are reachable with a sample that maps the elements of X − Y {\displaystyle X-Y} to zero; the subclass S ( X ) {\displaystyle S(X)} , which consists of the singleton sets, is not reachable. == Applications == Let C {\displaystyle C} be some concept class. For any concept c ∈ C {\displaystyle c\in C} , we call this concept 1 / d {\displaystyle 1/d} -good for a positive integer d {\displaystyle d} if, for all x ∈ X {\displaystyle x\in X} , at least 1 / d {\displaystyle 1/d} of the concepts in C {\displaystyle C} agree with c {\displaystyle c} on the classification of x {\displaystyle x} . The fingerprint dimension F D ( C ) {\displaystyle FD(C)} of the entire concept class C {\displaystyle C} is the least positive integer d {\displaystyle d} such that every reachable subclass C ′ ⊆ C {\displaystyle C'\subseteq C} contains a concept that is 1 / d {\displaystyle 1/d} -good for it. This quantity can be used to bound the minimum number of equivalence queries needed to learn a class of concepts according to the following inequality: F D ( C ) − 1 ≤ # E Q ( C ) ≤ ⌈ F D ( C ) ln ( | C | ) ⌉ {\textstyle FD(C)-1\leq \#EQ(C)\leq \lceil FD(C)\ln(|C|)\rceil } .
Win–stay, lose–switch
In psychology, game theory, statistics, and machine learning, win–stay, lose–switch (also win–stay, lose–shift or Pavlov, named after Ivan Pavlov) is a heuristic learning strategy used to model learning in decision situations. It was first invented as an improvement over randomization in bandit problems. It was later applied to the prisoner's dilemma in order to model the evolution of altruism. In most versions, it starts either with a cooperate, then proceeds as always, or starts with a "probe" of cooperate-defect-cooperate to determine the other player's strategy. A mutual cooperation is regarded as a win. The learning rule bases its decision only on the outcome of the previous play. Outcomes are divided into successes (wins) and failures (losses). If the play on the previous round resulted in a success, then the agent plays the same strategy on the next round. Alternatively, if the play resulted in a failure the agent switches to another action. A large-scale empirical study of players of the game rock, paper, scissors shows that a variation of this strategy is adopted by real-world players of the game, instead of the Nash equilibrium strategy of choosing entirely at random between the three options.
Luma (video)
In video, luma ( Y ′ {\displaystyle Y'} ) represents the brightness in an image (the "black-and-white" or achromatic portion of the image). Luma is typically paired with chroma. Luma represents the achromatic image, while the chroma components represent the color information. Converting R′G′B′ sources (such as the output of a three-CCD camera) into luma and chroma allows for chroma subsampling: because human vision has finer spatial sensitivity to luminance ("black and white") differences than chromatic differences, video systems can store and transmit chromatic information at lower resolution, optimizing perceived detail at a particular bandwidth. == Luma versus relative luminance == Luma is the weighted sum of gamma-compressed R′G′B′ components of a color video—the prime symbols ′ denote gamma compression. The word was proposed to prevent confusion between luma as implemented in video engineering and relative luminance as used in color science (i.e. as defined by CIE). Relative luminance is formed as a weighted sum of linear RGB components, not gamma-compressed ones. Even so, luma is sometimes erroneously called luminance. SMPTE EG 28 recommends the symbol Y ′ {\displaystyle Y'} to denote luma and the symbol Y {\displaystyle Y} to denote relative luminance. === Use of relative luminance === While luma is more often encountered, relative luminance is sometimes used in video engineering when referring to the brightness of a monitor. The formula used to calculate relative luminance uses coefficients based on the CIE color matching functions and the relevant standard chromaticities of red, green, and blue (e.g., the original NTSC primaries, SMPTE C, or Rec. 709). For the Rec. 709 (and sRGB) primaries, the linear combination, based on pure colorimetric considerations and the definition of relative luminance is: Y = 0.2126 R + 0.7152 G + 0.0722 B {\displaystyle Y=0.2126R+0.7152G+0.0722B} The formula used to calculate luma in the Rec. 709 spec arbitrarily also uses these same coefficients, but with gamma-compressed components: Y ′ = 0.2126 R ′ + 0.7152 G ′ + 0.0722 B ′ , {\displaystyle Y'=0.2126R'+0.7152G'+0.0722B',} where the prime symbol ′ denotes gamma compression. == Rec. 601 luma versus Rec. 709 luma coefficients == For digital formats following CCIR 601 (i.e. most digital standard definition formats), luma is calculated with this formula: Y 601 ′ = 0.299 R ′ + 0.587 G ′ + 0.114 B ′ {\displaystyle Y'_{\text{601}}=0.299R'+0.587G'+0.114B'} Formats following ITU-R Recommendation BT. 709 (i.e. most digital high definition formats) use a different formula: Y 709 ′ = 0.2126 R ′ + 0.7152 G ′ + 0.0722 B ′ {\displaystyle Y'_{\text{709}}=0.2126R'+0.7152G'+0.0722B'} Modern HDTV systems use the 709 coefficients, while transitional 1035i HDTV (MUSE) formats may use the SMPTE 240M coefficients: Y 240 ′ = 0.212 R ′ + 0.701 G ′ + 0.087 B ′ = Y 145 ′ {\displaystyle Y'_{\text{240}}=0.212R'+0.701G'+0.087B'=Y'_{\text{145}}} These coefficients correspond to the SMPTE RP 145 primaries (also known as "SMPTE C") in use at the time the standard was created. The change in the luma coefficients is to provide the "theoretically correct" coefficients that reflect the corresponding standard chromaticities ('colors') of the primaries red, green, and blue. However, there is some controversy regarding this decision. The difference in luma coefficients requires that component signals must be converted between Rec. 601 and Rec. 709 to provide accurate colors. In consumer equipment, the matrix required to perform this conversion may be omitted (to reduce cost), resulting in inaccurate color. == Luma and luminance errors == As well, the Rec. 709 luma coefficients may not necessarily provide better performance. Because of the difference between luma and relative luminance, luma does not exactly represent the luminance in an image. As a result, errors in chroma can affect luminance. Luma alone does not perfectly represent luminance; accurate luminance requires both accurate luma and chroma. Hence, errors in chroma "bleed" into the luminance of an image. Note the bleeding in lightness near the borders. Due to the widespread usage of chroma subsampling, errors in chroma typically occur when it is lowered in resolution/bandwidth. This lowered bandwidth, coupled with high frequency chroma components, can cause visible errors in luminance. An example of a high frequency chroma component would be the line between the green and magenta bars of the SMPTE color bars test pattern. Error in luminance can be seen as a dark band that occurs in this area.
Least-squares support vector machine
Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods. == From support-vector machine to least-squares support-vector machine == Given a training set { x i , y i } i = 1 N {\displaystyle \{x_{i},y_{i}\}_{i=1}^{N}} with input data x i ∈ R n {\displaystyle x_{i}\in \mathbb {R} ^{n}} and corresponding binary class labels y i ∈ { − 1 , + 1 } {\displaystyle y_{i}\in \{-1,+1\}} , the SVM classifier, according to Vapnik's original formulation, satisfies the following conditions: { w T ϕ ( x i ) + b ≥ 1 , if y i = + 1 , w T ϕ ( x i ) + b ≤ − 1 , if y i = − 1 , {\displaystyle {\begin{cases}w^{T}\phi (x_{i})+b\geq 1,&{\text{if }}\quad y_{i}=+1,\\w^{T}\phi (x_{i})+b\leq -1,&{\text{if }}\quad y_{i}=-1,\end{cases}}} which is equivalent to y i [ w T ϕ ( x i ) + b ] ≥ 1 , i = 1 , … , N , {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1,\quad i=1,\ldots ,N,} where ϕ ( x ) {\displaystyle \phi (x)} is the nonlinear map from original space to the high- or infinite-dimensional space. === Inseparable data === In case such a separating hyperplane does not exist, we introduce so-called slack variables ξ i {\displaystyle \xi _{i}} such that { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N . {\displaystyle {\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N.\end{cases}}} According to the structural risk minimization principle, the risk bound is minimized by the following minimization problem: min J 1 ( w , ξ ) = 1 2 w T w + c ∑ i = 1 N ξ i , {\displaystyle \min J_{1}(w,\xi )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}\xi _{i},} Subject to { y i [ w T ϕ ( x i ) + b ] ≥ 1 − ξ i , i = 1 , … , N , ξ i ≥ 0 , i = 1 , … , N , {\displaystyle {\text{Subject to }}{\begin{cases}y_{i}\left[{w^{T}\phi (x_{i})+b}\right]\geq 1-\xi _{i},&i=1,\ldots ,N,\\\xi _{i}\geq 0,&i=1,\ldots ,N,\end{cases}}} To solve this problem, we could construct the Lagrangian function: L 1 ( w , b , ξ , α , β ) = 1 2 w T w + c ∑ i = 1 N ξ i − ∑ i = 1 N α i { y i [ w T ϕ ( x i ) + b ] − 1 + ξ i } − ∑ i = 1 N β i ξ i , {\displaystyle L_{1}(w,b,\xi ,\alpha ,\beta )={\frac {1}{2}}w^{T}w+c\sum \limits _{i=1}^{N}{\xi _{i}}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{y_{i}\left[{w^{T}\phi (x_{i})+b}\right]-1+\xi _{i}\right\}-\sum \limits _{i=1}^{N}\beta _{i}\xi _{i},} where α i ≥ 0 , β i ≥ 0 ( i = 1 , … , N ) {\displaystyle \alpha _{i}\geq 0,\ \beta _{i}\geq 0\ (i=1,\ldots ,N)} are the Lagrangian multipliers. The optimal point will be in the saddle point of the Lagrangian function, and then we obtain By substituting w {\displaystyle w} by its expression in the Lagrangian formed from the appropriate objective and constraints, we will get the following quadratic programming problem: max Q 1 ( α ) = − 1 2 ∑ i , j = 1 N α i α j y i y j K ( x i , x j ) + ∑ i = 1 N α i , {\displaystyle \max Q_{1}(\alpha )=-{\frac {1}{2}}\sum \limits _{i,j=1}^{N}{\alpha _{i}\alpha _{j}y_{i}y_{j}K(x_{i},x_{j})}+\sum \limits _{i=1}^{N}\alpha _{i},} where K ( x i , x j ) = ⟨ ϕ ( x i ) , ϕ ( x j ) ⟩ {\displaystyle K(x_{i},x_{j})=\left\langle \phi (x_{i}),\phi (x_{j})\right\rangle } is called the kernel function. Solving this QP problem subject to constraints in (1), we will get the hyperplane in the high-dimensional space and hence the classifier in the original space. === Least-squares SVM formulation === The least-squares version of the SVM classifier is obtained by reformulating the minimization problem as min J 2 ( w , b , e ) = μ 2 w T w + ζ 2 ∑ i = 1 N e i 2 , {\displaystyle \min J_{2}(w,b,e)={\frac {\mu }{2}}w^{T}w+{\frac {\zeta }{2}}\sum \limits _{i=1}^{N}e_{i}^{2},} subject to the equality constraints y i [ w T ϕ ( x i ) + b ] = 1 − e i , i = 1 , … , N . {\displaystyle y_{i}\left[{w^{T}\phi (x_{i})+b}\right]=1-e_{i},\quad i=1,\ldots ,N.} The least-squares SVM (LS-SVM) classifier formulation above implicitly corresponds to a regression interpretation with binary targets y i = ± 1 {\displaystyle y_{i}=\pm 1} . Using y i 2 = 1 {\displaystyle y_{i}^{2}=1} , we have ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i e i ) 2 = ∑ i = 1 N e i 2 = ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 , {\displaystyle \sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}(y_{i}e_{i})^{2}=\sum \limits _{i=1}^{N}e_{i}^{2}=\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2},} with e i = y i − ( w T ϕ ( x i ) + b ) . {\displaystyle e_{i}=y_{i}-(w^{T}\phi (x_{i})+b).} Notice, that this error would also make sense for least-squares data fitting, so that the same end results holds for the regression case. Hence the LS-SVM classifier formulation is equivalent to J 2 ( w , b , e ) = μ E W + ζ E D {\displaystyle J_{2}(w,b,e)=\mu E_{W}+\zeta E_{D}} with E W = 1 2 w T w {\displaystyle E_{W}={\frac {1}{2}}w^{T}w} and E D = 1 2 ∑ i = 1 N e i 2 = 1 2 ∑ i = 1 N ( y i − ( w T ϕ ( x i ) + b ) ) 2 . {\displaystyle E_{D}={\frac {1}{2}}\sum \limits _{i=1}^{N}e_{i}^{2}={\frac {1}{2}}\sum \limits _{i=1}^{N}\left(y_{i}-(w^{T}\phi (x_{i})+b)\right)^{2}.} Both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } should be considered as hyperparameters to tune the amount of regularization versus the sum squared error. The solution does only depend on the ratio γ = ζ / μ {\displaystyle \gamma =\zeta /\mu } , therefore the original formulation uses only γ {\displaystyle \gamma } as tuning parameter. We use both μ {\displaystyle \mu } and ζ {\displaystyle \zeta } as parameters in order to provide a Bayesian interpretation to LS-SVM. The solution of LS-SVM regressor will be obtained after we construct the Lagrangian function: { L 2 ( w , b , e , α ) = J 2 ( w , e ) − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , = 1 2 w T w + γ 2 ∑ i = 1 N e i 2 − ∑ i = 1 N α i { [ w T ϕ ( x i ) + b ] + e i − y i } , {\displaystyle {\begin{cases}L_{2}(w,b,e,\alpha )\;=J_{2}(w,e)-\sum \limits _{i=1}^{N}\alpha _{i}\left\{{\left[{w^{T}\phi (x_{i})+b}\right]+e_{i}-y_{i}}\right\},\\\quad \quad \quad \quad \quad \;={\frac {1}{2}}w^{T}w+{\frac {\gamma }{2}}\sum \limits _{i=1}^{N}e_{i}^{2}-\sum \limits _{i=1}^{N}\alpha _{i}\left\{\left[w^{T}\phi (x_{i})+b\right]+e_{i}-y_{i}\right\},\end{cases}}} where α i ∈ R {\displaystyle \alpha _{i}\in \mathbb {R} } are the Lagrange multipliers. The conditions for optimality are { ∂ L 2 ∂ w = 0 → w = ∑ i = 1 N α i ϕ ( x i ) , ∂ L 2 ∂ b = 0 → ∑ i = 1 N α i = 0 , ∂ L 2 ∂ e i = 0 → α i = γ e i , i = 1 , … , N , ∂ L 2 ∂ α i = 0 → y i = w T ϕ ( x i ) + b + e i , i = 1 , … , N . {\displaystyle {\begin{cases}{\frac {\partial L_{2}}{\partial w}}=0\quad \to \quad w=\sum \limits _{i=1}^{N}\alpha _{i}\phi (x_{i}),\\{\frac {\partial L_{2}}{\partial b}}=0\quad \to \quad \sum \limits _{i=1}^{N}\alpha _{i}=0,\\{\frac {\partial L_{2}}{\partial e_{i}}}=0\quad \to \quad \alpha _{i}=\gamma e_{i},\;i=1,\ldots ,N,\\{\frac {\partial L_{2}}{\partial \alpha _{i}}}=0\quad \to \quad y_{i}=w^{T}\phi (x_{i})+b+e_{i},\,i=1,\ldots ,N.\end{cases}}} Elimination of w {\displaystyle w} and e {\displaystyle e} will yield a linear system instead of a quadratic programming problem: [ 0 1 N T 1 N Ω + γ − 1 I N ] [ b α ] = [ 0 Y ] , {\displaystyle \left[{\begin{matrix}0&1_{N}^{T}\\1_{N}&\Omega +\gamma ^{-1}I_{N}\end{matrix}}\right]\left[{\begin{matrix}b\\\alpha \end{matrix}}\right]=\left[{\begin{matrix}0\\Y\end{matrix}}\right],} with Y = [ y 1 , … , y N ] T {\displaystyle Y=[y_{1},\ldots ,y_{N}]^{T}} , 1 N = [ 1 , … , 1 ] T {\displaystyle 1_{N}=[1,\ldots ,1]^{T}} and α = [ α 1 , … , α N ] T {\displaystyle \alpha =[\alpha _{1},\ldots ,\alpha _{N}]^{T}} . Here, I N {\displaystyle I_{N}} is an N × N {\displaystyle N\times N} identity matrix, and Ω ∈ R N × N {\displaystyle \Omega \in \mathbb {R} ^{N\times N}} is the kernel matrix defined by Ω i j = ϕ ( x i ) T ϕ ( x j ) = K ( x i , x j ) {\displaystyle \Omega _{ij}=\phi (x_{i})^{T}\phi (x_{j})=K(x_{i},x_{j})} . === Kernel function K === For the kernel function K(•, •) one typically has the following choices: Linear kernel : K ( x , x i ) = x i T x , {\displaystyle K(x,x_{i})=x_{i}^{T}x,} Polynomial kernel of degree d {\displaystyle d} : K ( x , x i ) = ( 1 + x i T x / c ) d , {\displaystyle K(x,x_{i})=\left({1+x_{i}^{T}x/c}\right)^{d},} Radial basis function RBF kernel : K ( x , x i ) = exp ( − ‖ x − x i ‖ 2 / σ 2 ) , {\displaystyle K(x,x_{i})=\exp \left({-\left\|{x-x_{i}}\right\|^{2}/\sigma ^{2}}\right),} MLP kernel : K ( x , x i ) = tanh ( k x i T x + θ ) , {\displaystyle K(x,x_{i})=\tanh \left({k