Fuzzy electronics is an electronic technology that uses fuzzy logic, instead of the two-state Boolean logic more commonly used in digital electronics. Fuzzy electronics is fuzzy logic implemented on dedicated hardware. This is to be compared with fuzzy logic implemented in software running on a conventional processor. Fuzzy electronics has a wide range of applications, including control systems and artificial intelligence. == History == The first fuzzy electronic circuit was built by Takeshi Yamakawa et al. in 1980 using discrete bipolar transistors. The first industrial fuzzy application was in a cement kiln in Denmark in 1982. The first VLSI fuzzy electronics was by Masaki Togai and Hiroyuki Watanabe in 1984. In 1987, Yamakawa built the first analog fuzzy controller. The first digital fuzzy processors came in 1988 by Togai (Russo, pp. 2–6). In the early 1990s, the first fuzzy logic chips were presented to the public. Two companies which are Omron and NEC have announced the development of dedicated fuzzy electronic hardware in the year 1991. Two years later, the Japanese Omron Cooperation has shown a working fuzzy chip during a technical fair.
Cybernetic Serendipity
Cybernetic Serendipity was an exhibition of cybernetic art curated by Jasia Reichardt, shown at the Institute of Contemporary Arts, London, England, from 2 August to 20 October 1968, and then toured across the United States. Two stops in the United States were the Corcoran Annex (Corcoran Gallery of Art), Washington, D.C., from 16 July to 31 August 1969, and the newly opened Exploratorium in San Francisco, from 1 November to 18 December 1969. == Content == One part of the exhibition was concerned with algorithms and devices for generating music. Some exhibits were pamphlets describing the algorithms, whilst others showed musical notation produced by computers. Devices made musical effects and played tapes of sounds made by computers. Peter Zinovieff lent part of his studio equipment - visitors could sing or whistle a tune into a microphone and his equipment would improvise a piece of music based on the tune. Another part described computer projects such as Gustav Metzger's self-destructive Five Screens With Computer, a design for a new hospital, a computer programmed structure, and dance choreography. The machines and installations were a very noticeable part of the exhibition. Gordon Pask produced a collection of large mobiles (Colloquy of Mobiles (1968)) with interacting parts that let the viewers join in the conversation. Many machines formed kinetic environments or displayed moving images. Bruce Lacey contributed his radio-controlled robots and a light-sensitive owl. Nam June Paik was represented by Robot K-456 and televisions with distorted images. Jean Tinguely provided two of his painting machines. Edward Ihnatowicz's biomorphic hydraulic ear (Sound Activated Mobile (SAM, 1968)) turned toward sounds and John Billingsley's Albert 1967 turned to face light. Wen-Ying Tsai presented his interactive cybernetic sculptures of vibrating stainless-steel rods, stroboscopic light, and audio feedback control. Several artists exhibited machines that drew patterns that the visitor could take away, or involved visitors in games. Cartoonist Rowland Emett designed the mechanical computer Forget-me-not, which was commissioned by Honeywell. Another section explored the computer's ability to produce text - both essays and poetry. Different programs produced Haiku, children's stories, and essays. One of the first computer-generated poems, by Alison Knowles and James Tenney, was included in the exhibition and catalogue. Computer-generated movies were represented by John Whitney's Permutations and a Bell Labs movie on their technology for producing movies. Some samples included images of tesseracts rotating in four dimensions, a satellite orbiting the Earth, and an animated data structure. Computer graphics were also represented, including pictures produced on cathode ray oscilloscopes and digital plotters. There was a variety of posters and graphics demonstrating the power of computers to do complex (and apparently random) calculations. Other graphics showed a simulated Mondrian and the iconic decreasing squares spiral that appeared on the exhibition's poster and book. The Boeing Company exhibited their use of wireframe graphics. The innovative computer-generated sculpture, Quad 1, was displayed at the Cybernetic Serendipity exhibit. Created by the American abstract expressionist sculptor, Robert Mallary, in 1968, Quad 1 is widely believed to be the world's first Computer Aided Design sculpture. Keith Albarn & Partners contributed to the design of the exhibition. Reflecting the prominence of music in the show, a ten-track album Cybernetic Serendipity Music was released by the ICA to accompany the show. Artists featured included Iannis Xenakis, John Cage, and Peter Zinovieff, a detail of whose graphic score for 'Four Sacred April Rounds’ (1968) was used as the cover artwork. == Attendance == Time magazine noted that there had been 40,000 visitors to the London exhibition. Other reports suggested visitor numbers were as high as 44,000 to 60,000. However, the ICA did not accurately count visitors. == After-effects == The exhibition provided the energy for the formation of British Computer Arts Society which continued to explore the interaction between science, technology and art, and put on exhibitions (for example Event One at the Royal College of Art). Several pieces were purchased by the Exploratorium in 1971, some of which are on display to this day. In 2014 the ICA held a retrospective exhibition Cybernetic Serendipity: A Documentation which included documents, installation photographs, press reviews and publications and a series of discussions in one of which Peter Zinovieff took part. To coincide with the exhibition, Cybernetic Serendipity Music was re-released as a limited-edition vinyl LP by The Vinyl Factory. The Victoria and Albert Museum marked the 50th anniversary with an exhibition in 2018 entitled "Chance and Control: Art in the Age of Computers". The V&A exhibition included many works by artists who featured in the original ICA show, plus related ephemera. "Chance and Control" subsequently toured to Chester Visual Arts and Firstsite, Colchester. In 2020, The Centre Pompidou exhibited the replica of Gordon Pask's 1968 Colloquy of Mobiles, reproduced by Paul Pangaro and TJ McLeish in 2018. In 2022 the Australian National University's School of Cybernetics launched the school by presenting an exhibition Australian Cybernetic: a point through time. The exhibition included works from Cybernetic Serendipity (1968), Australia ‘75: Festival of Creative Arts and Science (1975), and contemporary pieces curated by the School of Cybernetics. In describing Reichardt's Cybernetic Serendipity exhibition the school stated that it "represented points of expanding the cybernetic imagination" and was a "ground-breaking" "glimpse of a future in which computers were entangled with people and cultures, and through this she fashioned a blueprint for the future of computing that has since inspired generations".
Principal component analysis
Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing. The data are linearly transformed onto a new coordinate system such that the directions (principal components) capturing the largest variation in the data can be easily identified. The principal components of a collection of points in a real coordinate space are a sequence of p {\displaystyle p} unit vectors, where the i {\displaystyle i} -th vector is the direction of a line that best fits the data while being orthogonal to the first i − 1 {\displaystyle i-1} vectors. Here, a best-fitting line is defined as one that minimizes the average squared perpendicular distance from the points to the line. These directions (i.e., principal components) constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points. Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science. == Overview == When performing PCA, the first principal component of a set of p {\displaystyle p} variables is the derived variable formed as a linear combination of the original variables that explains the most variance. The second principal component explains the most variance in what is left once the effect of the first component is removed, and we may proceed through p {\displaystyle p} iterations until all the variance is explained. PCA is most commonly used when many of the variables are highly correlated with each other and it is desirable to reduce their number to an independent set. The first principal component can equivalently be defined as a direction that maximizes the variance of the projected data. The i {\displaystyle i} -th principal component can be taken as a direction orthogonal to the first i − 1 {\displaystyle i-1} principal components that maximizes the variance of the projected data. For either objective, it can be shown that the principal components are eigenvectors of the data's covariance matrix. Thus, the principal components are often computed by eigendecomposition of the data covariance matrix or singular value decomposition of the data matrix. PCA is the simplest of the true eigenvector-based multivariate analyses and is closely related to factor analysis. Factor analysis typically incorporates more domain-specific assumptions about the underlying structure and solves eigenvectors of a slightly different matrix. PCA is also related to canonical correlation analysis (CCA). CCA defines coordinate systems that optimally describe the cross-covariance between two datasets while PCA defines a new orthogonal coordinate system that optimally describes variance in a single dataset. Robust and L1-norm-based variants of standard PCA have also been proposed. == History == PCA was invented in 1901 by Karl Pearson, as an analogue of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT) in signal processing, the Hotelling transform in multivariate quality control, proper orthogonal decomposition (POD) in mechanical engineering, singular value decomposition (SVD) of X (invented in the last quarter of the 19th century), eigenvalue decomposition (EVD) of XTX in linear algebra, factor analysis (for a discussion of the differences between PCA and factor analysis see Ch. 7 of Jolliffe's Principal Component Analysis), Eckart–Young theorem (Harman, 1960), or empirical orthogonal functions (EOF) in meteorological science (Lorenz, 1956), empirical eigenfunction decomposition (Sirovich, 1987), quasiharmonic modes (Brooks et al., 1988), spectral decomposition in noise and vibration, and empirical modal analysis in structural dynamics. == Intuition == PCA can be thought of as fitting a p-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If some axis of the ellipsoid is small, then the variance along that axis is also small. To find the axes of the ellipsoid, we must first center the values of each variable in the dataset on 0 by subtracting the mean of the variable's observed values from each of those values. These transformed values are used instead of the original observed values for each of the variables. Then, we compute the covariance matrix of the data and calculate the eigenvalues and corresponding eigenvectors of this covariance matrix. Then we must normalize each of the orthogonal eigenvectors to turn them into unit vectors. Once this is done, each of the mutually-orthogonal unit eigenvectors can be interpreted as an axis of the ellipsoid fitted to the data. This choice of basis will transform the covariance matrix into a diagonalized form, in which the diagonal elements represent the variance of each axis. The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all eigenvalues. Biplots and scree plots (degree of explained variance) are used to interpret findings of the PCA. == Details == PCA is defined as an orthogonal linear transformation on a real inner product space that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on. Consider an n × p {\displaystyle n\times p} data matrix, X, with column-wise zero empirical mean (the sample mean of each column has been shifted to zero), where each of the n rows represents a different repetition of the experiment, and each of the p columns gives a particular kind of feature (say, the results from a particular sensor). Mathematically, the transformation is defined by a set of size l {\displaystyle l} (where l {\displaystyle l} is usually selected to be strictly less than p {\displaystyle p} to reduce dimensionality) of p {\displaystyle p} -dimensional vectors of weights or coefficients w ( k ) = ( w 1 , … , w p ) ( k ) {\displaystyle \mathbf {w} _{(k)}=(w_{1},\dots ,w_{p})_{(k)}} that map each row vector x ( i ) = ( x 1 , … , x p ) ( i ) {\displaystyle \mathbf {x} _{(i)}=(x_{1},\dots ,x_{p})_{(i)}} of X to a new vector of principal component scores t ( i ) = ( t 1 , … , t l ) ( i ) {\displaystyle \mathbf {t} _{(i)}=(t_{1},\dots ,t_{l})_{(i)}} , given by t k ( i ) = x ( i ) ⋅ w ( k ) f o r i = 1 , … , n k = 1 , … , l {\displaystyle {t_{k}}_{(i)}=\mathbf {x} _{(i)}\cdot \mathbf {w} _{(k)}\qquad \mathrm {for} \qquad i=1,\dots ,n\qquad k=1,\dots ,l} in such a way that the individual variables t 1 , … , t l {\displaystyle t_{1},\dots ,t_{l}} of t considered over the data set successively inherit the maximum possible variance from X, with each coefficient vector w constrained to be a unit vector. The above may equivalently be written in matrix form as T = X W {\displaystyle \mathbf {T} =\mathbf {X} \mathbf {W} } where T i k = t k ( i ) {\displaystyle {\mathbf {T} }_{ik}={t_{k}}_{(i)}} , X i j = x j ( i ) {\displaystyle {\mathbf {X} }_{ij}={x_{j}}_{(i)}} , and W j k = w j ( k ) {\displaystyle {\mathbf {W} }_{jk}={w_{j}}_{(k)}} . === First component === In order to maximize variance, the first weight vector w(1) thus has to satisfy w ( 1 ) = arg max ‖ w ‖ = 1 { ∑ i ( t 1 ) ( i ) 2 } = arg max ‖ w ‖ = 1 { ∑ i ( x ( i ) ⋅ w ) 2 } {\displaystyle \mathbf {w} _{(1)}=\arg \max _{\Vert \mathbf {w} \Vert =1}\,\left\{\sum _{i}(t_{1})_{(i)}^{2}\right\}=\arg \max _{\Vert \mathbf {w} \Vert =1}\,\left\{\sum _{i}\left(\mathbf {x} _{(i)}\cdot \mathbf {w} \right)^{2}\right\}} Equivalently, writing this in matrix form gives w ( 1 ) = arg max ‖ w ‖ = 1 { ‖ X w ‖ 2 } = arg max ‖ w ‖ = 1 { w T X T X w } {\displaystyle \mathbf {w} _{(1)}=\arg \max _{\left\|\mathbf {w} \right\|=1}\left\{\left\|\mathbf {Xw} \right\|^{2}\right\}=\arg \max _{\left\|\mathbf {w} \right\|=1}\left\{\mathbf {w} ^{\mathsf {T}}\mathbf {X} ^{\mathsf {T}}\mathbf {Xw} \right\}} Since w(1) has been defined to be a unit vector, it equivalently also satisfies w ( 1 ) = arg max { w T X T X w w T w } {\displaystyle \mathbf {w} _{(1)}=\arg \max \left\{{\frac {\mathbf {w} ^{\mathsf {T}}\mathbf {X} ^{\mathsf {T}}\mathbf {Xw} }{\mathbf {w} ^{\mathsf {T}}\mathbf {w} }}\right\}} The quantity to be maximised can be recognised as a Rayleigh quotient. A standard result for a positive semidefinite matrix such as XTX is that the quotient's maximum possible value is the largest eigenvalue of the matrix, which occurs when w is the corresponding eigenvector. With w(1) found, the first principal component of a data vector
Representer theorem
For computer science, in statistical learning theory, a representer theorem is any of several related results stating that a minimizer f ∗ {\displaystyle f^{}} of a regularized empirical risk functional defined over a reproducing kernel Hilbert space can be represented as a finite linear combination of kernel products evaluated on the input points in the training set data. == Formal statement == The following Representer Theorem and its proof are due to Schölkopf, Herbrich, and Smola: Theorem: Consider a positive-definite real-valued kernel k : X × X → R {\displaystyle k:{\mathcal {X}}\times {\mathcal {X}}\to \mathbb {R} } on a non-empty set X {\displaystyle {\mathcal {X}}} with a corresponding reproducing kernel Hilbert space H k {\displaystyle H_{k}} . Let there be given a training sample ( x 1 , y 1 ) , … , ( x n , y n ) ∈ X × R {\displaystyle (x_{1},y_{1}),\dotsc ,(x_{n},y_{n})\in {\mathcal {X}}\times \mathbb {R} } , a strictly increasing real-valued function g : [ 0 , ∞ ) → R {\displaystyle g\colon [0,\infty )\to \mathbb {R} } , and an arbitrary error function E : ( X × R 2 ) n → R ∪ { ∞ } {\displaystyle E\colon ({\mathcal {X}}\times \mathbb {R} ^{2})^{n}\to \mathbb {R} \cup \lbrace \infty \rbrace } , which together define the following regularized empirical risk functional on H k {\displaystyle H_{k}} : f ↦ E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) + g ( ‖ f ‖ ) . {\displaystyle f\mapsto E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)+g\left(\lVert f\rVert \right).} Then, any minimizer of the empirical risk f ∗ = argmin f ∈ H k { E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) + g ( ‖ f ‖ ) } , ( ∗ ) {\displaystyle f^{}={\underset {f\in H_{k}}{\operatorname {argmin} }}\left\lbrace E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)+g\left(\lVert f\rVert \right)\right\rbrace ,\quad ()} admits a representation of the form: f ∗ ( ⋅ ) = ∑ i = 1 n α i k ( ⋅ , x i ) , {\displaystyle f^{}(\cdot )=\sum _{i=1}^{n}\alpha _{i}k(\cdot ,x_{i}),} where α i ∈ R {\displaystyle \alpha _{i}\in \mathbb {R} } for all 1 ≤ i ≤ n {\displaystyle 1\leq i\leq n} . Proof: Define a mapping φ : X → H k φ ( x ) = k ( ⋅ , x ) {\displaystyle {\begin{aligned}\varphi \colon {\mathcal {X}}&\to H_{k}\\\varphi (x)&=k(\cdot ,x)\end{aligned}}} (so that φ ( x ) = k ( ⋅ , x ) {\displaystyle \varphi (x)=k(\cdot ,x)} is itself a map X → R {\displaystyle {\mathcal {X}}\to \mathbb {R} } ). Since k {\displaystyle k} is a reproducing kernel, then φ ( x ) ( x ′ ) = k ( x ′ , x ) = ⟨ φ ( x ′ ) , φ ( x ) ⟩ , {\displaystyle \varphi (x)(x')=k(x',x)=\langle \varphi (x'),\varphi (x)\rangle ,} where ⟨ ⋅ , ⋅ ⟩ {\displaystyle \langle \cdot ,\cdot \rangle } is the inner product on H k {\displaystyle H_{k}} . Given any x 1 , … , x n {\displaystyle x_{1},\ldots ,x_{n}} , one can use orthogonal projection to decompose any f ∈ H k {\displaystyle f\in H_{k}} into a sum of two functions, one lying in span { φ ( x 1 ) , … , φ ( x n ) } {\displaystyle \operatorname {span} \left\lbrace \varphi (x_{1}),\ldots ,\varphi (x_{n})\right\rbrace } , and the other lying in the orthogonal complement: f = ∑ i = 1 n α i φ ( x i ) + v , {\displaystyle f=\sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})+v,} where ⟨ v , φ ( x i ) ⟩ = 0 {\displaystyle \langle v,\varphi (x_{i})\rangle =0} for all i {\displaystyle i} . The above orthogonal decomposition and the reproducing property together show that applying f {\displaystyle f} to any training point x j {\displaystyle x_{j}} produces f ( x j ) = ⟨ ∑ i = 1 n α i φ ( x i ) + v , φ ( x j ) ⟩ = ∑ i = 1 n α i ⟨ φ ( x i ) , φ ( x j ) ⟩ , {\displaystyle f(x_{j})=\left\langle \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})+v,\varphi (x_{j})\right\rangle =\sum _{i=1}^{n}\alpha _{i}\langle \varphi (x_{i}),\varphi (x_{j})\rangle ,} which we observe is independent of v {\displaystyle v} . Consequently, the value of the error function E {\displaystyle E} in () is likewise independent of v {\displaystyle v} . For the second term (the regularization term), since v {\displaystyle v} is orthogonal to ∑ i = 1 n α i φ ( x i ) {\displaystyle \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})} and g {\displaystyle g} is strictly monotonic, we have g ( ‖ f ‖ ) = g ( ‖ ∑ i = 1 n α i φ ( x i ) + v ‖ ) = g ( ‖ ∑ i = 1 n α i φ ( x i ) ‖ 2 + ‖ v ‖ 2 ) ≥ g ( ‖ ∑ i = 1 n α i φ ( x i ) ‖ ) . {\displaystyle {\begin{aligned}g\left(\lVert f\rVert \right)&=g\left(\lVert \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})+v\rVert \right)\\&=g\left({\sqrt {\lVert \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})\rVert ^{2}+\lVert v\rVert ^{2}}}\right)\\&\geq g\left(\lVert \sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})\rVert \right).\end{aligned}}} Therefore, setting v = 0 {\displaystyle v=0} does not affect the first term of (), while it strictly decreases the second term. Consequently, any minimizer f ∗ {\displaystyle f^{}} in () must have v = 0 {\displaystyle v=0} , i.e., it must be of the form f ∗ ( ⋅ ) = ∑ i = 1 n α i φ ( x i ) = ∑ i = 1 n α i k ( ⋅ , x i ) , {\displaystyle f^{}(\cdot )=\sum _{i=1}^{n}\alpha _{i}\varphi (x_{i})=\sum _{i=1}^{n}\alpha _{i}k(\cdot ,x_{i}),} which is the desired result. == Generalizations == The Theorem stated above is a particular example of a family of results that are collectively referred to as "representer theorems"; here we describe several such. The first statement of a representer theorem was due to Kimeldorf and Wahba for the special case in which E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) = 1 n ∑ i = 1 n ( f ( x i ) − y i ) 2 , g ( ‖ f ‖ ) = λ ‖ f ‖ 2 {\displaystyle {\begin{aligned}E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)&={\frac {1}{n}}\sum _{i=1}^{n}(f(x_{i})-y_{i})^{2},\\g(\lVert f\rVert )&=\lambda \lVert f\rVert ^{2}\end{aligned}}} for λ > 0 {\displaystyle \lambda >0} . Schölkopf, Herbrich, and Smola generalized this result by relaxing the assumption of the squared-loss cost and allowing the regularizer to be any strictly monotonically increasing function g ( ⋅ ) {\displaystyle g(\cdot )} of the Hilbert space norm. It is possible to generalize further by augmenting the regularized empirical risk functional through the addition of unpenalized offset terms. For example, Schölkopf, Herbrich, and Smola also consider the minimization f ~ ∗ = argmin { E ( ( x 1 , y 1 , f ~ ( x 1 ) ) , … , ( x n , y n , f ~ ( x n ) ) ) + g ( ‖ f ‖ ) ∣ f ~ = f + h ∈ H k ⊕ span { ψ p ∣ 1 ≤ p ≤ M } } , ( † ) {\displaystyle {\tilde {f}}^{}=\operatorname {argmin} \left\lbrace E\left((x_{1},y_{1},{\tilde {f}}(x_{1})),\ldots ,(x_{n},y_{n},{\tilde {f}}(x_{n}))\right)+g\left(\lVert f\rVert \right)\mid {\tilde {f}}=f+h\in H_{k}\oplus \operatorname {span} \lbrace \psi _{p}\mid 1\leq p\leq M\rbrace \right\rbrace ,\quad (\dagger )} i.e., we consider functions of the form f ~ = f + h {\displaystyle {\tilde {f}}=f+h} , where f ∈ H k {\displaystyle f\in H_{k}} and h {\displaystyle h} is an unpenalized function lying in the span of a finite set of real-valued functions { ψ p : X → R ∣ 1 ≤ p ≤ M } {\displaystyle \lbrace \psi _{p}\colon {\mathcal {X}}\to \mathbb {R} \mid 1\leq p\leq M\rbrace } . Under the assumption that the n × M {\displaystyle n\times M} matrix ( ψ p ( x i ) ) i p {\displaystyle \left(\psi _{p}(x_{i})\right)_{ip}} has rank M {\displaystyle M} , they show that the minimizer f ~ ∗ {\displaystyle {\tilde {f}}^{}} in ( † ) {\displaystyle (\dagger )} admits a representation of the form f ~ ∗ ( ⋅ ) = ∑ i = 1 n α i k ( ⋅ , x i ) + ∑ p = 1 M β p ψ p ( ⋅ ) {\displaystyle {\tilde {f}}^{}(\cdot )=\sum _{i=1}^{n}\alpha _{i}k(\cdot ,x_{i})+\sum _{p=1}^{M}\beta _{p}\psi _{p}(\cdot )} where α i , β p ∈ R {\displaystyle \alpha _{i},\beta _{p}\in \mathbb {R} } and the β p {\displaystyle \beta _{p}} are all uniquely determined. The conditions under which a representer theorem exists were investigated by Argyriou, Micchelli, and Pontil, who proved the following: Theorem: Let X {\displaystyle {\mathcal {X}}} be a nonempty set, k {\displaystyle k} a positive-definite real-valued kernel on X × X {\displaystyle {\mathcal {X}}\times {\mathcal {X}}} with corresponding reproducing kernel Hilbert space H k {\displaystyle H_{k}} , and let R : H k → R {\displaystyle R\colon H_{k}\to \mathbb {R} } be a differentiable regularization function. Then given a training sample ( x 1 , y 1 ) , … , ( x n , y n ) ∈ X × R {\displaystyle (x_{1},y_{1}),\ldots ,(x_{n},y_{n})\in {\mathcal {X}}\times \mathbb {R} } and an arbitrary error function E : ( X × R 2 ) m → R ∪ { ∞ } {\displaystyle E\colon ({\mathcal {X}}\times \mathbb {R} ^{2})^{m}\to \mathbb {R} \cup \lbrace \infty \rbrace } , a minimizer f ∗ = argmin f ∈ H k { E ( ( x 1 , y 1 , f ( x 1 ) ) , … , ( x n , y n , f ( x n ) ) ) + R ( f ) } ( ‡ ) {\displaystyle f^{}={\underset {f\in H_{k}}{\operatorname {argmin} }}\left\lbrace E\left((x_{1},y_{1},f(x_{1})),\ldots ,(x_{n},y_{n},f(x_{n}))\right)+R(f)\right\rbrace \quad (\ddagger )} of the regularized empirical risk admits a repr
Self-play
Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves". == Definition and motivation == In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more agents. These agents learn by trial-and-error, and researchers may choose to have the learning algorithm play the role of two or more of the different agents. When successfully executed, this technique has a double advantage: It provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge. It increases the amount of experience that can be used to improve the policy, by a factor of two or more, since the viewpoints of each of the different agents can be used for learning. Czarnecki et al argue that most of the games that people play for fun are "Games of Skill", meaning games whose space of all possible strategies looks like a spinning top. In more detail, we can partition the space of strategies into sets L 1 , L 2 , . . . , L n {\displaystyle L_{1},L_{2},...,L_{n}} , such that any i < j , π i ∈ L i , π j ∈ L j {\displaystyle i Time-compressed speech refers to an audio recording of verbal text in which the text is presented in a much shorter time interval than it would through normally-paced real time speech. The basic purpose is to make recorded speech contain more words in a given time, yet still be understandable. For example: a paragraph that might normally be expected to take 20 seconds to read, might instead be presented in 15 seconds, which would represent a time-compression of 25% (5 seconds out of 20). The term "time-compressed speech" should not be confused with "speech compression", which controls the volume range of a sound, but does not alter its time envelope. == Methods == While some voice talents are capable of speaking at rates significantly in excess of general norms, the term "time-compressed speech" most usually refers to examples in which the time-reduction has been accomplished through some form of electronic processing of the recorded speech. In general, recorded speech can be electronically time-compressed by: increasing its speed (linear compression); removing silences (selective editing); a combination of the two (non-linear compression). The speed of a recording can be increased, which will cause the material to be presented at a faster rate (and hence in a shorter amount of time), but this has the undesirable side-effect of increasing the frequency of the whole passage, raising the pitch of the voices, which can reduce intelligibility. There are normally silences between words and sentences, and even small silences within certain words, both of which can be reduced or removed ("edited-out") which will also reduce the amount of time occupied by the full speech recording. However, this can also have the effect of removing verbal "punctuation" from the speech, causing words and sentences to run together unnaturally, again reducing intelligibility. Vowels are typically held a minimum of 20 milliseconds, over many cycles of the fundamental pitch. DSP systems can detect the beginning and end of each cycle and then skip over some fraction of those cycles, causing the material to be presented at a faster rate, without changing the pitch, maintaining a "normal" tone of voice. The current preferred method of time-compression is called "non-linear compression", which employs a combination of selectively removing silences; speeding up the speech to make the reduced silences sound normally-proportioned to the text; and finally applying various data algorithms to bring the speech back down to the proper pitch. This produces a more acceptable result than either of the two earlier techniques; however, if unrestrained, removing the silences and increasing the speed can make a selection of speech sound more insistent, possibly to the point of unpleasantness. == Applications == === Advertising === Time-compressed speech is frequently used in television and radio advertising. The advantage of time-compressed speech is that the same number of words can be compressed into a smaller amount of time, reducing advertising costs, and/or allowing more information to be included in a given radio or TV advertisement. It is usually most noticeable in the information-dense caveats and disclaimers presented (usually by legal requirement) at the end of commercials—the aural equivalent of the "fine print" in a printed contract. This practice, however, is not new: before electronic methods were developed, spokespeople who could talk extremely quickly and still be understood were widely used as voice talents for radio and TV advertisements, and especially for recording such disclaimers. === Education === Time-compressed speech has educational applications such as increasing the information density of trainings, and as a study aid. A number of studies have demonstrated that the average person is capable of relatively easily comprehending speech delivered at higher-than-normal rates, with the peak occurring at around 25% compression (that is, 25% faster than normal); this facility has been demonstrated in several languages. Conversational speech (in English) takes place at a rate of around 150 wpm (words per minute), but the average person is able to comprehend speech presented at rates of up to 200-250 wpm without undue difficulty. Blind and severely visually impaired subjects scored similar comprehension levels at even higher rates, up to 300-350 wpm. Blind people have been found to use time-compressed speech extensively, for example, when reviewing recorded lectures from high school and college classes, or professional trainings. Comprehension rates in older blind subjects have been found to be as good, or in some cases better than those found in younger sighted subjects. Other studies have determined that the ability to comprehend highly time-compressed speech tends to fall off with increased age, and is also reduced when the language of the time-compressed speech is not the listener's native language. Non-native speakers can, however, improve their comprehension level of time-compressed speech with multiday training. === Voice Mail === Voice mail systems have employed time-compressed speech since as far back as the 1970s. In this application, the technology enables the rapid review of messages in high-traffic systems, by a relatively small number of people. === Streaming Multimedia === Time-compressed speech has been explored as one of a variety of interrelated factors which may be manipulated to increase the efficiency of streaming multimedia presentations, by significantly reducing the latency times involved in the transfer of large digitally encoded media files. Multilinear principal component analysis (MPCA) is a multilinear extension of principal component analysis (PCA) that is used to analyze M-way arrays, also informally referred to as "data tensors". M-way arrays may be modeled by linear tensor models, such as CANDECOMP/Parafac, or by multilinear tensor models, such as multilinear principal component analysis (MPCA) or multilinear (tensor) independent component analysis (MICA). In 2005, Vasilescu and Terzopoulos introduced the Multilinear PCA terminology as a way to better differentiate between multilinear data models that employed 2nd order statistics versus higher order statistics to compute a set of independent components for each mode, such as Multilinear ICA Multilinear PCA may be applied to compute the causal factors of data formation, or as signal processing tool on data tensors whose individual observation have either been vectorized, or whose observations are treated as a collection of column/row observations, an "observation as a matrix", and concatenated into a data tensor. The latter approach is suitable for compression and reducing redundancy in the rows, columns and fibers that are unrelated to the causal factors of data formation. Vasilescu and Terzopoulos in their paper "TensorFaces" introduced the M-mode SVD algorithm which are algorithms misidentified in the literature as the HOSVD or the Tucker which employ the power method or gradient descent, respectively. Vasilescu and Terzopoulos framed the data analysis, recognition and synthesis problems as multilinear tensor problems. Data is viewed as the compositional consequence of several causal factors, that are well suited for multi-modal tensor factor analysis. The power of the tensor framework was showcased by analyzing human motion joint angles, facial images or textures in the following papers: Human Motion Signatures (CVPR 2001, ICPR 2002), face recognition – TensorFaces, (ECCV 2002, CVPR 2003, etc.) and computer graphics – TensorTextures (Siggraph 2004). == The algorithm == The MPCA solution follows the alternating least square (ALS) approach. It is iterative in nature. As in PCA, MPCA works on centered data. Centering is a little more complicated for tensors, and it is problem dependent. == Feature selection == MPCA features: Supervised MPCA is employed in causal factor analysis that facilitates object recognition while a semi-supervised MPCA feature selection is employed in visualization tasks. == Extensions == Various extension of MPCA: Robust MPCA (RMPCA) Multi-Tensor Factorization, that also finds the number of components automatically (MTF)Time-compressed speech
Multilinear principal component analysis