AI Chatbot Ethics

AI Chatbot Ethics — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Deluxe Paint Animation

    Deluxe Paint Animation

    DeluxePaint Animation is a 1990 graphics editor and animation creation package for MS-DOS, based on Deluxe Paint for the Amiga. It was adapted by Brent Iverson with additional animation features by Steve Shaw and released by Electronic Arts. The program requires VGA graphics, MS-DOS 2.1 or higher, and a mouse. == Features == Listed from the back of the box. Complete selection of painting tools — Draw any shape you want, any way you want. Turn any image into a brush. You can rotate, flip, shear, resize, smear, and shade it. 7 levels of magnification — Paint in magnified mode if you want. Use variable zoom for detailed editing at the pixel level. 3-D perspective — Move and rotate images in full 3-D, automatically. Use color cycling and gradient fills to create great special effects. Stencils — Protect your designs from the slip of the hand or a bad idea. A stencil masks your image so you can paint "behind" and "in front of" it. Use the handy Move Dialog to animate brushes in full 3-D — automatically! Ideal for creating spinning titles for low-cost videos. 37 multi-sized fonts

    Read more →
  • Softmax function

    Softmax function

    The softmax function, also known as softargmax or normalized exponential function, converts a tuple of K real numbers into a probability distribution over K possible outcomes. It is a generalization of the logistic function to multiple dimensions, and is used in multinomial logistic regression. The softmax function is often used as the last activation function of a neural network to normalize the output of a network to a probability distribution over predicted output classes. == Definition == The softmax function takes as input a tuple z of K real numbers, and normalizes it into a probability distribution consisting of K probabilities proportional to the exponentials of the input numbers. That is, prior to applying softmax, some tuple components could be negative, or greater than one; and might not sum to 1; but after applying softmax, each component will be in the interval ( 0 , 1 ) {\displaystyle (0,1)} , and the components will add up to 1, so that they can be interpreted as probabilities. Furthermore, the larger input components will correspond to larger probabilities. Formally, the standard (unit) softmax function σ : R K → ( 0 , 1 ) K {\displaystyle \sigma :\mathbb {R} ^{K}\to (0,1)^{K}} , where ⁠ K > 1 {\displaystyle K>1} ⁠, takes a tuple z = ( z 1 , … , z K ) ∈ R K {\displaystyle \mathbf {z} =(z_{1},\dotsc ,z_{K})\in \mathbb {R} ^{K}} and computes each component of vector σ ( z ) ∈ ( 0 , 1 ) K {\displaystyle \sigma (\mathbf {z} )\in (0,1)^{K}} with σ ( z ) i = e z i ∑ j = 1 K e z j . {\displaystyle \sigma (\mathbf {z} )_{i}={\frac {e^{z_{i}}}{\sum _{j=1}^{K}e^{z_{j}}}}\,.} In words, the softmax applies the standard exponential function to each element z i {\displaystyle z_{i}} of the input tuple z {\displaystyle \mathbf {z} } (consisting of K {\displaystyle K} real numbers), and normalizes these values by dividing by the sum of all these exponentials. The normalization ensures that the sum of the components of the output vector σ ( z ) {\displaystyle \sigma (\mathbf {z} )} is 1. The term "softmax" derives from the amplifying effects of the exponential on any maxima in the input tuple. For example, the standard softmax of ( 1 , 2 , 8 ) {\displaystyle (1,2,8)} is approximately ( 0.001 , 0.002 , 0.997 ) {\displaystyle (0.001,0.002,0.997)} , which amounts to assigning almost all of the total unit weight in the result to the position of the tuple's maximal element (of 8). In general, instead of e a different base b > 0 can be used. As above, if b > 1 then larger input components will result in larger output probabilities, and increasing the value of b will create probability distributions that are more concentrated around the positions of the largest input values. Conversely, if 0 < b < 1 then smaller input components will result in larger output probabilities, and decreasing the value of b will create probability distributions that are more concentrated around the positions of the smallest input values. Writing b = e β {\displaystyle b=e^{\beta }} or b = e − β {\displaystyle b=e^{-\beta }} (for real β) yields the expressions: σ ( z ) i = e β z i ∑ j = 1 K e β z j or σ ( z ) i = e − β z i ∑ j = 1 K e − β z j for i = 1 , … , K . {\displaystyle \sigma (\mathbf {z} )_{i}={\frac {e^{\beta z_{i}}}{\sum _{j=1}^{K}e^{\beta z_{j}}}}{\text{ or }}\sigma (\mathbf {z} )_{i}={\frac {e^{-\beta z_{i}}}{\sum _{j=1}^{K}e^{-\beta z_{j}}}}{\text{ for }}i=1,\dotsc ,K.} A value proportional to the reciprocal of β is sometimes referred to as the temperature: β = 1 / k T {\textstyle \beta =1/kT} , where k is typically 1 or the Boltzmann constant and T is the temperature. A higher temperature results in a more uniform output distribution (i.e. with higher entropy; it is "more random"), while a lower temperature results in a sharper output distribution, with one value dominating. In some fields, the base is fixed, corresponding to a fixed scale, while in others the parameter β (or T) is varied. The softmax function is a multiple-variable generalization of the logistic function. == Interpretations == === Smooth arg max === The Softmax function is a smooth approximation to the arg max function: the function whose value is the index of a tuple's largest element. The name "softmax" may be misleading. Softmax is not a smooth maximum (that is, a smooth approximation to the maximum function). The term "softmax" is also used for the closely related LogSumExp function, which is a smooth maximum. For this reason, some prefer the more accurate term "softargmax", though the term "softmax" is conventional in machine learning. This section uses the term "softargmax" for clarity. Formally, instead of considering the arg max as a function with categorical output 1 , … , n {\displaystyle 1,\dots ,n} (corresponding to the index), consider the arg max function with one-hot representation of the output (assuming there is a unique maximum arg): a r g m a x ⁡ ( z 1 , … , z n ) = ( y 1 , … , y n ) = ( 0 , … , 0 , 1 , 0 , … , 0 ) , {\displaystyle \operatorname {arg\,max} (z_{1},\,\dots ,\,z_{n})=(y_{1},\,\dots ,\,y_{n})=(0,\,\dots ,\,0,\,1,\,0,\,\dots ,\,0),} where the output coordinate y i = 1 {\displaystyle y_{i}=1} if and only if i {\displaystyle i} is the arg max of ( z 1 , … , z n ) {\displaystyle (z_{1},\dots ,z_{n})} , meaning z i {\displaystyle z_{i}} is the unique maximum value of ( z 1 , … , z n ) {\displaystyle (z_{1},\,\dots ,\,z_{n})} . For example, in this encoding a r g m a x ⁡ ( 1 , 5 , 10 ) = ( 0 , 0 , 1 ) , {\displaystyle \operatorname {arg\,max} (1,5,10)=(0,0,1),} since the third argument is the maximum. This can be generalized to multiple arg max values (multiple equal z i {\displaystyle z_{i}} being the maximum) by dividing the 1 between all max args; formally 1/k where k is the number of arguments assuming the maximum. For example, a r g m a x ⁡ ( 1 , 5 , 5 ) = ( 0 , 1 / 2 , 1 / 2 ) , {\displaystyle \operatorname {arg\,max} (1,\,5,\,5)=(0,\,1/2,\,1/2),} since the second and third argument are both the maximum. In case all arguments are equal, this is simply a r g m a x ⁡ ( z , … , z ) = ( 1 / n , … , 1 / n ) . {\displaystyle \operatorname {arg\,max} (z,\dots ,z)=(1/n,\dots ,1/n).} Points z with multiple arg max values are singular points (or singularities, and form the singular set) – these are the points where arg max is discontinuous (with a jump discontinuity) – while points with a single arg max are known as non-singular or regular points. With the last expression given in the introduction, softargmax is now a smooth approximation of arg max: as ⁠ β → ∞ {\displaystyle \beta \to \infty } ⁠, softargmax converges to arg max. There are various notions of convergence of a function; softargmax converges to arg max pointwise, meaning for each fixed input z as ⁠ β → ∞ {\displaystyle \beta \to \infty } ⁠, σ β ( z ) → a r g m a x ⁡ ( z ) . {\displaystyle \sigma _{\beta }(\mathbf {z} )\to \operatorname {arg\,max} (\mathbf {z} ).} However, softargmax does not converge uniformly to arg max, meaning intuitively that different points converge at different rates, and may converge arbitrarily slowly. In fact, softargmax is continuous, but arg max is not continuous at the singular set where two coordinates are equal, while the uniform limit of continuous functions is continuous. The reason it fails to converge uniformly is that for inputs where two coordinates are almost equal (and one is the maximum), the arg max is the index of one or the other, so a small change in input yields a large change in output. For example, σ β ( 1 , 1.0001 ) → ( 0 , 1 ) , {\displaystyle \sigma _{\beta }(1,\,1.0001)\to (0,1),} but σ β ( 1 , 0.9999 ) → ( 1 , 0 ) , {\displaystyle \sigma _{\beta }(1,\,0.9999)\to (1,\,0),} and σ β ( 1 , 1 ) = 1 / 2 {\displaystyle \sigma _{\beta }(1,\,1)=1/2} for all inputs: the closer the points are to the singular set ( x , x ) {\displaystyle (x,x)} , the slower they converge. However, softargmax does converge compactly on the non-singular set. Conversely, as ⁠ β → − ∞ {\displaystyle \beta \to -\infty } ⁠, softargmax converges to arg min in the same way, where here the singular set is points with two arg min values. In the language of tropical analysis, the softmax is a deformation or "quantization" of arg max and arg min, corresponding to using the log semiring instead of the max-plus semiring (respectively min-plus semiring), and recovering the arg max or arg min by taking the limit is called "tropicalization" or "dequantization". It is also the case that, for any fixed β, if one input ⁠ z i {\displaystyle z_{i}} ⁠ is much larger than the others relative to the temperature, T = 1 / β {\displaystyle T=1/\beta } , the output is approximately the arg max. For example, a difference of 10 is large relative to a temperature of 1: σ ( 0 , 10 ) := σ 1 ( 0 , 10 ) = ( 1 / ( 1 + e 10 ) , e 10 / ( 1 + e 10 ) ) ≈ ( 0.00005 , 0.99995 ) {\displaystyle \sigma (0,\,10):=\sigma _{1}(0,\,10)=\left(1/\left(1+e^{10}\right),\,e^{10}/\left(1+e^{10}\right)\right)\approx (0.00005

    Read more →
  • Correlation clustering

    Correlation clustering

    Clustering is the problem of partitioning data points into groups based on similarity or dissimilarity. Correlation clustering is a clustering framework in which a set of objects is partitioned into clusters based on pairwise similarity and dissimilarity information, without requiring the number of clusters to be specified in advance. == Description of the problem == In machine learning, correlation clustering (also known as cluster editing) considers settings in which pairwise similarity or dissimilarity relationships between objects are known. A standard formulation models the input as an unweighted complete graph G = ( V , E ) {\displaystyle G=(V,E)} , where each edge is labeled either + {\displaystyle +} or − {\displaystyle -} (that is, the graph is a signed graph), indicating whether the corresponding endpoints are similar or dissimilar. The goal is to find a clustering (that is, a partition of V {\displaystyle V} ) that either maximizes the number of agreements—the sum of positive edges whose endpoints lie in the same cluster and negative edges whose endpoints lie in different clusters—or minimizes the number of disagreements—the sum of positive edges whose endpoints are separated and negative edges whose endpoints lie in the same cluster. Unlike other clustering methods such as k-means, correlation clustering does not require choosing the number of clusters k {\displaystyle k} in advance. It is not always possible to find a clustering with zero disagreements. For example, consider a triangle graph containing two positive edges and one negative edge. In this case, every clustering incurs at least one disagreement. Such configurations are referred to in the literature as bad triangles. From a computational perspective, optimizing the correlation clustering objective is challenging. The (decision version of the) problem is NP-complete. A large body of subsequent work has developed approximation algorithms for correlation clustering under various assumptions, including complete or general graphs and unweighted or weighted graphs, for both minimization and maximization objectives. This problem is considered one of the fundamental combinatorial optimization problems, and many algorithmic techniques have been developed to address it. The problem has also been studied extensively across multiple disciplines. A comprehensive literature review of early correlation clustering research is provided by Wahid and Hassini. == Formal Definitions == Let G = ( V , E ) {\displaystyle G=(V,E)} be a graph with nodes V {\displaystyle V} and edges E {\displaystyle E} . A clustering of G {\displaystyle G} is a partition of its node set Π = { π 1 , … , π k } {\displaystyle \Pi =\{\pi _{1},\dots ,\pi _{k}\}} with V = π 1 ∪ ⋯ ∪ π k {\displaystyle V=\pi _{1}\cup \dots \cup \pi _{k}} and π i ∩ π j = ∅ {\displaystyle \pi _{i}\cap \pi _{j}=\emptyset } for i ≠ j {\displaystyle i\neq j} . For a given clustering Π {\displaystyle \Pi } , let δ ( Π ) = { { u , v } ∈ E ∣ { u , v } ⊈ π ∀ π ∈ Π } {\displaystyle \delta (\Pi )=\{\{u,v\}\in E\mid \{u,v\}\not \subseteq \pi \;\forall \pi \in \Pi \}} denote the subset of edges of G {\displaystyle G} whose endpoints are in different subsets of the clustering Π {\displaystyle \Pi } . Now, let w : E → R ≥ 0 {\displaystyle w\colon E\to \mathbb {R} _{\geq 0}} be a function that assigns a non-negative weight to each edge of the graph and let E = E + ∪ E − {\displaystyle E=E^{+}\cup E^{-}} be a partition of the edges into attractive ( E + {\displaystyle E^{+}} ) and repulsive ( E − {\displaystyle E^{-}} ) edges; that is, the edges are signed. The minimum disagreement correlation clustering problem is the following optimization problem: minimize Π ∑ e ∈ E + ∩ δ ( Π ) w e + ∑ e ∈ E − ∖ δ ( Π ) w e . {\displaystyle {\begin{aligned}&{\underset {\Pi }{\operatorname {minimize} }}&&\sum _{e\in E^{+}\cap \delta (\Pi )}w_{e}+\sum _{e\in E^{-}\setminus \delta (\Pi )}w_{e}\;.\end{aligned}}} Here, the set E + ∩ δ ( Π ) {\displaystyle E^{+}\cap \delta (\Pi )} contains the attractive edges whose endpoints are in different components with respect to the clustering Π {\displaystyle \Pi } and the set E − ∖ δ ( Π ) {\displaystyle E^{-}\setminus \delta (\Pi )} contains the repulsive edges whose endpoints are in the same component with respect to the clustering Π {\displaystyle \Pi } . Together these two sets contain all edges that disagree with the clustering Π {\displaystyle \Pi } . Similarly to the minimum disagreement correlation clustering problem, the maximum agreement correlation clustering problem is defined as maximize Π ∑ e ∈ E + ∖ δ ( Π ) w e + ∑ e ∈ E − ∩ δ ( Π ) w e . {\displaystyle {\begin{aligned}&{\underset {\Pi }{\operatorname {maximize} }}&&\sum _{e\in E^{+}\setminus \delta (\Pi )}w_{e}+\sum _{e\in E^{-}\cap \delta (\Pi )}w_{e}\;.\end{aligned}}} Here, the set E + ∖ δ ( Π ) {\displaystyle E^{+}\setminus \delta (\Pi )} contains the attractive edges whose endpoints are in the same component with respect to the clustering Π {\displaystyle \Pi } and the set E − ∩ δ ( Π ) {\displaystyle E^{-}\cap \delta (\Pi )} contains the repulsive edges whose endpoints are in different components with respect to the clustering Π {\displaystyle \Pi } . Together these two sets contain all edges that agree with the clustering Π {\displaystyle \Pi } . Instead of formulating the correlation clustering problem in terms of non-negative edge weights and a partition of the edges into attractive and repulsive edges the problem is also formulated in terms of positive and negative edge costs without partitioning the set of edges explicitly. For given weights w : E → R ≥ 0 {\displaystyle w\colon E\to \mathbb {R} _{\geq 0}} and a given partition E = E + ∪ E − {\displaystyle E=E^{+}\cup E^{-}} of the edges into attractive and repulsive edges, the edge costs can be defined by c e = { w e if e ∈ E + − w e if e ∈ E − {\displaystyle {\begin{aligned}c_{e}={\begin{cases}\;\;w_{e}&{\text{if }}e\in E^{+}\\-w_{e}&{\text{if }}e\in E^{-}\end{cases}}\end{aligned}}} for all e ∈ E {\displaystyle e\in E} . An edge whose endpoints are in different clusters is said to be cut. The set δ ( Π ) {\displaystyle \delta (\Pi )} of all edges that are cut is often called a multicut of G {\displaystyle G} . The minimum cost multicut problem is the problem of finding a clustering Π {\displaystyle \Pi } of G {\displaystyle G} such that the sum of the costs of the edges whose endpoints are in different clusters is minimal: minimize Π ∑ e ∈ δ ( Π ) c e . {\displaystyle {\begin{aligned}&{\underset {\Pi }{\operatorname {minimize} }}&&\sum _{e\in \delta (\Pi )}c_{e}\;.\end{aligned}}} Similar to the minimum cost multicut problem, coalition structure generation in weighted graph games is the problem of finding a clustering such that the sum of the costs of the edges that are not cut is maximal: maximize Π ∑ e ∈ E ∖ δ ( Π ) c e . {\displaystyle {\begin{aligned}&{\underset {\Pi }{\operatorname {maximize} }}&&\sum _{e\in E\setminus \delta (\Pi )}c_{e}\;.\end{aligned}}} This formulation is also known as the clique partitioning problem. It can be shown that all four problems that are formulated above are equivalent. This means that a clustering that is optimal with respect to any of the four objectives is optimal for all of the four objectives. == Algorithms == If the graph admits a clustering with zero disagreements, then deleting all negative edges and computing the connected components of the remaining graph yields an optimal clustering. A necessary and sufficient condition for the existence of such a clustering was given by Davis: no cycle in the graph may contain exactly one negative edge. Bansal et al. discuss the NP-completeness proof and also present both a constant factor approximation algorithm and polynomial-time approximation scheme to find the clusters in this setting. Ailon et al. propose a randomized 3-approximation algorithm for the same problem. CC-Pivot(G=(V,E+,E−)) Pick random pivot i ∈ V Set C = { i } {\displaystyle C=\{i\}} , V'=Ø For all j ∈ V, j ≠ i; If (i,j) ∈ E+ then Add j to C Else (If (i,j) ∈ E−) Add j to V' Let G' be the subgraph induced by V' Return clustering C,CC-Pivot(G') The authors show that the above algorithm is a 3-approximation algorithm for correlation clustering. The best polynomial-time approximation algorithm known at the moment for this problem achieves a ~2.06 approximation by rounding a linear program, as shown by Chawla, Makarychev, Schramm, and Yaroslavtsev. Karpinski and Schudy proved existence of a polynomial time approximation scheme (PTAS) for that problem on complete graphs and fixed number of clusters. == Optimal number of clusters == In 2011, it was shown by Bagon and Galun that the optimization of the correlation clustering functional is closely related to well known discrete optimization methods. In their work they proposed a probabilistic analysis of the underlying implicit model that allows the correlation clustering functional to estimate the

    Read more →
  • ARKA descriptors in QSAR

    ARKA descriptors in QSAR

    In computational chemistry and cheminformatics, ARKA descriptors in QSAR are a class of molecular descriptors used in quantitative structure–activity relationship (QSAR) modeling (or related approaches such as QSPR and QSTR), a computational method for predicting the biological activity or toxicity of chemical compounds based on their molecular structure. Molecular descriptors are numerical values that summarize information about a molecule's structure, topology, geometry, or physicochemical properties in a form suitable for machine learning or statistical modeling. ARKA (Arithmetic Residuals in K-Groups Analysis) descriptors differ from traditional descriptors by encoding atomic-level information through recursive autoregression techniques, which aim to capture subtle structural patterns and improve predictive accuracy. They are designed to be both interpretable and well-suited to modeling nonlinear relationships in QSAR studies. == Comparisons == While QSAR is essentially a similarity-based approach, the occurrence of activity/property cliffs may greatly reduce the predictive accuracy of the developed models. The novel Arithmetic Residuals in K-groups Analysis (ARKA) approach is a supervised dimensionality reduction technique developed by the DTC Laboratory, Jadavpur University that can easily identify activity cliffs in a data set. Activity cliffs are similar in their structures but differ considerably in their activity. The basic idea of the ARKA descriptors is to group the conventional QSAR descriptors based on a predefined criterion and then assign weightage to each descriptor in each group. ARKA descriptors have also been used to develop classification-based and regression-based QSAR models with acceptable quality statistics. The ARKA descriptors have been used for the identification of activity cliffs in QSAR studies and/or model development by multiple researchers. A tutorial presentation on the ARKA descriptors is available. Recently a multi-class ARKA framework has been proposed for improved q-RASAR model generation.

    Read more →
  • Mountain car problem

    Mountain car problem

    Mountain Car, a standard testing domain in Reinforcement learning, is a problem in which an under-powered car must drive up a steep hill. Since gravity is stronger than the car's engine, even at full throttle, the car cannot simply accelerate up the steep slope. The car is situated in a valley and must learn to leverage potential energy by driving up the opposite hill before the car is able to make it to the goal at the top of the rightmost hill. The domain has been used as a test bed in various reinforcement learning papers. == Introduction == The mountain car problem, although fairly simple, is commonly applied because it requires a reinforcement learning agent to learn on two continuous variables: position and velocity. For any given state (position and velocity) of the car, the agent is given the possibility of driving left, driving right, or not using the engine at all. In the standard version of the problem, the agent receives a negative reward at every time step when the goal is not reached; the agent has no information about the goal until an initial success. == History == The mountain car problem appeared first in Andrew Moore's PhD thesis (1990). It was later more strictly defined in Singh and Sutton's reinforcement learning paper with eligibility traces. The problem became more widely studied when Sutton and Barto added it to their book Reinforcement Learning: An Introduction (1998). Throughout the years many versions of the problem have been used, such as those which modify the reward function, termination condition, and the start state. == Techniques used to solve mountain car == Q-learning and similar techniques for mapping discrete states to discrete actions need to be extended to be able to deal with the continuous state space of the problem. Approaches often fall into one of two categories, state space discretization or function approximation. === Discretization === In this approach, two continuous state variables are pushed into discrete states by bucketing each continuous variable into multiple discrete states. This approach works with properly tuned parameters but a disadvantage is information gathered from one state is not used to evaluate another state. Tile coding can be used to improve discretization and involves continuous variables mapping into sets of buckets offset from one another. Each step of training has a wider impact on the value function approximation because when the offset grids are summed, the information is diffused. === Function approximation === Function approximation is another way to solve the mountain car. By choosing a set of basis functions beforehand, or by generating them as the car drives, the agent can approximate the value function at each state. Unlike the step-wise version of the value function created with discretization, function approximation can more cleanly estimate the true smooth function of the mountain car domain. === Eligibility traces === One aspect of the problem involves the delay of actual reward. The agent is not able to learn about the goal until a successful completion. Given a naive approach for each trial the car can only backup the reward of the goal slightly. This is a problem for naive discretization because each discrete state will only be backed up once, taking a larger number of episodes to learn the problem. This problem can be alleviated via the mechanism of eligibility traces, which will automatically backup the reward given to states before, dramatically increasing the speed of learning. Eligibility traces can be viewed as a bridge from temporal difference learning methods to Monte Carlo methods. == Technical details == The mountain car problem has undergone many iterations. This section focuses on the standard well-defined version from Sutton (2008). === State variables === Two-dimensional continuous state space. V e l o c i t y = ( − 0.07 , 0.07 ) {\displaystyle Velocity=(-0.07,0.07)} P o s i t i o n = ( − 1.2 , 0.6 ) {\displaystyle Position=(-1.2,0.6)} === Actions === One-dimensional discrete action space. m o t o r = ( l e f t , n e u t r a l , r i g h t ) {\displaystyle motor=(left,neutral,right)} === Reward === For every time step: r e w a r d = − 1 {\displaystyle reward=-1} === Update function === For every time step: A c t i o n = [ − 1 , 0 , 1 ] {\displaystyle Action=[-1,0,1]} V e l o c i t y = V e l o c i t y + ( A c t i o n ) ∗ 0.001 + cos ⁡ ( 3 ∗ P o s i t i o n ) ∗ ( − 0.0025 ) {\displaystyle Velocity=Velocity+(Action)0.001+\cos(3Position)(-0.0025)} P o s i t i o n = P o s i t i o n + V e l o c i t y {\displaystyle Position=Position+Velocity} === Starting condition === Optionally, many implementations include randomness in both parameters to show better generalized learning. P o s i t i o n = − 0.5 {\displaystyle Position=-0.5} V e l o c i t y = 0.0 {\displaystyle Velocity=0.0} === Termination condition === End the simulation when: P o s i t i o n ≥ 0.6 {\displaystyle Position\geq 0.6} == Variations == There are many versions of the mountain car which deviate in different ways from the standard model. Variables that vary include but are not limited to changing the constants (gravity and steepness) of the problem so specific tuning for specific policies become irrelevant and altering the reward function to affect the agent's ability to learn in a different manner. An example is changing the reward to be equal to the distance from the goal, or changing the reward to zero everywhere and one at the goal. Additionally, a 3D mountain car can be used, with a 4D continuous state space.

    Read more →
  • NSynth

    NSynth

    NSynth (a portmanteau of "Neural Synthesis") is a WaveNet-based autoencoder for synthesizing audio, outlined in a paper in April 2017. == Overview == The model generates sounds through a neural network based synthesis, employing a WaveNet-style autoencoder to learn its own temporal embeddings from four different sounds. Google then released an open source hardware interface for the algorithm called NSynth Super, used by notable musicians such as Grimes and YACHT to generate experimental music using artificial intelligence. The research and development of the algorithm was part of a collaboration between Google Brain, Magenta and DeepMind. == Technology == === Dataset === The NSynth dataset is composed of 305,979 one-shot instrumental notes featuring a unique pitch, timbre, and envelope, sampled from 1,006 instruments from commercial sample libraries. For each instrument the dataset contains four-second 16 kHz audio snippets by ranging over every pitch of a standard MIDI piano, as well as five different velocities. The dataset is made available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. === Machine learning model === A spectral autoencoder model and a WaveNet autoencoder model are publicly available on GitHub. The baseline model uses a spectrogram with fft_size 1024 and hop_size 256, MSE loss on the magnitudes, and the Griffin-Lim algorithm for reconstruction. The WaveNet model trains on mu-law encoded waveform chunks of size 6144. It learns embeddings with 16 dimensions that are downsampled by 512 in time. == NSynth Super == In 2018 Google released a hardware interface for the NSynth algorithm, called NSynth Super, designed to provide an accessible physical interface to the algorithm for musicians to use in their artistic production. Design files, source code and internal components are released under an open source Apache License 2.0, enabling hobbyists and musicians to freely build and use the instrument. At the core of the NSynth Super there is a Raspberry Pi, extended with a custom printed circuit board to accommodate the interface elements. == Influence == Despite not being publicly available as a commercial product, NSynth Super has been used by notable artists, including Grimes and YACHT. Grimes reported using the instrument in her 2020 studio album Miss Anthropocene. YACHT announced an extensive use of NSynth Super in their album Chain Tripping. Claire L. Evans compared the potential influence of the instrument to the Roland TR-808. The NSynth Super design was honored with a D&AD Yellow Pencil award in 2018.

    Read more →
  • Synaptic transistor

    Synaptic transistor

    A synaptic transistor is an electrical device that can learn in ways similar to a neural synapse. It optimizes its own properties for the functions it has carried out in the past. The device mimics the behavior of the property of neurons called spike-timing-dependent plasticity, or STDP. == Structure == Its structure is similar to that of a field effect transistor, where an ionic liquid takes the place of the gate insulating layer between the gate electrode and the conducting channel. That channel is composed of samarium nickelate (SmNiO3, or SNO) rather than the field effect transistor's doped silicon. == Function == A synaptic transistor has a traditional immediate response whose amount of current that passes between the source and drain contacts varies with voltage applied to the gate electrode. It also produces a much slower learned response such that the conductivity of the SNO layer varies in response to the transistor's STDP history, essentially by shuttling oxygen ions between the SNO and the ionic liquid. The analog of strengthening a synapse is to increase the SNO's conductivity, which essentially increases gain. Similarly, weakening a synapse is analogous to decreasing the SNO's conductivity, lowering the gain. The input and output of the synaptic transistor are continuous analog values, rather than digital on-off signals. While the physical structure of the device has the potential to learn from history, it contains no way to bias the transistor to control the memory effect. An external supervisory circuit converts the time delay between input and output into a voltage applied to the ionic liquid that either drives ions into the SNO or removes them. A network of such devices can learn particular responses to "sensory inputs", with those responses being learned through experience rather than explicitly programmed.

    Read more →
  • FERET database

    FERET database

    The Facial Recognition Technology (FERET) database is a dataset used for facial recognition system evaluation as part of the Face Recognition Technology (FERET) program. It was first established in 1993 under a collaborative effort between Harry Wechsler at George Mason University and Jonathon Phillips at the Army Research Laboratory in Adelphi, Maryland. The FERET database serves as a standard database of facial images for researchers to use to develop various algorithms and report results. The use of a common database also allowed one to compare the effectiveness of different approaches in methodology and gauge their strengths and weaknesses. The facial images for the database were collected between December 1993 and August 1996, accumulating a total of 14,126 images pertaining to 1,199 individuals along with 365 duplicate sets of images that were taken on a different day. In 2003, the Defense Advanced Research Projects Agency (DARPA) released a high-resolution, 24-bit color version of these images. The dataset tested includes 2,413 still facial images, representing 856 individuals. The FERET database has been used by more than 460 research groups and is managed by the National Institute of Standards and Technology (NIST).

    Read more →
  • Commonsense knowledge (artificial intelligence)

    Commonsense knowledge (artificial intelligence)

    In artificial intelligence research, commonsense knowledge consists of facts about the everyday world, such as "Lemons are sour" or "Cows say moo", that all humans are expected to know. It is currently an unsolved problem in artificial general intelligence. The first AI program to address common sense knowledge was Advice Taker in 1959 by John McCarthy. Commonsense knowledge can underpin a commonsense reasoning process, to attempt inferences such as "You might bake a cake because you want people to eat the cake." A natural language processing process can be attached to the commonsense knowledge base to allow the knowledge base to attempt to answer questions about the world. Common sense knowledge also helps to solve problems in the face of incomplete information. Using widely held beliefs about everyday objects, or common sense knowledge, AI systems make common sense assumptions or default assumptions about the unknown similar to the way people do. In an AI system or in English, this is expressed as "Normally P holds", "Usually P" or "Typically P so Assume P". For example, if we know the fact "Tweety is a bird", because we know the commonly held belief about birds, "typically birds fly," without knowing anything else about Tweety, we may reasonably assume the fact that "Tweety can fly." As more knowledge of the world is discovered or learned over time, the AI system can revise its assumptions about Tweety using a truth maintenance process. If we later learn that "Tweety is a penguin" then truth maintenance revises this assumption because we also know "penguins do not fly". == Commonsense reasoning == Commonsense reasoning simulates the human ability to use commonsense knowledge to make presumptions about the type and essence of ordinary situations they encounter every day, and to change their "minds" should new information come to light. This includes time, missing or incomplete information and cause and effect. The ability to explain cause and effect is an important aspect of explainable AI. Truth maintenance algorithms automatically provide an explanation facility because they create elaborate records of presumptions. Compared with humans, all existing computer programs that attempt human-level AI perform extremely poorly on modern "commonsense reasoning" benchmark tests such as the Winograd Schema Challenge. The problem of attaining human-level competency at "commonsense knowledge" tasks is considered to probably be "AI complete" (that is, solving it would require the ability to synthesize a fully human-level intelligence), although some oppose this notion and believe compassionate intelligence is also required for human-level AI. Common sense reasoning has been applied successfully in more limited domains such as natural language processing and automated diagnosis or analysis. == Commonsense knowledge base construction == Compiling comprehensive knowledge bases of commonsense assertions (CSKBs) is a long-standing challenge in AI research. From early expert-driven efforts like CYC and WordNet, significant advances were achieved via the crowdsourced OpenMind Commonsense project, which led to the crowdsourced ConceptNet KB. Several approaches have attempted to automate CSKB construction, most notably, via text mining (WebChild, Quasimodo, TransOMCS, Ascent), as well as harvesting these directly from pre-trained language models (AutoTOMIC). These resources are significantly larger than ConceptNet, though the automated construction mostly makes them of moderately lower quality. Challenges also remain on the representation of commonsense knowledge: Most CSKB projects follow a triple data model, which is not necessarily best suited for breaking more complex natural language assertions. A notable exception here is GenericsKB, which applies no further normalization to sentences, but retains them in full. == Applications == Around 2013, MIT researchers developed BullySpace, an extension of the commonsense knowledgebase ConceptNet, to catch taunting social media comments. BullySpace included over 200 semantic assertions based around stereotypes, to help the system infer that comments like "Put on a wig and lipstick and be who you really are" are more likely to be an insult if directed at a boy than a girl. ConceptNet has also been used by chatbots and by computers that compose original fiction. At Lawrence Livermore National Laboratory, common sense knowledge was used in an intelligent software agent to detect violations of a comprehensive nuclear test ban treaty. == Data == As an example, as of 2012 ConceptNet includes these 21 language-independent relations: IsA (An "RV" is a "vehicle" | X is an instance of a Y) UsedFor (a "cake tin" is used for "making cakes" | X is used for the purpose Y) HasA (A "rabbit" has a "tail" | X possesses Y element or feature) CapableOf (a "cook" is capable of "making baked goods" | X is capable of doing Y) Desires (a "child" desires "the aroma of baking" | X has a desire for Y) CreatedBy ("cake" is created by a "baker" | X is created by Y) PartOf (a "knife" is be part of a "knife set" | X is a part of Y) Causes ("Heat" causes "cooking"| X is what causes Y) LocatedNear (the "oven" is located near the "refrigerator" | X is located near Y) AtLocation (Somewhere a "Cook" can be at a "restaurant" | X is at the location of Y) DefinedAs (a "Cupcake" is defined as a "cake" that also has the qualities of being "small", "baked within a wrapper", and "containing only one area of frosting or icing" | X is defined as Y that also has the properties A, B & C) SymbolOf (a "heart" is a symbol of "affection" | X is a symbolic representation of Y) ReceivesAction ("cake" can receive the action of being "eaten" | X is capable of receiving action Y) HasPrerequisite ("baking" has the prerequisite of obtaining the "ingredients" | X cannot do Y unless A does B) MotivatedByGoal ("baking" is motivated by the goal of "consumption"/"eating" | X has the motivation of Y goal) CausesDesire ("baking" makesYou want to "follow recipe" | X causes the desire to do Y) MadeOf ("Cake" is made of "flour"/"eggs"/"sugar"/"oil"/etc | X is made of Y) HasFirstSubevent ("baking" has first subevent "make batter" | To do X the first thing that needs to be done is Y) HasSubevent ("eat" has subevent "swallow" | Doing X will lead to Y event following) HasLastSubevent ("sleeping" has last subevent of "waking" | Doing X ends with the event Y) == Commonsense knowledge bases == Cyc Open Mind Common Sense (data source) and ConceptNet (datastore and NLP engine) Evi Graphiq

    Read more →
  • Multi-surface method

    Multi-surface method

    The multi-surface method (MSM) is a form of decision making using the concept of piecewise-linear separability of datasets to categorize data. == Introduction == Two datasets are linearly separable if their convex hulls do not intersect. The method may be formulated as a feedforward neural network with weights that are trained via linear programming. Comparisons between neural networks trained with the MSM versus backpropagation show MSM is better able to classify data. The decision problem associated linear program for the MSM is NP-complete. == Mathematical formulation == Given two finite disjoint point sets A , B ∈ R n {\displaystyle {\mathcal {A,B}}\in \mathbb {R} ^{n}} , find a discriminant, f : R n → R {\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} } such that f ( A ) > 0 , f ( B ) ≤ 0 {\displaystyle f({\mathcal {A}})>0,f({\mathcal {B}})\leq 0} . If the intersection of convex hulls of the two sets is the empty set, then it is possible to use a single linear program to obtain a linear discriminant of the form, f ( x ) = c x + γ {\displaystyle f(x)=cx+\gamma } . Usually, in real applications, the sets' convex hulls do intersect, and a (often non-convex) piecewise-linear discriminant can be used, through the use of several linear programs.

    Read more →
  • FERET (facial recognition technology)

    FERET (facial recognition technology)

    The Facial Recognition Technology (FERET) program was a government-sponsored project that aimed to create a large, automatic face-recognition system for intelligence, security, and law enforcement purposes. The program began in 1993 under the combined leadership of Dr. Harry Wechsler at George Mason University (GMU) and Dr. Jonathon Phillips at the Army Research Laboratory (ARL) in Adelphi, Maryland and resulted in the development of the Facial Recognition Technology (FERET) database. The goal of the FERET program was to advance the field of face recognition technology by establishing a common database of facial imagery for researchers to use and setting a performance baseline for face-recognition algorithms. Potential areas where this face-recognition technology could be used include: Automated searching of mug books using surveillance photos Controlling access to restricted facilities or equipment Checking the credentials of personnel for background and security clearances Monitoring airports, border crossings, and secure manufacturing facilities for particular individuals Finding and logging multiple appearances of individuals over time in surveillance videos Verifying identities at ATM machines Searching photo ID records for fraud detection The FERET database has been used by more than 460 research groups and is currently managed by the National Institute of Standards and Technology (NIST). By 2017, the FERET database has been used to train artificial intelligence programs and computer vision algorithms to identify and sort faces. == History == The origin of facial recognition technology is largely attributed to Woodrow Wilson Bledsoe and his work in the 1960s, when he developed a system to identify faces from a database of thousands of photographs. The FERET program first began as a way to unify a large body of face-recognition technology research under a standard database. Before the program's inception, most researchers created their own facial imagery database that was attuned to their own specific area of study. These personal databases were small and usually consisted of images from less than 50 individuals. The only notable exceptions were the following: Alex Pentland’s database of around 7500 facial images at the Massachusetts Institute of Technology (MIT) Joseph Wilder's database of around 250 individuals at Rutgers University Christoph von der Malsburg’s database of around 100 facial images at the University of Southern California (USC) The lack of a common database made it difficult to compare the results of face recognition studies in the scientific literature because each report involved different assumptions, scoring methods, and images. Most of the papers that were published did not use images from a common database nor follow a standard testing protocol. As a result, researchers were unable to make informed comparisons between the performances of different face-recognition algorithms. In September 1993, the FERET program was spearheaded by Dr. Harry Wechsler and Dr. Jonathon Phillips under the sponsorship of the U.S. Department of Defense Counterdrug Technology Development Program through DARPA with ARL serving as technical agent. === Phase I === The first facial images for the FERET database were collected from August 1993 to December 1994, a time period known as Phase I. The pictures were initially taken with a 35-mm camera at both GMU and ARL facilities, and the same physical setup was used in each photography session to keep the images consistent. For each individual, the pictures were taken in sets, including two frontal views, a right and left profile, a right and left quarter profile, a right and left half profile, and sometimes at five extra locations. Therefore, a set of images consisted of 5 to 11 images per person. At the end of Phase I, the FERET database had collected 673 sets of images, resulting in over 5000 total images. At the end of Phase I, five organizations were given the opportunity to test their face-recognition algorithm on the newly created FERET database in order to compare how they performed against each other. There five principal investigators were: MIT, led by Alex Pentland Rutgers University, led by Joseph Wilder The Analytic Science Company (TASC), led by Gale Gordon The University of Illinois at Chicago (UIC) and the University of Illinois at Urbana-Champaign, led by Lewis Sadler and Thomas Huang USC, led by Christoph von der Malsburg During this evaluation, three different automatic tests were given to the principal investigators without human intervention: The large gallery test, which served to baseline how algorithms performed against a database when it has not been properly tuned. The false-alarm test, which tested how well the algorithm monitored an airport for suspected terrorists. The rotation test, which measured how well the algorithm performed when the images of an individual in the gallery had different poses compared to those in the probe set. For most of the test trials, the algorithms developed by USC and MIT managed to outperform the other three algorithms for the Phase I evaluation. === Phase II === Phase II began after Phase I, and during this time, the FERET database acquired more sets of facial images. By the start of the Phase II evaluation in March 1995, the database contained 1109 sets of images for a total of 8525 images of 884 individuals. During the second evaluation, the same algorithms from the Phase I evaluation were given a single test. However, the database now contained significantly more duplicate images (463, compared to the previous 60), making the test more challenging. === Phase III === Afterwards, the FERET program entered Phase III where another 456 sets of facial images were added to the database. The Phase III evaluation, which took place in September 1996, aimed to not only gauge the progress of the algorithms since the Phase I assessment but also identify the strengths and weaknesses of each algorithm and determine future objectives for research. By the end of 1996, the FERET database had accumulated a total of 14,126 facial images pertaining to 1199 different individuals as well as 365 duplicate sets of images. As a result of the FERET program, researchers were able to establish a common baseline for comparing different face-recognition algorithms and create a large standard database of facial images that is open for research. In 2003, DARPA released a high-resolution, 24-bit color version of the images in the FERET database (existing reference).

    Read more →
  • Online machine learning

    Online machine learning

    In computer science, online machine learning is a method of machine learning in which data becomes available in a sequential order and is used to update the best predictor for future data at each step, as opposed to batch learning techniques which generate the best predictor by learning on the entire training data set at once. Online learning is a common technique used in areas of machine learning where it is computationally infeasible to train over the entire dataset, requiring the need of out-of-core algorithms. It is also used in situations where it is necessary for the algorithm to dynamically adapt to new patterns in the data, or when the data itself is generated as a function of time, e.g., prediction of prices in the financial international markets. Online learning algorithms may be prone to catastrophic interference, a problem that can be addressed by incremental learning approaches. Online machine learning algorithms find applications in a wide variety of fields such as sponsored search to maximize ad revenue, portfolio optimization, shortest path prediction (with stochastic weights, e.g. traffic on roads for a maps application), spam filtering, real-time fraud detection, dynamic pricing for e-commerce, etc. There is also growing interest in usage of online learning paradigms for LLMs to enable continuous, real-time adaptation after the initial training. == Introduction == In the setting of supervised learning, a function of f : X → Y {\displaystyle f:X\to Y} is to be learned, where X {\displaystyle X} is thought of as a space of inputs and Y {\displaystyle Y} as a space of outputs, that predicts well on instances that are drawn from a joint probability distribution p ( x , y ) {\displaystyle p(x,y)} on X × Y {\displaystyle X\times Y} . In reality, the learner never knows the true distribution p ( x , y ) {\displaystyle p(x,y)} over instances. Instead, the learner usually has access to a training set of examples ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} . In this setting, the loss function is given as V : Y × Y → R {\displaystyle V:Y\times Y\to \mathbb {R} } , such that V ( f ( x ) , y ) {\displaystyle V(f(x),y)} measures the difference between the predicted value f ( x ) {\displaystyle f(x)} and the true value y {\displaystyle y} . The ideal goal is to select a function f ∈ H {\displaystyle f\in {\mathcal {H}}} , where H {\displaystyle {\mathcal {H}}} is a space of functions called a hypothesis space, so that some notion of total loss is minimized. Depending on the type of model (statistical or adversarial), one can devise different notions of loss, which lead to different learning algorithms. == Statistical view of online learning == In statistical learning models, the training sample ( x i , y i ) {\displaystyle (x_{i},y_{i})} are assumed to have been drawn from the true distribution p ( x , y ) {\displaystyle p(x,y)} and the objective is to minimize the expected "risk" I [ f ] = E [ V ( f ( x ) , y ) ] = ∫ V ( f ( x ) , y ) d p ( x , y ) . {\displaystyle I[f]=\mathbb {E} [V(f(x),y)]=\int V(f(x),y)\,dp(x,y)\ .} A common paradigm in this situation is to estimate a function f ^ {\displaystyle {\hat {f}}} through empirical risk minimization or regularized empirical risk minimization (usually Tikhonov regularization). The choice of loss function here gives rise to several well-known learning algorithms such as regularized least squares and support vector machines. A purely online model in this category would learn based on just the new input ( x t + 1 , y t + 1 ) {\displaystyle (x_{t+1},y_{t+1})} , the current best predictor f t {\displaystyle f_{t}} and some extra stored information (which is usually expected to have storage requirements independent of training data size). For many formulations, for example nonlinear kernel methods, true online learning is not possible, though a form of hybrid online learning with recursive algorithms can be used where f t + 1 {\displaystyle f_{t+1}} is permitted to depend on f t {\displaystyle f_{t}} and all previous data points ( x 1 , y 1 ) , … , ( x t , y t ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{t},y_{t})} . In this case, the space requirements are no longer guaranteed to be constant since it requires storing all previous data points, but the solution may take less time to compute with the addition of a new data point, as compared to batch learning techniques. A common strategy to overcome the above issues is to learn using mini-batches, which process a small batch of b ≥ 1 {\displaystyle b\geq 1} data points at a time, this can be considered as pseudo-online learning for b {\displaystyle b} much smaller than the total number of training points. Mini-batch techniques are used with repeated passing over the training data to obtain optimized out-of-core versions of machine learning algorithms, for example, stochastic gradient descent. When combined with backpropagation, this is currently the de facto training method for training artificial neural networks. === Example: linear least squares === The simple example of linear least squares is used to explain a variety of ideas in online learning. The ideas are general enough to be applied to other settings, for example, with other convex loss functions. === Batch learning === Consider the setting of supervised learning with f {\displaystyle f} being a linear function to be learned: f ( x j ) = ⟨ w , x j ⟩ = w ⋅ x j {\displaystyle f(x_{j})=\langle w,x_{j}\rangle =w\cdot x_{j}} where x j ∈ R d {\displaystyle x_{j}\in \mathbb {R} ^{d}} is a vector of inputs (data points) and w ∈ R d {\displaystyle w\in \mathbb {R} ^{d}} is a linear filter vector. The goal is to compute the filter vector w {\displaystyle w} . To this end, a square loss function V ( f ( x j ) , y j ) = ( f ( x j ) − y j ) 2 = ( ⟨ w , x j ⟩ − y j ) 2 {\displaystyle V(f(x_{j}),y_{j})=(f(x_{j})-y_{j})^{2}=(\langle w,x_{j}\rangle -y_{j})^{2}} is used to compute the vector w {\displaystyle w} that minimizes the empirical loss I n [ w ] = ∑ j = 1 n V ( ⟨ w , x j ⟩ , y j ) = ∑ j = 1 n ( x j T w − y j ) 2 {\displaystyle I_{n}[w]=\sum _{j=1}^{n}V(\langle w,x_{j}\rangle ,y_{j})=\sum _{j=1}^{n}(x_{j}^{\mathsf {T}}w-y_{j})^{2}} where y j ∈ R . {\displaystyle y_{j}\in \mathbb {R} .} Let X {\displaystyle X} be the i × d {\displaystyle i\times d} data matrix and y ∈ R i {\displaystyle y\in \mathbb {R} ^{i}} is the column vector of target values after the arrival of the first i {\displaystyle i} data points. Assuming that the covariance matrix Σ i = X T X {\displaystyle \Sigma _{i}=X^{\mathsf {T}}X} is invertible (otherwise it is preferential to proceed in a similar fashion with Tikhonov regularization), the best solution f ∗ ( x ) = ⟨ w ∗ , x ⟩ {\displaystyle f^{}(x)=\langle w^{},x\rangle } to the linear least squares problem is given by w ∗ = ( X T X ) − 1 X T y = Σ i − 1 ∑ j = 1 i x j y j . {\displaystyle w^{}=(X^{\mathsf {T}}X)^{-1}X^{\mathsf {T}}y=\Sigma _{i}^{-1}\sum _{j=1}^{i}x_{j}y_{j}.} Now, calculating the covariance matrix Σ i = ∑ j = 1 i x j x j T {\displaystyle \Sigma _{i}=\sum _{j=1}^{i}x_{j}x_{j}^{\mathsf {T}}} takes time O ( i d 2 ) {\displaystyle O(id^{2})} , inverting the d × d {\displaystyle d\times d} matrix takes time O ( d 3 ) {\displaystyle O(d^{3})} , while the rest of the multiplication takes time O ( d 2 ) {\displaystyle O(d^{2})} , giving a total time of O ( i d 2 + d 3 ) {\displaystyle O(id^{2}+d^{3})} . When there are n {\displaystyle n} total points in the dataset, to recompute the solution after the arrival of every datapoint i = 1 , … , n {\displaystyle i=1,\ldots ,n} , the naive approach will have a total complexity O ( n 2 d 2 + n d 3 ) {\displaystyle O(n^{2}d^{2}+nd^{3})} . Note that when storing the matrix Σ i {\displaystyle \Sigma _{i}} , then updating it at each step needs only adding x i + 1 x i + 1 T {\displaystyle x_{i+1}x_{i+1}^{\mathsf {T}}} , which takes O ( d 2 ) {\displaystyle O(d^{2})} time, reducing the total time to O ( n d 2 + n d 3 ) = O ( n d 3 ) {\displaystyle O(nd^{2}+nd^{3})=O(nd^{3})} , but with an additional storage space of O ( d 2 ) {\displaystyle O(d^{2})} to store Σ i {\displaystyle \Sigma _{i}} . === Online learning: recursive least squares === The recursive least squares (RLS) algorithm considers an online approach to the least squares problem. It can be shown that by initialising w 0 = 0 ∈ R d {\displaystyle \textstyle w_{0}=0\in \mathbb {R} ^{d}} and Γ 0 = I ∈ R d × d {\displaystyle \textstyle \Gamma _{0}=I\in \mathbb {R} ^{d\times d}} , the solution of the linear least squares problem given in the previous section can be computed by the following iteration: Γ i = Γ i − 1 − Γ i − 1 x i x i T Γ i − 1 1 + x i T Γ i − 1 x i {\displaystyle \Gamma _{i}=\Gamma _{i-1}-{\frac {\Gamma _{i-1}x_{i}x_{i}^{\mathsf {T}}\Gamma _{i-1}}{1+x_{i}^{\mathsf {T}}\Gamma _{i-1}x_{i}}}} w i = w i − 1 − Γ i x i ( x i T w i − 1 − y i ) {\displaystyle w_{i}=w_{i-1}-\Gamma _{i}x_{i}\left(x_{i}^{\mathsf {T}}w_{

    Read more →
  • Language resource

    Language resource

    In linguistics and language technology, a language resource is a "[composition] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research studies and applications." According to Bird & Simons (2003), this includes data, i.e. "any information that documents or describes a language, such as a published monograph, a computer data file, or even a shoebox full of handwritten index cards. The information could range in content from unanalyzed sound recordings to fully transcribed and annotated texts to a complete descriptive grammar", tools, i.e., "computational resources that facilitate creating, viewing, querying, or otherwise using language data", and advice, i.e., "any information about what data sources are reliable, what tools are appropriate in a given situation, what practices to follow when creating new data". The latter aspect is usually referred to as "best practices" or "(community) standards". In a narrower sense, language resource is specifically applied to resources that are available in digital form, and then, "encompassing (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, and (b) tools/technologies/services used for their processing and management". == Typology == As of May 2020, no widely used standard typology of language resources has been established (current proposals include the LREMap, METASHARE, and, for data, the LLOD classification). Important classes of language resources include data lexical resources, e.g., machine-readable dictionaries, linguistic corpora, i.e., digital collections of natural language data, linguistic data bases such as the Cross-Linguistic Linked Data collection, tools linguistic annotations and tools for creating such annotations in a manual or semiautomated fashion (e.g., tools for annotating interlinear glossed text such as Toolbox and FLEx, or other language documentation tools), applications for search and retrieval over such data (corpus management systems), for automated annotation (part-of-speech tagging, syntactic parsing, semantic parsing, etc.), metadata and vocabularies vocabularies, repositories of linguistic terminology and language metadata, e.g., MetaShare (for language resource metadata), the ISO 12620 data category registry (for linguistic features, data structures and annotations within a language resource), or the Glottolog database (identifiers for language varieties and bibliographical database). == Language resource publication, dissemination and creation == A major concern of the language resource community has been to develop infrastructures and platforms to present, discuss and disseminate language resources. Selected contributions in this regard include: a series of International Conferences on Language Resources and Evaluation (LREC), the European Language Resources Association (ELRA, EU-based), and the Linguistic Data Consortium (LDC, US-based), which represent commercial hosting and dissemination platforms for language resources, the Open Languages Archives Community (OLAC), which provides and aggregates language resource metadata, the Language Resources and Evaluation Journal (LREJ), the European Language Grid is a European platform for language technologies (eg services), data and resources. As for the development of standards and best practices for language resources, these are subject of several community groups and standardization efforts, including ISO Technical Committee 37: Terminology and other language and content resources (ISO/TC 37), developing standards for all aspects of language resources, W3C Community Group Best Practices for Multilingual Linked Open Data (BPMLOD), working on best practice recommendations for publishing language resources as Linked Data or in RDF, W3C Community Group Linked Data for Language Technology (LD4LT), working on linguistic annotations on the web and language resource metadata, W3C Community Group Ontology-Lexica (OntoLex), working on lexical resources, the Open Linguistics working group of the Open Knowledge Foundation, working on conventions for publishing and linking open language resources, developing the Linguistic Linked Open Data cloud, the Text Encoding Initiative (TEI), working on XML-based specifications for language resources and digitally edited text.

    Read more →
  • Exploratory blockmodeling

    Exploratory blockmodeling

    Exploratory blockmodeling is an (inductive) approach (or a group of approaches) in blockmodeling regarding the specification of an ideal blockmodel. This approach, also known as hypotheses-generating, is the simplest approach, as it "merely involves the definition of the block types permitted as well as of the number of clusters." With this approach, researcher usually defines the best possible blockmodel, which then represent the base for the analysis of the whole network. This approach is usually based on: previous analyses and theoretical considerations, using stricker blockmodel and block types, where the structural equivalence is stricker than the regular equivalence and using smaller number of classes. The opposite approach is called a confirmatory blockmodeling.

    Read more →
  • Memtransistor

    Memtransistor

    The memtransistor (a blend word from Memory Transfer Resistor) is an experimental multi-terminal passive electronic component that might be used in the construction of artificial neural networks. It is a combination of the memristor and transistor technology. This technology is different from the 1T-1R approach since the devices are merged into one single entity. Multiple memristors can be embedded with a single transistor, enabling it to more accurately model a neuron with its multiple synaptic connections. A neural network produced from these would provide hardware-based artificial intelligence with a good foundation. == Applications == These types of devices would allow for a synapse model that could realise a learning rule, by which the synaptic efficacy is altered by voltages applied to the terminals of the device. An example of such a learning rule is spike-timing-dependant-plasticty by which the weight of the synapse, in this case the conductivity, could be modulated based on the timing of pre and post synaptic spikes arriving at each terminal. The advantage of this approach over two terminal memristive devices is that read and write protocols have the possibility to occur simultaneously and distinctly.

    Read more →