AI Face Changer Video

AI Face Changer Video — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Digital image correlation and tracking

    Digital image correlation and tracking

    Digital image correlation and tracking is an optical method that employs tracking and image registration techniques for accurate 2D and 3D measurements of changes in 2D images or 3D volumes. This method is often used to measure full-field displacement and strains, and it is widely applied in many areas of science and engineering. Compared to strain gauges and extensometers, digital image correlation methods provide finer details about deformation, due to the ability to provide both local and average data. == Overview == Digital image correlation (DIC) techniques have been increasing in popularity, especially in micro- and nano-scale mechanical testing applications due to their relative ease of implementation and use. Advances in computer technology and digital cameras have been the enabling technologies for this method and while white-light optics has been the predominant approach, DIC can be and has been extended to almost any imaging technology. The concept of using cross-correlation to measure shifts in datasets has been known for a long time, and it has been applied to digital images since at least the early 1970s. The present-day applications are almost innumerable, including image analysis, image compression, velocimetry, and strain estimation. Much early work in DIC in the field of mechanics was led by researchers at the University of South Carolina in the early 1980s and has been optimized and improved in recent years. Commonly, DIC relies on finding the maximum of the correlation array between pixel intensity array subsets on two or more corresponding images, which gives the integer translational shift between them. It is also possible to estimate shifts to a finer resolution than the resolution of the original images, which is often called "sub-pixel" registration because the measured shift is smaller than an integer pixel unit. For sub-pixel interpolation of the shift, other methods do not simply maximize the correlation coefficient. An iterative approach can also be used to maximize the interpolated correlation coefficient by using non-linear optimization techniques. The non-linear optimization approach tends to be conceptually simpler and can handle large deformations more accurately, but as with most nonlinear optimization techniques, it is slower. The two-dimensional discrete cross correlation r i j {\displaystyle r_{ij}} can be defined in several ways, one possibility being: r i j = ∑ m ∑ n [ f ( m + i , n + j ) − f ¯ ] [ g ( m , n ) − g ¯ ] ∑ m ∑ n [ f ( m , n ) − f ¯ ] 2 ∑ m ∑ n [ g ( m , n ) − g ¯ ] 2 . {\displaystyle r_{ij}={\frac {\sum _{m}\sum _{n}[f(m+i,n+j)-{\bar {f}}][g(m,n)-{\bar {g}}]}{\sqrt {\sum _{m}\sum _{n}{[f(m,n)-{\bar {f}}]^{2}}\sum _{m}\sum _{n}{[g(m,n)-{\bar {g}}]^{2}}}}}.} Here f(m, n) is the pixel intensity or the gray-scale value at a point (m, n) in the original image, g(m, n) is the gray-scale value at a point (m, n) in the translated image, f ¯ {\displaystyle {\bar {f}}} and g ¯ {\displaystyle {\bar {g}}} are mean values of the intensity matrices f and g respectively. However, in practical applications, the correlation array is usually computed using Fourier-transform methods, since the fast Fourier transform is a much faster method than directly computing the correlation. F = F { f } , G = F { g } . {\displaystyle \mathbf {F} ={\mathcal {F}}\{f\},\quad \mathbf {G} ={\mathcal {F}}\{g\}.} Then taking the complex conjugate of the second result and multiplying the Fourier transforms together elementwise, we obtain the Fourier transform of the correlogram, R {\displaystyle \ R} : R = F ∘ G ∗ , {\displaystyle R=\mathbf {F} \circ \mathbf {G} ^{},} where ∘ {\displaystyle \circ } is the Hadamard product (entry-wise product). It is also fairly common to normalize the magnitudes to unity at this point, which results in a variation called phase correlation. Then the cross-correlation is obtained by applying the inverse Fourier transform: r = F − 1 { R } . {\displaystyle \ r={\mathcal {F}}^{-1}\{R\}.} At this point, the coordinates of the maximum of r i j {\displaystyle r_{ij}} give the integer shift: ( Δ x , Δ y ) = arg ⁡ max ( i , j ) { r } . {\displaystyle (\Delta x,\Delta y)=\arg \max _{(i,j)}\{r\}.} == Deformation mapping == For deformation mapping, the mapping function that relates the images can be derived from comparing a set of subwindow pairs over the whole images. (Figure 1). The coordinates or grid points (xi, yj) and (xi, yj) are related by the translations that occur between the two images. If the deformation is small and perpendicular to the optical axis of the camera, then the relation between (xi, yj) and (xi, yj) can be approximated by a 2D affine transformation such as: x ∗ = x + u + ∂ u ∂ x Δ x + ∂ u ∂ y Δ y , {\displaystyle x^{}=x+u+{\frac {\partial u}{\partial x}}\Delta x+{\frac {\partial u}{\partial y}}\Delta y,} y ∗ = y + v + ∂ v ∂ x Δ x + ∂ v ∂ y Δ y . {\displaystyle y^{}=y+v+{\frac {\partial v}{\partial x}}\Delta x+{\frac {\partial v}{\partial y}}\Delta y.} Here u and v are translations of the center of the sub-image in the X and Y directions respectively. The distances from the center of the sub-image to the point (x, y) are denoted by Δ x {\displaystyle \Delta x} and Δ y {\displaystyle \Delta y} . Thus, the correlation coefficient rij is a function of displacement components (u, v) and displacement gradients ∂ u ∂ x , ∂ u ∂ y , ∂ v ∂ x , ∂ v ∂ y . {\displaystyle {\frac {\partial u}{\partial x}},{\frac {\partial u}{\partial y}},{\frac {\partial v}{\partial x}},{\frac {\partial v}{\partial y}}.} DIC has proven to be very effective at mapping deformation in macroscopic mechanical testing, where the application of specular markers (e.g. paint, toner powder) or surface finishes from machining and polishing provide the needed contrast to correlate images well. However, these methods for applying surface contrast do not extend to the application of free-standing thin films for several reasons. First, vapor deposition at normal temperatures on semiconductor grade substrates results in mirror-finish quality films with RMS roughnesses that are typically on the order of several nanometers. No subsequent polishing or finishing steps are required, and unless electron imaging techniques are employed that can resolve microstructural features, the films do not possess enough useful surface contrast to adequately correlate images. Typically this challenge can be circumvented by applying paint that results in a random speckle pattern on the surface, although the large and turbulent forces resulting from either spraying or applying paint to the surface of a free-standing thin film are too high and would break the specimens. In addition, the sizes of individual paint particles are on the order of μms, while the film thickness is only several hundred nanometers, which would be analogous to supporting a large boulder on a thin sheet of paper. == Digital volume correlation == Digital Volume Correlation (DVC, and sometimes called Volumetric-DIC) extends the 2D-DIC algorithms into three dimensions to calculate the full-field 3D deformation from a pair of 3D images. This technique is distinct from 3D-DIC, which only calculates the 3D deformation of an exterior surface using conventional optical images. The DVC algorithm is able to track full-field displacement information in the form of voxels instead of pixels. The theory is similar to above except that another dimension is added: the z-dimension. The displacement is calculated from the correlation of 3D subsets of the reference and deformed volumetric images, which is analogous to the correlation of 2D subsets described above. DVC can be performed using volumetric image datasets. These images can be obtained using confocal microscopy, X-ray computed tomography, Magnetic Resonance Imaging or other techniques. Similar to the other DIC techniques, the images must exhibit a distinct, high-contrast 3D "speckle pattern" to ensure accurate displacement measurement. DVC was first developed in 1999 to study the deformation of trabecular bone using X-ray computed tomography images. Since then, applications of DVC have grown to include granular materials, metals, foams, composites and biological materials. To date it has been used with images acquired by MRI imaging, Computer Tomography (CT), micro-CT, confocal microscopy, and lightsheet microscopy. DVC is currently considered to be ideal in the research world for 3D quantification of local displacements, strains, and stress in biological specimens. It is preferred because of the non-invasiveness of the method over traditional experimental methods. Two of the key challenges are improving the speed and reliability of the DVC measurement. The 3D imaging techniques produce noisier images than conventional 2D optical images, which reduces the quality of the displacement measurement. Computational speed is restricted by the file sizes of 3D images, which are significantly larger than 2D images. For example, an

    Read more →
  • Reparameterization trick

    Reparameterization trick

    The reparameterization trick (aka "reparameterization gradient estimator") is a technique used in statistical machine learning, particularly in variational inference, variational autoencoders, and stochastic optimization. It allows for the efficient computation of gradients through random variables, enabling the optimization of parametric probability models using stochastic gradient descent, and the variance reduction of estimators. It was developed in the 1980s in operations research, under the name of "pathwise gradients", or "stochastic gradients". Its use in variational inference was proposed in 2013. == Mathematics == Let z {\displaystyle z} be a random variable with distribution q ϕ ( z ) {\displaystyle q_{\phi }(z)} , where ϕ {\displaystyle \phi } is a vector containing the parameters of the distribution. === REINFORCE estimator === Consider an objective function of the form: L ( ϕ ) = E z ∼ q ϕ ( z ) [ f ( z ) ] {\displaystyle L(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[f(z)]} Without the reparameterization trick, estimating the gradient ∇ ϕ L ( ϕ ) {\displaystyle \nabla _{\phi }L(\phi )} can be challenging, because the parameter appears in the random variable itself. In more detail, we have to statistically estimate: ∇ ϕ L ( ϕ ) = ∇ ϕ ∫ d z q ϕ ( z ) f ( z ) {\displaystyle \nabla _{\phi }L(\phi )=\nabla _{\phi }\int dz\;q_{\phi }(z)f(z)} The REINFORCE estimator, widely used in reinforcement learning and especially policy gradient, uses the following equality: ∇ ϕ L ( ϕ ) = ∫ d z q ϕ ( z ) ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) = E z ∼ q ϕ ( z ) [ ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) ] {\displaystyle \nabla _{\phi }L(\phi )=\int dz\;q_{\phi }(z)\nabla _{\phi }(\ln q_{\phi }(z))f(z)=\mathbb {E} _{z\sim q_{\phi }(z)}[\nabla _{\phi }(\ln q_{\phi }(z))f(z)]} This allows the gradient to be estimated: ∇ ϕ L ( ϕ ) ≈ 1 N ∑ i = 1 N ∇ ϕ ( ln ⁡ q ϕ ( z i ) ) f ( z i ) {\displaystyle \nabla _{\phi }L(\phi )\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }(\ln q_{\phi }(z_{i}))f(z_{i})} The REINFORCE estimator has high variance, and many methods were developed to reduce its variance. === Reparameterization estimator === The reparameterization trick expresses z {\displaystyle z} as: z = g ϕ ( ϵ ) , ϵ ∼ p ( ϵ ) {\displaystyle z=g_{\phi }(\epsilon ),\quad \epsilon \sim p(\epsilon )} Here, g ϕ {\displaystyle g_{\phi }} is a deterministic function parameterized by ϕ {\displaystyle \phi } , and ϵ {\displaystyle \epsilon } is a noise variable drawn from a fixed distribution p ( ϵ ) {\displaystyle p(\epsilon )} . This gives: L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ f ( g ϕ ( ϵ ) ) ] {\displaystyle L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[f(g_{\phi }(\epsilon ))]} Now, the gradient can be estimated as: ∇ ϕ L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ ∇ ϕ f ( g ϕ ( ϵ ) ) ] ≈ 1 N ∑ i = 1 N ∇ ϕ f ( g ϕ ( ϵ i ) ) {\displaystyle \nabla _{\phi }L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[\nabla _{\phi }f(g_{\phi }(\epsilon ))]\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }f(g_{\phi }(\epsilon _{i}))} == Examples == For some common distributions, the reparameterization trick takes specific forms: Normal distribution: For z ∼ N ( μ , σ 2 ) {\displaystyle z\sim {\mathcal {N}}(\mu ,\sigma ^{2})} , we can use: z = μ + σ ϵ , ϵ ∼ N ( 0 , 1 ) {\displaystyle z=\mu +\sigma \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,1)} Exponential distribution: For z ∼ Exp ( λ ) {\displaystyle z\sim {\text{Exp}}(\lambda )} , we can use: z = − 1 λ log ⁡ ( ϵ ) , ϵ ∼ Uniform ( 0 , 1 ) {\displaystyle z=-{\frac {1}{\lambda }}\log(\epsilon ),\quad \epsilon \sim {\text{Uniform}}(0,1)} Discrete distribution can be reparameterized by the Gumbel distribution (Gumbel-softmax trick or "concrete distribution") and diffusion models. In general, any distribution that is differentiable with respect to its parameters can be reparameterized by inverting the multivariable CDF function, then apply the implicit method. See for an exposition and application to the Gamma, Beta, Dirichlet, and von Mises distributions. == Applications == === Variational autoencoder === In Variational Autoencoders (VAEs), the VAE objective function, known as the Evidence Lower Bound (ELBO), is given by: ELBO ( ϕ , θ ) = E z ∼ q ϕ ( z | x ) [ log ⁡ p θ ( x | z ) ] − D KL ( q ϕ ( z | x ) | | p ( z ) ) {\displaystyle {\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{z\sim q_{\phi }(z|x)}[\log p_{\theta }(x|z)]-D_{\text{KL}}(q_{\phi }(z|x)||p(z))} where q ϕ ( z | x ) {\displaystyle q_{\phi }(z|x)} is the encoder (recognition model), p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} is the decoder (generative model), and p ( z ) {\displaystyle p(z)} is the prior distribution over latent variables. The gradient of ELBO with respect to θ {\displaystyle \theta } is simply E z ∼ q ϕ ( z | x ) [ ∇ θ log ⁡ p θ ( x | z ) ] ≈ 1 L ∑ l = 1 L ∇ θ log ⁡ p θ ( x | z l ) {\displaystyle \mathbb {E} _{z\sim q_{\phi }(z|x)}[\nabla _{\theta }\log p_{\theta }(x|z)]\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\theta }\log p_{\theta }(x|z_{l})} but the gradient with respect to ϕ {\displaystyle \phi } requires the trick. Express the sampling operation z ∼ q ϕ ( z | x ) {\displaystyle z\sim q_{\phi }(z|x)} as: z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ , ϵ ∼ N ( 0 , I ) {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,I)} where μ ϕ ( x ) {\displaystyle \mu _{\phi }(x)} and σ ϕ ( x ) {\displaystyle \sigma _{\phi }(x)} are the outputs of the encoder network, and ⊙ {\displaystyle \odot } denotes element-wise multiplication. Then we have ∇ ϕ ELBO ( ϕ , θ ) = E ϵ ∼ N ( 0 , I ) [ ∇ ϕ log ⁡ p θ ( x | z ) + ∇ ϕ log ⁡ q ϕ ( z | x ) − ∇ ϕ log ⁡ p ( z ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{\epsilon \sim {\mathcal {N}}(0,I)}[\nabla _{\phi }\log p_{\theta }(x|z)+\nabla _{\phi }\log q_{\phi }(z|x)-\nabla _{\phi }\log p(z)]} where z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon } . This allows us to estimate the gradient using Monte Carlo sampling: ∇ ϕ ELBO ( ϕ , θ ) ≈ 1 L ∑ l = 1 L [ ∇ ϕ log ⁡ p θ ( x | z l ) + ∇ ϕ log ⁡ q ϕ ( z l | x ) − ∇ ϕ log ⁡ p ( z l ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )\approx {\frac {1}{L}}\sum _{l=1}^{L}[\nabla _{\phi }\log p_{\theta }(x|z_{l})+\nabla _{\phi }\log q_{\phi }(z_{l}|x)-\nabla _{\phi }\log p(z_{l})]} where z l = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ l {\displaystyle z_{l}=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon _{l}} and ϵ l ∼ N ( 0 , I ) {\displaystyle \epsilon _{l}\sim {\mathcal {N}}(0,I)} for l = 1 , … , L {\displaystyle l=1,\ldots ,L} . This formulation enables backpropagation through the sampling process, allowing for end-to-end training of the VAE model using stochastic gradient descent or its variants. === Variational inference === More generally, the trick allows using stochastic gradient descent for variational inference. Let the variational objective (ELBO) be of the form: ELBO ( ϕ ) = E z ∼ q ϕ ( z ) [ log ⁡ p ( x , z ) − log ⁡ q ϕ ( z ) ] {\displaystyle {\text{ELBO}}(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[\log p(x,z)-\log q_{\phi }(z)]} Using the reparameterization trick, we can estimate the gradient of this objective with respect to ϕ {\displaystyle \phi } : ∇ ϕ ELBO ( ϕ ) ≈ 1 L ∑ l = 1 L ∇ ϕ [ log ⁡ p ( x , g ϕ ( ϵ l ) ) − log ⁡ q ϕ ( g ϕ ( ϵ l ) ) ] , ϵ l ∼ p ( ϵ ) {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi )\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\phi }[\log p(x,g_{\phi }(\epsilon _{l}))-\log q_{\phi }(g_{\phi }(\epsilon _{l}))],\quad \epsilon _{l}\sim p(\epsilon )} === Dropout === The reparameterization trick has been applied to reduce the variance in dropout, a regularization technique in neural networks. The original dropout can be reparameterized with Bernoulli distributions: y = ( W ⊙ ϵ ) x , ϵ i j ∼ Bernoulli ( α i j ) {\displaystyle y=(W\odot \epsilon )x,\quad \epsilon _{ij}\sim {\text{Bernoulli}}(\alpha _{ij})} where W {\displaystyle W} is the weight matrix, x {\displaystyle x} is the input, and α i j {\displaystyle \alpha _{ij}} are the (fixed) dropout rates. More generally, other distributions can be used than the Bernoulli distribution, such as the gaussian noise: y i = μ i + σ i ⊙ ϵ i , ϵ i ∼ N ( 0 , I ) {\displaystyle y_{i}=\mu _{i}+\sigma _{i}\odot \epsilon _{i},\quad \epsilon _{i}\sim {\mathcal {N}}(0,I)} where μ i = m i ⊤ x {\displaystyle \mu _{i}=\mathbf {m} _{i}^{\top }x} and σ i 2 = v i ⊤ x 2 {\displaystyle \sigma _{i}^{2}=\mathbf {v} _{i}^{\top }x^{2}} , with m i {\displaystyle \mathbf {m} _{i}} and v i {\displaystyle \mathbf {v} _{i}} being the mean and variance of the i {\displaystyle i} -th output neuron. The reparameterization trick can be applied to all such cases, resulting in the variational dropout method.

    Read more →
  • Decision list

    Decision list

    Decision lists are a representation for Boolean functions which can be easily learned from examples. Single term decision lists are more expressive than disjunctions and conjunctions; however, 1-term decision lists are less expressive than the general disjunctive normal form and the conjunctive normal form. The language specified by a k-length decision list includes as a subset the language specified by a k-depth decision tree. Learning decision lists can be used for attribute efficient learning, a type of machine learning. == Definition == A decision list (DL) of length r is of the form: if f1 then output b1 else if f2 then output b2 ... else if fr then output br where fi is the ith formula and bi is the ith boolean for i ∈ { 1... r } {\displaystyle i\in \{1...r\}} . The last if-then-else is the default case, which means formula fr is always equal to true. A k-DL is a decision list where all of formulas have at most k terms. Sometimes "decision list" is used to refer to a 1-DL, where all of the formulas are either a variable or its negation.

    Read more →
  • Character computing

    Character computing

    Character computing is a trans-disciplinary field of research at the intersection of computer science and psychology. It is any computing that incorporates the human character within its context. Character is defined as all features or characteristics defining an individual and guiding their behavior in a specific situation. It consists of stable trait markers (e.g., personality, background, history, socio-economic embeddings, culture,...) and variable state markers (emotions, health, cognitive state, ...). Character computing aims at providing a holistic psychologically driven model of human behavior. It models and predicts behavior based on the relationships between a situation and character. Three main research modules fall under the umbrella of character computing: character sensing and profiling, character-aware adaptive systems, and artificial characters. == Overview == Character computing can be viewed as an extension of the well-established field of affective computing. Based on the foundations of the different psychology branches, it advocates defining behavior as a compound attribute that is not driven by either personality, emotions, situation or cognition alone. It rather defines behavior as a function of everything that makes up an individual i.e., their character and the situation they are in. Affective computing aims at allowing machines to understand and translate the non-verbal cues of individuals into affect. Accordingly, character computing aims at understanding the character attributes of an individual and the situation to translate it to predicted behavior, and vice versa. ''In practical terms, depending on the application context, character computing is a branch of research that deals with the design of systems and interfaces that can observe, sense, predict, adapt to, affect, understand, or simulate the following: character based on behavior and situation, behavior based on character and situation, or situation based on character and behavior.'' The Character-Behavior-Situation (CBS) triad is at the core of character computing and defines each of the three edges based on the other two. Character computing relies on simultaneous development from a computational and psychological perspective and is intended to be used by researchers in both fields. Its main concept is aligning the computational model of character computing with empirical results from in-lab and in-the-wild psychology experiments. The model is to be continuously built and validated through the emergence of new data. Similar to affective and personality computing, the model is to be used as a base for different applications towards improving user experience. == History == Character computing as such was first coined in its first workshop in 2017. Since then it has had 3 international workshops and numerous publications. Despite its young age, it has already drawn some interest in the research community, leading to the publication of the first book under the same title in early 2020 published by Springer Nature. Research that can be categorized under the field dates much older than 2017. The notion of combining several factors towards the explanation of behavior or traits and states has long been investigated in both Psychology and Computer Science, for example. == Character == The word character originates from the Greek word meaning “stamping tool”, referring to distinctive features and traits. Over the years it has been given many different connotations, like the moral character in philosophy, the temperament in psychology, a person in literature or an avatar in various virtual worlds, including video games. According to character computing character is a unification of all the previous definitions, by referring back to the original meaning of the word. Character is defined as the holistic concept representing all interacting trait and state markers that distinguish an individual. Traits are characteristics that mainly remain stable over time. Traits include personality, affect, socio-demographics, and general health. States are characteristics that vary in short periods of time. They include emotions, well-being, health, cognitive state. Each characteristic has many representation methods and psychological models. The different models can be combined or one model can be preset for each characteristic. This depends on the use-case and the design choices. == Areas == Research into character computing can be divided into three areas, which complement each other but can each be investigated separately. The first area is sensing and predicting character states and traits or ensuing behavior. The second area is adapting applications to certain character states or traits and the behavior they predict. It also deals with trying to change or monitor such behavior. The final area deals with creating artificial agents e.g., chatbots or virtual reality avatars that exhibit certain characteristics. The three areas are investigated separately and build on existing findings in the literature. The results of each of the three areas can also be used as a stepping stone for the next area. Each of the three areas has already been investigated on its own in different research fields with focus on different subsets of character. For example, affective computing and personality computing both cover different areas with a focus on some character components without the others to account for human behavior. == The Character-Behavior-Situation triad == Character computing is based on a holistic psychologically driven model of human behavior. Human behavior is modeled and predicted based on the relationships between a situation and a human's character. To further define character in a more formal or holistic manner, we represent it in light of the Character–Behavior–Situation triad. This highlights that character not only determines who we are but how we are, i.e., how we behave. The triad investigated in Personality Psychology is extended through character computing to the Character–Behavior–Situation triad. Any member of the CBS triad is a function of the two other members, e.g., given the situation and personality, the behavior can be predicted. Each of the components in the triad can be further decomposed into smaller units and features that may best represent the human's behavior or character in a particular situation. Character is thus behind a person's behavior in any given situation. While this is a causality relation, the correlation between the three components is often more easily used to predict the components that are most difficult to measure from those measured more easily. There are infinitely many components to include in the representation of any of C, B, and S. The challenge is always to choose the smallest subset needed for prediction of a person's behavior in a particular situation.

    Read more →
  • Morphological antialiasing

    Morphological antialiasing

    Morphological antialiasing (MLAA) is a spatial anti-aliasing technique used in real-time computer graphics. It reduces artifacts, such as jaggies, when representing a high-resolution image at a lower resolution. MLAA is a post-process filtering which detects borders in the resulting image and then finds specific patterns in these. Anti-aliasing is achieved by blending pixels in these borders, according to the pattern they belong to and their position within the pattern. Introduced in 2009, MLAA was an early and influential example of anti-aliasing techniques done in post-processing, which makes them suitable for deferred shading. A similar method in this class is fast approximate anti-aliasing (FXAA). Temporal anti-aliasing, also a post-process, has become the most common anti-aliasing method for real-time rendering and video games. Enhanced subpixel morphological antialiasing, or SMAA, is an image-based GPU-based implementation of MLAA developed by Universidad de Zaragoza and Crytek.

    Read more →
  • Human-in-the-loop

    Human-in-the-loop

    Human-in-the-loop (HITL) is used in multiple contexts. It can be defined as a model requiring human interaction. HITL is associated with modeling and simulation (M&S) in the live, virtual, and constructive taxonomy. HITL, along with the related human-on-the-loop, are also used in relation to lethal autonomous weapons. Further, HITL is used in the context of machine learning.It is also used in conversational AI to manage complex interactions that require human empathy. == Machine learning == In machine learning, HITL is used in the sense of humans aiding the computer in making the correct decisions in building a model. HITL improves machine learning over random sampling by selecting the most critical data needed to refine the model. == Simulation == In simulation, HITL models may conform to human factors requirements as in the case of a mockup. In this type of simulation, a human is always part of the simulation and consequently influences the outcome in such a way that is difficult if not impossible to reproduce exactly. HITL also readily allows for the identification of problems and requirements that may not be easily identified by other means of simulation. HITL is often referred to as an interactive simulation, which is a special kind of physical simulation in which physical simulations include human operators, such as in a flight or a driving simulator. === Benefits === Human-in-the-loop allows the user to change the outcome of an event or process. The immersion effectively contributes to a positive transfer of acquired skills into the real world. This can be demonstrated by trainees utilizing flight simulators in preparation to become pilots. HITL also allows for the acquisition of knowledge regarding how a new process may affect a particular event. Utilizing HITL allows participants to interact with realistic models and attempt to perform as they would in an actual scenario. HITL simulations bring to the surface issues that would not otherwise be apparent until after a new process has been deployed. A real-world example of HITL simulation as an evaluation tool is its usage by the Federal Aviation Administration (FAA) to allow air traffic controllers to test new automation procedures by directing the activities of simulated air traffic while monitoring the effect of the newly implemented procedures. As with most processes, there is always the possibility of human error, which can only be reproduced using HITL simulation. Although much can be done to automate systems, humans typically still need to take the information provided by a system to determine the next course of action based on their judgment and experience. Intelligent systems can only go so far in certain circumstances to automate a process; only humans in the simulation can accurately judge the final design. Tabletop simulation may be useful in the very early stages of project development for the purpose of collecting data to set broad parameters, but the important decisions require human-in-the-loop simulation. HITL reflects scenarios where human input remains essential despite advances in automation. === Within the virtual simulation taxonomy === Virtual simulations inject HITL in a central role by exercising motor control skills (e.g. flying an airplane), decision making skills (e.g. committing fire control resources to action), or communication skills (e.g. as members of a C4I team). === Examples === Flight simulators Driving simulators Marine simulators Video games Supply chain management simulators Digital puppetry === Misconceptions === Although human-in-the-loop simulation can include a computer simulation in the form of a synthetic environment, computer simulation is not necessarily a form of human-in-the-loop simulation, and is often considered as human-out-of-the loop simulation. In this particular case, a computer model’s behavior is modified according to a set of initial parameters. The results of the model differ from the results stemming from a true human-in-the-loop simulation because the results can easily be replicated time and time again, by simply providing identical parameters. == Weapons == === Taxonomy === Three classifications of the degree of human control of autonomous weapon systems were laid out by Bonnie Docherty in a 2012 Human Rights Watch report. human-in-the-loop: a human must instigate the action of the weapon (in other words not fully autonomous) human-on-the-loop: a human may abort an action human-out-of-the-loop: no human action is involved === Positive human action === In discussions of autonomous weapons and nuclear command and control, the phrase positive human action has been used alongside "human-in-the-loop" to emphasize that a human operator must affirmatively authorize the use of force. Descriptions of the United States Navy's Aegis Combat System have used the phrase in characterizing a requirement for affirmative human action to initiate live firing. A survey of autonomous weapons systems described the Aegis "Auto SM" mode as one in which "the system fully develops the engagement process however engagement requires positive human action". The phrase entered United States federal law in the National Defense Authorization Act for Fiscal Year 2025, which stipulates that artificial intelligence systems not compromise "the principle of requiring positive human actions in execution of decisions by the President with respect to the employment of nuclear weapons".

    Read more →
  • Outline of machine learning

    Outline of machine learning

    The following outline is provided as an overview of, and topical guide to, machine learning: Machine learning (ML) is a subfield of artificial intelligence within computer science that evolved from the study of pattern recognition and computational learning theory. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". ML involves the study and construction of algorithms that can learn from and make predictions on data. These algorithms operate by building a model from a training set of example observations to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions. == How can machine learning be categorized? == An academic discipline A branch of science An applied science A subfield of computer science A branch of artificial intelligence A subfield of soft computing Application of statistics === Paradigms of machine learning === Supervised learning, where the model is trained on labeled data Unsupervised learning, where the model tries to identify patterns in unlabeled data Reinforcement learning, where the model learns to make decisions by receiving rewards or penalties. == Applications of machine learning == Applications of machine learning Bioinformatics Biomedical informatics Computer vision Customer relationship management Data mining Earth sciences Email filtering Inverted pendulum (balance and equilibrium system) Natural language processing Named Entity Recognition Automatic summarization Automatic taxonomy construction Dialog system Grammar checker Language recognition Handwriting recognition Optical character recognition Speech recognition Text to Speech Synthesis Speech Emotion Recognition Machine translation Question answering Speech synthesis Text mining Term frequency–inverse document frequency Text simplification Pattern recognition Facial recognition system Handwriting recognition Image recognition Optical character recognition Speech recognition Recommendation system Collaborative filtering Content-based filtering Hybrid recommender systems Search engine Search engine optimization Social engineering == Machine learning hardware == Graphics processing unit Tensor processing unit Vision processing unit == Machine learning tools == Comparison of machine learning software Comparison of deep learning software === Machine learning frameworks === ==== Proprietary machine learning frameworks ==== Amazon Machine Learning Microsoft Azure Machine Learning Studio DistBelief (replaced by TensorFlow) ==== Open source machine learning frameworks ==== Apache Singa Apache MXNet Caffe PyTorch mlpack TensorFlow Torch CNTK Accord.Net Jax MLJ.jl – A machine learning framework for Julia === Machine learning libraries === Deeplearning4j Theano scikit-learn Keras === Machine learning algorithms === == Machine learning methods == === Instance-based algorithm === K-nearest neighbors algorithm (KNN) Learning vector quantization (LVQ) Self-organizing map (SOM) === Regression analysis === Logistic regression Ordinary least squares regression (OLSR) Linear regression Stepwise regression Multivariate adaptive regression splines (MARS) Regularization algorithm Ridge regression Least Absolute Shrinkage and Selection Operator (LASSO) Elastic net Least-angle regression (LARS) Classifiers Probabilistic classifier Naive Bayes classifier Binary classifier Linear classifier Hierarchical classifier === Dimensionality reduction === Dimensionality reduction Canonical correlation analysis (CCA) Factor analysis Feature extraction Feature selection Independent component analysis (ICA) Linear discriminant analysis (LDA) Multidimensional scaling (MDS) Non-negative matrix factorization (NMF) Partial least squares regression (PLSR) Principal component analysis (PCA) Principal component regression (PCR) Projection pursuit Sammon mapping t-distributed stochastic neighbor embedding (t-SNE) === Ensemble learning === Ensemble learning AdaBoost Boosting Bootstrap aggregating (also "bagging" or "bootstrapping") Ensemble averaging Gradient boosted decision tree (GBDT) Gradient boosting Random Forest Stacked Generalization === Meta-learning === Meta-learning Inductive bias Metadata === Reinforcement learning === Reinforcement learning Q-learning State–action–reward–state–action (SARSA) Temporal difference learning (TD) Learning Automata === Supervised learning === Supervised learning Averaged one-dependence estimators (AODE) Artificial neural network Case-based reasoning Gaussian process regression Gene expression programming Group method of data handling (GMDH) Inductive logic programming Instance-based learning Lazy learning Learning Automata Learning Vector Quantization Logistic Model Tree Minimum message length (decision trees, decision graphs, etc.) Nearest Neighbor Algorithm Analogical modeling Probably approximately correct learning (PAC) learning Ripple down rules, a knowledge acquisition methodology Symbolic machine learning algorithms Support vector machines Random Forests Ensembles of classifiers Bootstrap aggregating (bagging) Boosting (meta-algorithm) Ordinal classification Conditional Random Field ANOVA Quadratic classifiers k-nearest neighbor Boosting SPRINT Bayesian networks Naive Bayes Hidden Markov models Hierarchical hidden Markov model ==== Bayesian ==== Bayesian statistics Bayesian knowledge base Naive Bayes Gaussian Naive Bayes Multinomial Naive Bayes Averaged One-Dependence Estimators (AODE) Bayesian Belief Network (BBN) Bayesian Network (BN) ==== Decision tree algorithms ==== Decision tree algorithm Decision tree Classification and regression tree (CART) Iterative Dichotomiser 3 (ID3) C4.5 algorithm C5.0 algorithm Chi-squared Automatic Interaction Detection (CHAID) Decision stump Conditional decision tree ID3 algorithm Random forest SLIQ ==== Linear classifier ==== Linear classifier Fisher's linear discriminant Linear regression Logistic regression Multinomial logistic regression Naive Bayes classifier Perceptron Support vector machine === Unsupervised learning === Unsupervised learning Expectation-maximization algorithm Vector Quantization Generative topographic map Information bottleneck method Association rule learning algorithms Apriori algorithm Eclat algorithm ==== Artificial neural networks ==== Artificial neural network Feedforward neural network Extreme learning machine Convolutional neural network Recurrent neural network Long short-term memory (LSTM) Logic learning machine Self-organizing map ==== Association rule learning ==== Association rule learning Apriori algorithm Eclat algorithm FP-growth algorithm ==== Hierarchical clustering ==== Hierarchical clustering Single-linkage clustering Conceptual clustering ==== Cluster analysis ==== Cluster analysis BIRCH DBSCAN Expectation–maximization (EM) Fuzzy clustering Hierarchical clustering k-means clustering k-medians Mean-shift OPTICS algorithm ==== Anomaly detection ==== Anomaly detection k-nearest neighbors algorithm (k-NN) Local outlier factor === Semi-supervised learning === Semi-supervised learning Active learning Generative models Low-density separation Graph-based methods Co-training Transduction === Deep learning === Deep learning Deep belief networks Deep Boltzmann machines Deep Convolutional neural networks Deep Recurrent neural networks Hierarchical temporal memory Generative Adversarial Network Style transfer Transformer Stacked Auto-Encoders === Other machine learning methods and problems === Anomaly detection Association rules Bias-variance dilemma Classification Multi-label classification Clustering Data Pre-processing Empirical risk minimization Feature engineering Feature learning Learning to rank Occam learning Online machine learning PAC learning Regression Reinforcement Learning Semi-supervised learning Statistical learning Structured prediction Graphical models Bayesian network Conditional random field (CRF) Hidden Markov model (HMM) Unsupervised learning VC theory == Machine learning research == List of artificial intelligence projects List of datasets for machine learning research == History of machine learning == History of machine learning Timeline of machine learning == Machine learning projects == Machine learning projects: DeepMind Google Brain OpenAI Meta AI Hugging Face == Machine learning organizations == === Machine learning conferences and workshops === Artificial Intelligence and Security (AISec) (co-located workshop with CCS) Conference on Neural Information Processing Systems (NIPS) ECML PKDD International Conference on Machine Learning (ICML) ML4ALL (Machine Learning For All) == Machine learning publications == === Books on machine learning === Mathematics for Machine Learning Hands-On Machine Learning Scikit-Learn, Keras, and TensorFlow The Hundred-Page Machine Learning Book === Machine learning journals === Machine Learning Journal of Machine Learning Research (JMLR) Neural Computation == Pe

    Read more →
  • Grammar systems theory

    Grammar systems theory

    Grammar systems theory is a field of theoretical computer science that studies systems of finite collections of formal grammars generating a formal language. Each grammar works on a string, a so-called sequential form that represents an environment. Grammar systems can thus be used as a formalization of decentralized or distributed systems of agents in artificial intelligence. Let A {\displaystyle \mathbb {A} } be a simple reactive agent moving on the table and trying not to fall down from the table with two reactions, t for turning and ƒ for moving forward. The set of possible behaviors of A {\displaystyle \mathbb {A} } can then be described as formal language L A = { ( f m t n f r ) + : 1 ≤ m ≤ k ; 1 ≤ n ≤ ℓ ; 1 ≤ r ≤ k } , {\displaystyle \mathbb {L_{A}} =\{(f^{m}t^{n}f^{r})^{+}:1\leq m\leq k;1\leq n\leq \ell ;1\leq r\leq k\},} where ƒ can be done maximally k times and t can be done maximally ℓ times considering the dimensions of the table. Let G A {\displaystyle \mathbb {G_{A}} } be a formal grammar which generates language L A {\displaystyle \mathbb {L_{A}} } . The behavior of A {\displaystyle \mathbb {A} } is then described by this grammar. Suppose the A {\displaystyle \mathbb {A} } has a subsumption architecture; each component of this architecture can be then represented as a formal grammar, too, and the final behavior of the agent is then described by this system of grammars. The schema on the right describes such a system of grammars which shares a common string representing an environment. The shared sequential form is sequentially rewritten by each grammar, which can represent either a component or generally an agent. If grammars communicate together and work on a shared sequential form, it is called a Cooperating Distributed (DC) grammar system. Shared sequential form is a similar concept to the blackboard approach in AI, which is inspired by an idea of experts solving some problem together while they share their proposals and ideas on a shared blackboard. Each grammar in a grammar system can also work on its own string and communicate with other grammars in a system by sending their sequential forms on request. Such a grammar system is then called a Parallel Communicating (PC) grammar system. PC and DC are inspired by distributed AI. If there is no communication between grammars, the system is close to the decentralized approaches in AI. These kinds of grammar systems are sometimes called colonies or Eco-Grammar systems, depending (besides others) on whether the environment is changing on its own (Eco-Grammar system) or not (colonies).

    Read more →
  • Computational humor

    Computational humor

    Computational humor is a branch of computational linguistics and artificial intelligence which uses computers in humor research. It is a relatively new area, with the first dedicated conference organized in 1996. The first "computer model of a sense of humor" was suggested by Suslov as early as 1992. Investigation of the general scheme of the information processing show a possibility of a specific malfunction, conditioned by the necessity of a quick deletion from consciousness of a false version. This specific malfunction can be identified with a humorous effect on the psychological grounds; however, an essentially new ingredient, a role of timing, is added to a well known role of ambiguity. In biological systems, a sense of humour inevitably develops in the course of evolution, because its biological function consists in quickening the transmission of processed information into consciousness and in a more effective use of brain resources. A realization of this algorithm in neural networks explains naturally the mechanism of laughter: deletion of a false version corresponds to zeroing of some part of the neural network and excessive energy of neurons is thrown out to the motor cortex, arousing muscular contractions. Unfortunately, a practical realization of this algorithm needs extensive databases, whose creation in the automatic regime was suggested only recently . As a result, this magistral direction was not developed properly and subsequent investigations (see below) accepted somewhat specialized colouring. == Joke generators == === Pun generation === An approach to analysis of humor is classification of jokes. A further step is an attempt to generate jokes basing on the rules that underlie classification. Simple prototypes for computer pun generation were reported in the early 1990s, based on a natural language generator program, VINCI. Graeme Ritchie and Kim Binsted in their 1994 research paper described a computer program, JAPE, designed to generate question-answer-type puns from a general, i.e., non-humorous, lexicon. (The program name is an acronym for "Joke Analysis and Production Engine".) Some examples produced by JAPE are: Q: What is the difference between leaves and a car? A: One you brush and rake, the other you rush and brake. Q: What do you call a strange market? A: A bizarre bazaar. Since then the approach has been improved, and the latest report, dated 2007, describes the STANDUP joke generator, implemented in the Java programming language. The STANDUP generator was tested on children within the framework of analyzing its usability for language skills development for children with communication disabilities, e.g., because of cerebral palsy. (The project name is an acronym for "System To Augment Non-speakers' Dialog Using Puns" and an allusion to standup comedy.) Children responded to this "language playground" with enthusiasm, and showed marked improvement on certain types of language tests. The two young people, who used the system over a ten-week period, regaled their peers, staff, family and neighbors with jokes such as: "What do you call a spicy missile? A hot shot!" Their joy and enthusiasm at entertaining others was inspirational. === Other === Stock and Strapparava described a program to generate funny acronyms. == Joke recognition == A statistical machine learning algorithm to detect whether a sentence contained a "That's what she said" double entendre was developed by Kiddon and Brun (2011). There is an open-source Python implementation of Kiddon & Brun's TWSS system. A program to recognize knock-knock jokes was reported by Taylor and Mazlack. This kind of research is important in analysis of human–computer interaction. An application of machine learning techniques for the distinguishing of joke texts from non-jokes was described by Mihalcea and Strapparava (2006). Takizawa et al. (1996) reported on a heuristic program for detecting puns in the Japanese language. == Applications == A possible application for assistance in language acquisition is described in the section "Pun generation". Another envisioned use of joke generators is in cases of a steady supply of jokes where quantity is more important than quality. Another obvious, yet remote, direction is automated joke appreciation. It is known that humans interact with computers in ways similar to interacting with other humans that may be described in terms of personality, politeness, flattery, and in-group favoritism. Therefore, the role of humor in human–computer interaction is being investigated. In particular, humor generation in user interface to ease communications with computers was suggested. Craig McDonough implemented the Mnemonic Sentence Generator, which converts passwords into humorous sentences. Based on the incongruity theory of humor, it is suggested that the resulting meaningless but funny sentences are easier to remember. For example, the password AjQA3Jtv is converted into "Arafat joined Quayle's Ant, while TARAR Jeopardized thurmond's vase," an example chosen by combining politicians names with verbs and common nouns. == Related research == John Allen Paulos is known for his interest in mathematical foundations of humor. His book Mathematics and Humor: A Study of the Logic of Humor demonstrates structures common to humor and formal sciences (mathematics, linguistics) and develops a mathematical model of jokes based on catastrophe theory. Conversational systems which have been designed to take part in Turing test competitions generally have the ability to learn humorous anecdotes and jokes. Because many people regard humor as something particular to humans, its appearance in conversation can be quite useful in convincing a human interrogator that a hidden entity, which could be a machine or a human, is in fact a human.

    Read more →
  • Data preprocessing

    Data preprocessing

    Data preprocessing can refer to manipulation, filtration or augmentation of data before it is analyzed, and is often an important step in the data mining process. Data collection methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and missing values, amongst other issues. Preprocessing is the process by which unstructured data is transformed into intelligible representations suitable for machine-learning models. This phase of model deals with noise in order to arrive at better and improved results from the original data set which was noisy. This dataset also has some level of missing value present in it. The preprocessing pipeline used can often have large effects on the conclusions drawn from the downstream analysis. Thus, representation and quality of data is necessary before running any analysis. If there is a high proportion of irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase may be more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Examples of methods used in data preprocessing include cleaning, instance selection, normalization, one-hot encoding, data transformation, feature extraction and feature selection. == Applications == === Data mining === Data preprocessing allows for the removal of unwanted data with the use of data cleaning, this allows the user to have a dataset to contain more valuable information after the preprocessing stage for data manipulation later in the data mining process. Editing such dataset to either correct data corruption or human error is a crucial step to get accurate quantifiers like true positives, true negatives, false positives and false negatives found in a confusion matrix that are commonly used for a medical diagnosis. Users are able to join data files together and use preprocessing to filter any unnecessary noise from the data which can allow for higher accuracy. Users use Python programming scripts accompanied by the pandas library which gives them the ability to import data from a comma-separated values as a data-frame. The data-frame is then used to manipulate data that can be challenging otherwise to do in Excel. Pandas (software) which is a powerful tool that allows for data analysis and manipulation; which makes data visualizations, statistical operations and much more, a lot easier. Many also use the R programming language to do such tasks as well. The reason why a user transforms existing files into a new one is because of many reasons. Aspects of data preprocessing may include imputing missing values, aggregating numerical quantities and transforming continuous data into categories (data binning). More advanced techniques like principal component analysis and feature selection are working with statistical formulas and are applied to complex datasets which are recorded by GPS trackers and motion capture devices. === Semantic data preprocessing === Semantic data mining is a subset of data mining that specifically seeks to incorporate domain knowledge, such as formal semantics, into the data mining process. Domain knowledge is the knowledge of the environment the data was processed in. Domain knowledge can have a positive influence on many aspects of data mining, such as filtering out redundant or inconsistent data during the preprocessing phase. Domain knowledge also works as constraint. It does this by using working as set of prior knowledge to reduce the space required for searching and acting as a guide to the data. Simply put, semantic preprocessing seeks to filter data using the original environment of said data more correctly and efficiently. There are increasingly complex problems which are asking to be solved by more elaborate techniques to better analyze existing information. Instead of creating a simple script for aggregating different numerical values into a single value, it make sense to focus on semantic based data preprocessing. The idea is to build a dedicated ontology, which explains on a higher level what the problem is about. In regards to semantic data mining and semantic pre-processing, ontologies are a way to conceptualize and formally define semantic knowledge and data. The Protégé (software) is the standard tool for constructing an ontology. In general, the use of ontologies bridges the gaps between data, applications, algorithms, and results that occur from semantic mismatches. As a result, semantic data mining combined with ontology has many applications where semantic ambiguity can impact the usefulness and efficiency of data systems. Applications include the medical field, language processing, banking, and even tutoring, among many more. There are various strengths to using a semantic data mining and ontological based approach. As previously mentioned, these tools can help during the per-processing phase by filtering out non-desirable data from the data set. Additionally, well-structured formal semantics integrated into well designed ontologies can return powerful data that can be easily read and processed by machines. A specifically useful example of this exists in the medical use of semantic data processing. As an example, a patient is having a medical emergency and is being rushed to hospital. The emergency responders are trying to figure out the best medicine to administer to help the patient. Under normal data processing, scouring all the patient’s medical data to ensure they are getting the best treatment could take too long and risk the patients’ health or even life. However, using semantically processed ontologies, the first responders could save the patient’s life. Tools like a semantic reasoner can use ontology to infer the what best medicine to administer to the patient is based on their medical history, such as if they have a certain cancer or other conditions, simply by examining the natural language used in the patient's medical records. This would allow the first responders to quickly and efficiently search for medicine without having worry about the patient’s medical history themselves, as the semantic reasoner would already have analyzed this data and found solutions. In general, this illustrates the incredible strength of using semantic data mining and ontologies. They allow for quicker and more efficient data extraction on the user side, as the user has fewer variables to account for, since the semantically pre-processed data and ontology built for the data have already accounted for many of these variables. However, there are some drawbacks to this approach. Namely, it requires a high amount of computational power and complexity, even with relatively small data sets. This could result in higher costs and increased difficulties in building and maintaining semantic data processing systems. This can be mitigated somewhat if the data set is already well organized and formatted, but even then, the complexity is still higher when compared to standard data processing. Below is a simple a diagram combining some of the processes, in particular semantic data mining and their use in ontology. The diagram depicts a data set being broken up into two parts: the characteristics of its domain, or domain knowledge, and then the actual acquired data. The domain characteristics are then processed to become user understood domain knowledge that can be applied to the data. Meanwhile, the data set is processed and stored so that the domain knowledge can applied to it, so that the process may continue. This application forms the ontology. From there, the ontology can be used to analyze data and process results. Fuzzy preprocessing is another, more advanced technique for solving complex problems. Fuzzy preprocessing and fuzzy data mining make use of fuzzy sets. These data sets are composed of two elements: a set and a membership function for the set which comprises 0 and 1. Fuzzy preprocessing uses this fuzzy data set to ground numerical values with linguistic information. Raw data is then transformed into natural language. Ultimately, fuzzy data mining's goal is to help deal with inexact information, such as an incomplete database. Currently fuzzy preprocessing, as well as other fuzzy based data mining techniques see frequent use with neural networks and artificial intelligence.

    Read more →
  • Artificial reproduction

    Artificial reproduction

    Artificial reproduction is the re-creation of life brought about by means other than natural ones. It is new life built by human plans and projects. Examples include artificial selection, artificial insemination, in vitro fertilization, artificial womb, artificial cloning, and kinematic replication. Artificial reproduction is one aspect of artificial life. Artificial reproduction can be categorized into one of two classes according to its capacity to be self-sufficient: non-assisted reproductive technology and assisted reproductive technology. Cutting plants' stems and placing them in compost is a form of assisted artificial reproduction, xenobots are an example of a more autonomous type of reproduction, while the artificial womb presented in the movie the Matrix illustrates a non assisted hypothetical technology. The idea of artificial reproduction has led to various technologies. == Theology == Humans have aspired to create life since immemorial times. Most theologies and religions have conceived this possibility as exclusive of deities. Christian religions consider the possibility of artificial reproduction, in most cases, as heretical and sinful. == Philosophy == Although ancient Greek philosophy raised the concept that man could imitate the creative capacity of nature, classic Greeks thought that if possible, human beings would reproduce things as nature does, and vice versa, nature would do the things that man does in the same way. Aristotle, for example, wrote that if nature made tables, it would make them just as men do. In other words, Aristotle said that if nature were to create a table, such table will look like a human-made table. Correspondingly, Descartes envisioned the human body, and nature, as a machine. Cartesian philosophy does not stop seeing a perfect mirror between nature and the artificial. However, Kant revolutionized this old idea by criticizing such naturalism. Kant pedagogically wrote: "Reason, in order to be taught by nature, must approach nature with its principles in one hand, according to which the agreement among appearances can count as laws, and, in the other hand, the experiment thought out in accord with these principles—in order to be instructed by nature not like a pupil, who has recited to him whatever the teacher wants to say, but like an appointed judge who compels witnesses to answer the questions he puts to them.". Humans are not instructed by nature but rather use nature as raw material to invent. Humans find alternatives to the natural restrictions imposed by natural laws thus, nature is not necessarily mirrored. In accordance with Kant (and contrary to what Aristotle thought) Karl Marx, Alfred Whitehead, Jaques Derrida and Juan David García Bacca noticed that nature is incapable of reproducing tables; or airplanes, or submarines, or computers. If nature tried to create airplanes, it would produce birds. If nature tried to create submarines, it would get fishes. If nature tried to create computers, brains would grow. And if nature tried to create man, modern man, monkeys will be evolved. According to Whitehead, if we look for something natural in artificial life, in the most elaborate cases, if anything, only atoms remain natural. Juan David Garcia Bacca summarized, “It will not come out from wood, it will not be born, a galley; from clay, a vessel; from linen, a dress; from iron, a lever,...From natural, artificial. In the artificial, the natural is reduced to a simple raw material, even though it is perfectly specified with natural specification. The artificial is the real, positive, and original negation of the natural: of species, of genus and of essence. Thus, its ontology is superior to natural ontology. And for this very reason Marx did not attach any importance to Darwin, whose evolutionism is confined to the natural order: to changes, at most, from variety to variety, from species to species... natural. For the same reason, nature has no dialectics, even though continuous evolution and selection can occur. The dialectic cannot emerge from the natural, for deeper reasons than, using today's terms, from a bird, an airplane cannot emerge; from fish, a submarine; from ears, a telephone; from eyes, a television; from a brain, a digital computer; from feet, a car; from hands, an engine; from Euclid, Descartes; from Aristotle, Newton; from Plato, Marx.” According to García Bacca, the major difference between natural causes and artificial causes is that nature does not have plans and projects, while humans design things following plans and projects. In contrast, other influential authors such as Michael Behe have depicted the concept and promoted the idea of intelligent design, a notion that has aroused several doubts and heated controversies, as it reframe natural causes in accordance with a natural plan. Previous ideas that have also provided a positive 'sense' to natural reproduction, are orthogenesis, syntropy, orgone and morphic resonance, among others. Although, these ideas have been historically marginalized and often called pseudoscience, recently Bio-semioticians are reconsidering some of them under symbolic approaches. Current metaphysics of science actually recognizes that the artificial ways of reproduction are diverse from nature, i.e., unnatural, anti-natural or supernatural. Because Biosemiotics does not focus on the function of life but on its meaning, it has a better understanding of the artificial than classic biology. == Science == Biology, being the study of cellular life, addresses reproduction in terms of growth and cellular division (i.e., binary fission, mitosis and meiosis); however, the science of artificial reproduction is not restricted by the mirroring of these natural processes.The science of artificial reproduction is actually transcending the natural forms, and natural rules, of reproduction. For example, xenobots have redefined the classical conception of reproduction. Although xenobots are made of eukariotic cells they do not reproduce by mitosis, but rather by kinematic replication. Such constructive replication does not involve growing but rather building. == Assisted reproductive technologies == Assisted reproductive technology (ART)'s purpose is to assist the development of a human embryo, commonly because of medical concerns due to fertility limitations. == Non-assisted reproductive technologies == Non-assisted reproductive technologies (NART) could have medical motivations but are mostly driven by a wider heterotopic ambition. Although, NARTs are initially designed by humans, they are programed to become independent of humans to a relative or absolute extent. James Lovelock proposed that such novelties could overcome humans. === Artificial cloning === Cloning is the cellular reproductive processes where two or more genetically identical organisms are created, either by natural or artificial means. Artificial cloning normally involves editing the genetic code, somatic cell nuclear transfer and 3D bioprinting. === Non-assisted artificial womb === A non-assisted artificial womb or artificial uterus is a device that allow for ectogenesis or extracorporeal pregnancy by growing an embryonic form outside the body of an organism (that would normally carry the embryo to term) without any human assistance. The aspect of non-assistance is the key distinction between the current artificial womb technology (AWT) in modern medical research, which still relies on human assistance. With this non-assisted hypothetical technology, a zygote or stem cells are used to create an embryo that is then incubated and monitored by artificial intelligence (AI) within a chamber composed of biocompatible material. The AI maintains the necessary conditions for the embryo to develop and thrive, proceeding to mimic organic labor and childbirth in order to best help the embryo adjust to the outside world. Ectogenesis—gestation, depicted in the science fiction movie The Matrix, is a fast approaching reality. This type of innovation presupposes that vertebrate wombs are not the only way for bearing humans or other similar forms of life. === Kinematic replication === Self-replication without binary fission, meiosis, mitosis (or any other form of cellular reproduction that involves division and growing) can be achieved. Xenobots are an example of kinematic replication. They are biobots, named after the African clawed frog (Xenopus laevis). Xenobots are cellular life forms designed by using artificial intelligence to build more of themselves by combining frog cells in a liquid medium. The term kinematic replication is usually reserved for biomolecules (e.g. DNA, RNA, prions, etc.) and artificially designed cellular forms (e.g. xenobots). === Machine constructive replication === Machine constructive replication mimics human traditional manufacturing but is entirely self-automated. Such constructive replication is a more general form of kinematic replication, which does not necessarily

    Read more →
  • Personoid

    Personoid

    Personoid is the concept coined by Stanisław Lem, a Polish science-fiction writer, in Non Serviam, from his book A Perfect Vacuum (1971). His personoids are an abstraction of functions of human mind and they live in computers; they do not need any human-like physical body. In cognitive and software modeling, personoid is a research approach to the development of intelligent autonomous agents. In frame of the IPK (Information, Preferences, Knowledge) architecture, it is a framework of abstract intelligent agent with a cognitive and structural intelligence. It can be seen as an essence of high intelligent entities. From the philosophical and systemics perspectives, personoid societies can also be seen as the carriers of a culture. According to N. Gessler, the personoids study can be a base for the research on artificial culture and culture evolution. == Personoids on TV and cinema == Welt am Draht (1973) The Thirteenth Floor (1999)

    Read more →
  • Netomi

    Netomi

    Netomi, formerly msg.ai, is an American artificial intelligence company and developer of chatbot technologies. == History == msg.ai was founded in May 2015 by Puneet Mehta. msg.ai worked with Sony Pictures to launch a chat bot on Facebook Messenger for a $100M film, Goosebumps and subsequently joined Y Combinator as a member of the Winter 2016 class. Later that year and in 2017, msg.ai completed two rounds of seed funding, led by Y Combinator and Index Ventures. In 2018, the company changed its name to Netomi. In 2019, the company raised $14.7 million in a Series A funding round also led by Index Ventures. In 2021, the company raised $30 million in a Series B funding round led by WndrCo LLC.

    Read more →
  • ASR-complete

    ASR-complete

    ASR-complete is, by analogy to "NP-completeness" in complexity theory, a term to indicate that the difficulty of a computational problem is equivalent to solving the central automatic speech recognition problem, i.e. recognize and understanding spoken language. Unlike "NP-completeness", this term is typically used informally. Such problems are hypothesised to include: Spoken natural language understanding Understanding speech from far-field microphones, i.e. handling the reverbation and background noise These problems are easy for humans to do (in fact, they are described directly in terms of imitating humans). Some systems can solve very simple restricted versions of these problems, but none can solve them in their full generality.

    Read more →
  • World model (artificial intelligence)

    World model (artificial intelligence)

    A world model in artificial intelligence is a machine learning system that builds an internal representation of an environment. The model predicts how that environment changes over time in response to actions. Researchers design world models to help agents plan, reason, and act without constant real-world trial and error. World models differ from systems that merely classify or generate outputs. They simulate dynamics such as physics, object interactions, and causality. Early ideas date to the 1990s. Modern versions power robots, autonomous driving, and interactive video generation. == History == Jürgen Schmidhuber introduced the term world model in machine learning in 1990. He proposed recurrent neural networks that predict future states from observations and use those predictions to train agents. David Ha and Schmidhuber revived the concept in a 2018 paper. Their agents learned to drive virtual cars and play video games inside self-generated simulations. Yann LeCun advanced the idea in a 2022 position paper titled "A Path Towards Autonomous Machine Intelligence". He argued that intelligence requires predictive models of the world rather than pure pattern matching. LeCun proposed the joint embedding predictive architecture (JEPA) as a practical foundation. LeCun and collaborators developed several JEPA variants. V-JEPA 2 reached state-of-the-art performance on video understanding and physical reasoning at the time. It supports zero-shot robot control in unfamiliar environments. Introduced in March 2026, LeWorldModel trains stably end-to-end from raw pixels and uses two loss terms and avoids hand-crafted heuristics. LeCun founded Advanced Machine Intelligence Labs in 2026 to further develop world models. Google DeepMind introduced Genie in 2024. The model learned interactive environments from unlabeled internet videos. Genie 2 followed in late 2024 and added three-dimensional generation. The Genie series set benchmarks for general-purpose simulation. Genie 3 was introduced in August 2025. It produces photorealistic, real-time interactive worlds from text prompts which are displayed at 24 frames per second and explored in real time with text or image prompts. The model supports persistent three-dimensional worlds and real-time interaction. Waymo adopted Genie 3 in February 2026 and used it to create a specialized world model for autonomous driving simulation, called the Waymo World Model. It produces synchronized camera and lidar outputs and creates edge cases that real robotaxis rarely encounter. The edge cases were reported to be unusual by PCMag. General Intuition announced a $133.7 million seed round. World Labs raised $1 billion. AMI raised $1.03 billion. In April 2026, Alibaba announced Happy Oyster, its world model designed for real-time and “flowy” world model. It includes a directing mode for world building based on text and image prompts and a wandering mode for exploring the resulting world. It can generate 3-minute in-world video clips. Also in April, World Labs, co-founded by Li Fei Fei, unveiled Spark 2.0, an open-source 3D Gaussian splatting rendering engine that targets smartphone-class devices. In June 2026, Nvidia released Cosmos 3, a family of open-weight models. It combines previously independent physical reasoning, world simulation, and action generation. Cosmos 3 integrates can process and generate text, image, video, audio, and action sequences. The model employs a Mixture-of-Transformers" (MoT) approach. An autoregressive (AR) transformer handles reasoning and next-token prediction, while a diffusion transformer (DT) does multimodal generation. Encoders (ViT for vision, VAE for visual/audio, and domain-specific for actions) and generate a shared representation space using 3D multi-dimensional rotary position embedding (mRoPE) for spatial and temporal information. The family includes Cosmos3-Nano (16B parameters) for workstations; Cosmos3-Super (64B parameters) for research. == Architecture == World models process raw sensory data such as video frames or lidar scans. They compress this input into compact latent representations. The system then predicts future representations rather than pixel-by-pixel reconstructions. Many modern world models use joint embedding predictive architecture (JEPA). An encoder turns observations into embeddings. A predictor estimates one or a suite of embeddings from the current one and an action. In some cases a critic chooses one embedding as the best result. A regularizer keeps embeddings well-behaved. The model trains by minimizing prediction error in embedding space. This approach avoids the high cost of generating every detail. Some architectures add explicit components. A fast reactive path handles immediate responses. A slower deliberative path performs longer-horizon planning. Video prediction accuracy or robot success rates are key metrics, but do not always predict real-world performance. Generative world models such as Genie 3 combine these with a simulator. They accept text prompts or layouts and output consistent video, lidar, or three-dimensional scenes. World models often train with self-supervised learning. They use large unlabeled datasets of video or robot interactions. Self-supervised learning can speed learning. Reinforcement learning can fine-tune a model for specific tasks. == Applications == World models support robot learning. Agents train inside simulations and transfer skills to the physical world. This reduces the need for dangerous or expensive real-world trials. Autonomous vehicles use world models to test rare events. Waymo's system simulates tornadoes or unusual pedestrian behavior. Companies train planners without putting vehicles on public roads. Interactive entertainment benefits from world models. Genie 3 lets users generate playable environments from simple descriptions. Game studios prototype levels faster. Scientific simulation gains from these models. Researchers model physical systems or biological processes at scale. Planners in logistics or urban design test strategies inside accurate digital twins. == Comparison with large language models == Both world models and large language models (LLMs) use inferencing on their inputs to make predictions. LLMs operate on textual inputs. They predict the next token in text sequences. They excel at language-oriented tasks such as translation or summarization. However, they lack understanding of physics. World models operate on sensor inputs such as pixels. They predict state changes in that data in latent space. This design supports planning and causal reasoning. LLMs generate fluent text but often fail at consistent physical predictions. Their architecture employs transformers with refinements such as mixture of experts. World models divide an inferencing task into work performed by encoders, predictors, simulators, and other pieces. They typically handle multimodal inputs such as video, lidar, radar, and audio, guided by textual prompting. LLMs power chatbots and code assistants. World models drive embodied agents that act in dynamic environments, such as autonomous driving. The two may be combined in hybrid systems. For example, a LLM handles instructions, while a world model manages low-level control. World model proponents such as LeCun claim that because LLMs are trained only on text, they have no ability to predict anything beyond text, such as real-world events. == Benchmarks == World model benchmarks test physical understanding, long-term consistency, planning, and generalization from sensor data. Meta introduced three benchmarks for V-JEPA 2. IntPhys 2 measures a model's ability to detect physics violations. It presents pairs of videos that diverge when one breaks physical rules. Humans score near 100% accuracy. V-JEPA 2 achieves little better than random chance on many conditions. Minimal Video Pairs (MVPBench) tests physical understanding through multiple-choice questions based on short video clips. It probes object interactions and causality. Something-Something tests action recognition. Epic-Kitchens-100 tests human action anticipation. DeepMind benchmark: Interactive evaluation measures consistency over minutes of interaction, memory of off-screen objects, and response to user actions or text prompts. Waymo benchmark: Output generation quality: Metrics include realism, controllability (via text prompts), and usefulness for training planners in simulated worlds. However, pixel reconstruction error rate with episodic rewards often fails. Other: Epic-Kitchens-100 (often measured with Recall@5) Ego4D 50 Salads, Breakfast, etc. Potential benchmarks: Zero-shot transfer to robots Long-horizon planning Implausible prediction rate

    Read more →