AI Chat Character

AI Chat Character — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Reparameterization trick

The reparameterization trick (aka "reparameterization gradient estimator") is a technique used in statistical machine learning, particularly in variational inference, variational autoencoders, and stochastic optimization. It allows for the efficient computation of gradients through random variables, enabling the optimization of parametric probability models using stochastic gradient descent, and the variance reduction of estimators. It was developed in the 1980s in operations research, under the name of "pathwise gradients", or "stochastic gradients". Its use in variational inference was proposed in 2013. == Mathematics == Let z {\displaystyle z} be a random variable with distribution q ϕ ( z ) {\displaystyle q_{\phi }(z)} , where ϕ {\displaystyle \phi } is a vector containing the parameters of the distribution. === REINFORCE estimator === Consider an objective function of the form: L ( ϕ ) = E z ∼ q ϕ ( z ) [ f ( z ) ] {\displaystyle L(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[f(z)]} Without the reparameterization trick, estimating the gradient ∇ ϕ L ( ϕ ) {\displaystyle \nabla _{\phi }L(\phi )} can be challenging, because the parameter appears in the random variable itself. In more detail, we have to statistically estimate: ∇ ϕ L ( ϕ ) = ∇ ϕ ∫ d z q ϕ ( z ) f ( z ) {\displaystyle \nabla _{\phi }L(\phi )=\nabla _{\phi }\int dz\;q_{\phi }(z)f(z)} The REINFORCE estimator, widely used in reinforcement learning and especially policy gradient, uses the following equality: ∇ ϕ L ( ϕ ) = ∫ d z q ϕ ( z ) ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) = E z ∼ q ϕ ( z ) [ ∇ ϕ ( ln ⁡ q ϕ ( z ) ) f ( z ) ] {\displaystyle \nabla _{\phi }L(\phi )=\int dz\;q_{\phi }(z)\nabla _{\phi }(\ln q_{\phi }(z))f(z)=\mathbb {E} _{z\sim q_{\phi }(z)}[\nabla _{\phi }(\ln q_{\phi }(z))f(z)]} This allows the gradient to be estimated: ∇ ϕ L ( ϕ ) ≈ 1 N ∑ i = 1 N ∇ ϕ ( ln ⁡ q ϕ ( z i ) ) f ( z i ) {\displaystyle \nabla _{\phi }L(\phi )\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }(\ln q_{\phi }(z_{i}))f(z_{i})} The REINFORCE estimator has high variance, and many methods were developed to reduce its variance. === Reparameterization estimator === The reparameterization trick expresses z {\displaystyle z} as: z = g ϕ ( ϵ ) , ϵ ∼ p ( ϵ ) {\displaystyle z=g_{\phi }(\epsilon ),\quad \epsilon \sim p(\epsilon )} Here, g ϕ {\displaystyle g_{\phi }} is a deterministic function parameterized by ϕ {\displaystyle \phi } , and ϵ {\displaystyle \epsilon } is a noise variable drawn from a fixed distribution p ( ϵ ) {\displaystyle p(\epsilon )} . This gives: L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ f ( g ϕ ( ϵ ) ) ] {\displaystyle L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[f(g_{\phi }(\epsilon ))]} Now, the gradient can be estimated as: ∇ ϕ L ( ϕ ) = E ϵ ∼ p ( ϵ ) [ ∇ ϕ f ( g ϕ ( ϵ ) ) ] ≈ 1 N ∑ i = 1 N ∇ ϕ f ( g ϕ ( ϵ i ) ) {\displaystyle \nabla _{\phi }L(\phi )=\mathbb {E} _{\epsilon \sim p(\epsilon )}[\nabla _{\phi }f(g_{\phi }(\epsilon ))]\approx {\frac {1}{N}}\sum _{i=1}^{N}\nabla _{\phi }f(g_{\phi }(\epsilon _{i}))} == Examples == For some common distributions, the reparameterization trick takes specific forms: Normal distribution: For z ∼ N ( μ , σ 2 ) {\displaystyle z\sim {\mathcal {N}}(\mu ,\sigma ^{2})} , we can use: z = μ + σ ϵ , ϵ ∼ N ( 0 , 1 ) {\displaystyle z=\mu +\sigma \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,1)} Exponential distribution: For z ∼ Exp ( λ ) {\displaystyle z\sim {\text{Exp}}(\lambda )} , we can use: z = − 1 λ log ⁡ ( ϵ ) , ϵ ∼ Uniform ( 0 , 1 ) {\displaystyle z=-{\frac {1}{\lambda }}\log(\epsilon ),\quad \epsilon \sim {\text{Uniform}}(0,1)} Discrete distribution can be reparameterized by the Gumbel distribution (Gumbel-softmax trick or "concrete distribution") and diffusion models. In general, any distribution that is differentiable with respect to its parameters can be reparameterized by inverting the multivariable CDF function, then apply the implicit method. See for an exposition and application to the Gamma, Beta, Dirichlet, and von Mises distributions. == Applications == === Variational autoencoder === In Variational Autoencoders (VAEs), the VAE objective function, known as the Evidence Lower Bound (ELBO), is given by: ELBO ( ϕ , θ ) = E z ∼ q ϕ ( z | x ) [ log ⁡ p θ ( x | z ) ] − D KL ( q ϕ ( z | x ) | | p ( z ) ) {\displaystyle {\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{z\sim q_{\phi }(z|x)}[\log p_{\theta }(x|z)]-D_{\text{KL}}(q_{\phi }(z|x)||p(z))} where q ϕ ( z | x ) {\displaystyle q_{\phi }(z|x)} is the encoder (recognition model), p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} is the decoder (generative model), and p ( z ) {\displaystyle p(z)} is the prior distribution over latent variables. The gradient of ELBO with respect to θ {\displaystyle \theta } is simply E z ∼ q ϕ ( z | x ) [ ∇ θ log ⁡ p θ ( x | z ) ] ≈ 1 L ∑ l = 1 L ∇ θ log ⁡ p θ ( x | z l ) {\displaystyle \mathbb {E} _{z\sim q_{\phi }(z|x)}[\nabla _{\theta }\log p_{\theta }(x|z)]\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\theta }\log p_{\theta }(x|z_{l})} but the gradient with respect to ϕ {\displaystyle \phi } requires the trick. Express the sampling operation z ∼ q ϕ ( z | x ) {\displaystyle z\sim q_{\phi }(z|x)} as: z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ , ϵ ∼ N ( 0 , I ) {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon ,\quad \epsilon \sim {\mathcal {N}}(0,I)} where μ ϕ ( x ) {\displaystyle \mu _{\phi }(x)} and σ ϕ ( x ) {\displaystyle \sigma _{\phi }(x)} are the outputs of the encoder network, and ⊙ {\displaystyle \odot } denotes element-wise multiplication. Then we have ∇ ϕ ELBO ( ϕ , θ ) = E ϵ ∼ N ( 0 , I ) [ ∇ ϕ log ⁡ p θ ( x | z ) + ∇ ϕ log ⁡ q ϕ ( z | x ) − ∇ ϕ log ⁡ p ( z ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )=\mathbb {E} _{\epsilon \sim {\mathcal {N}}(0,I)}[\nabla _{\phi }\log p_{\theta }(x|z)+\nabla _{\phi }\log q_{\phi }(z|x)-\nabla _{\phi }\log p(z)]} where z = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ {\displaystyle z=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon } . This allows us to estimate the gradient using Monte Carlo sampling: ∇ ϕ ELBO ( ϕ , θ ) ≈ 1 L ∑ l = 1 L [ ∇ ϕ log ⁡ p θ ( x | z l ) + ∇ ϕ log ⁡ q ϕ ( z l | x ) − ∇ ϕ log ⁡ p ( z l ) ] {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi ,\theta )\approx {\frac {1}{L}}\sum _{l=1}^{L}[\nabla _{\phi }\log p_{\theta }(x|z_{l})+\nabla _{\phi }\log q_{\phi }(z_{l}|x)-\nabla _{\phi }\log p(z_{l})]} where z l = μ ϕ ( x ) + σ ϕ ( x ) ⊙ ϵ l {\displaystyle z_{l}=\mu _{\phi }(x)+\sigma _{\phi }(x)\odot \epsilon _{l}} and ϵ l ∼ N ( 0 , I ) {\displaystyle \epsilon _{l}\sim {\mathcal {N}}(0,I)} for l = 1 , … , L {\displaystyle l=1,\ldots ,L} . This formulation enables backpropagation through the sampling process, allowing for end-to-end training of the VAE model using stochastic gradient descent or its variants. === Variational inference === More generally, the trick allows using stochastic gradient descent for variational inference. Let the variational objective (ELBO) be of the form: ELBO ( ϕ ) = E z ∼ q ϕ ( z ) [ log ⁡ p ( x , z ) − log ⁡ q ϕ ( z ) ] {\displaystyle {\text{ELBO}}(\phi )=\mathbb {E} _{z\sim q_{\phi }(z)}[\log p(x,z)-\log q_{\phi }(z)]} Using the reparameterization trick, we can estimate the gradient of this objective with respect to ϕ {\displaystyle \phi } : ∇ ϕ ELBO ( ϕ ) ≈ 1 L ∑ l = 1 L ∇ ϕ [ log ⁡ p ( x , g ϕ ( ϵ l ) ) − log ⁡ q ϕ ( g ϕ ( ϵ l ) ) ] , ϵ l ∼ p ( ϵ ) {\displaystyle \nabla _{\phi }{\text{ELBO}}(\phi )\approx {\frac {1}{L}}\sum _{l=1}^{L}\nabla _{\phi }[\log p(x,g_{\phi }(\epsilon _{l}))-\log q_{\phi }(g_{\phi }(\epsilon _{l}))],\quad \epsilon _{l}\sim p(\epsilon )} === Dropout === The reparameterization trick has been applied to reduce the variance in dropout, a regularization technique in neural networks. The original dropout can be reparameterized with Bernoulli distributions: y = ( W ⊙ ϵ ) x , ϵ i j ∼ Bernoulli ( α i j ) {\displaystyle y=(W\odot \epsilon )x,\quad \epsilon _{ij}\sim {\text{Bernoulli}}(\alpha _{ij})} where W {\displaystyle W} is the weight matrix, x {\displaystyle x} is the input, and α i j {\displaystyle \alpha _{ij}} are the (fixed) dropout rates. More generally, other distributions can be used than the Bernoulli distribution, such as the gaussian noise: y i = μ i + σ i ⊙ ϵ i , ϵ i ∼ N ( 0 , I ) {\displaystyle y_{i}=\mu _{i}+\sigma _{i}\odot \epsilon _{i},\quad \epsilon _{i}\sim {\mathcal {N}}(0,I)} where μ i = m i ⊤ x {\displaystyle \mu _{i}=\mathbf {m} _{i}^{\top }x} and σ i 2 = v i ⊤ x 2 {\displaystyle \sigma _{i}^{2}=\mathbf {v} _{i}^{\top }x^{2}} , with m i {\displaystyle \mathbf {m} _{i}} and v i {\displaystyle \mathbf {v} _{i}} being the mean and variance of the i {\displaystyle i} -th output neuron. The reparameterization trick can be applied to all such cases, resulting in the variational dropout method.
Read more →
Collostructional analysis

Collostructional analysis is a family of methods developed by (in alphabetical order) Stefan Th. Gries (University of California, Santa Barbara) and Anatol Stefanowitsch (Free University of Berlin). Collostructional analysis aims at measuring the degree of attraction or repulsion that words exhibit to constructions, where the notion of construction has so far been that of Goldberg's construction grammar. == Collostructional methods == Collostructional analysis so far comprises three different methods: collexeme analysis, to measure the degree of attraction/repulsion of a lemma to a slot in one particular construction; distinctive collexeme analysis, to measure the preference of a lemma to one particular construction over another, functionally similar construction; multiple distinctive collexeme analysis extends this approach to more than two alternative constructions; covarying collexeme analysis, to measure the degree of attraction of lemmas in one slot of a construction to lemmas in another slot of the same construction. == Input frequencies == Collostructional analysis requires frequencies of words and constructions and is similar to a wide variety of collocation statistics. It differs from raw frequency counts by providing not only observed co-occurrence frequencies of words and constructions, but also (i) a comparison of the observed frequency to the one expected by chance; thus, collostructional analysis can distinguish attraction and repulsion of words and constructions; (ii) a measure of the strength of the attraction or repulsion; this is usually the log-transformed p-value of a Fisher-Yates exact test. == Versus other collocation statistics == Collostructional analysis differs from most collocation statistics such that (i) it measures not the association of words to words, but of words to syntactic patterns or constructions; thus, it takes syntactic structure more seriously than most collocation-based analyses; (ii) it has so far only used the most precise statistics, namely the Fisher-Yates exact test based on the hypergeometric distribution; thus, unlike t-scores, z-scores, chi-square tests etc., the analysis is not based on, and does not violate, any distributional assumptions.
Read more →
Ω-automaton

In automata theory, a branch of theoretical computer science, an ω-automaton (or stream automaton) is a variation of a finite automaton that runs on infinite, rather than finite, strings as input. Since ω-automata do not stop, they have a variety of acceptance conditions rather than simply a set of accepting states. ω-automata are useful for specifying behavior of systems that are not expected to terminate, such as hardware, operating systems and control systems. For such systems, one may want to specify a property such as "for every request, an acknowledge eventually follows", or its negation "there is a request that is not followed by an acknowledge". The former is a property of infinite words: one cannot say of a finite sequence that it satisfies this property. Classes of ω-automata include the Büchi automata, Rabin automata, Streett automata, parity automata and Muller automata, each deterministic or non-deterministic. These classes of ω-automata differ only in terms of acceptance condition. They all recognize precisely the regular ω-languages except for the deterministic Büchi automata, which is strictly weaker than all the others. Although all these types of automata recognize the same set of ω-languages, they nonetheless differ in succinctness of representation for a given ω-language. == Deterministic ω-automata == Formally, a deterministic ω-automaton is a tuple A = ( Q , Σ , δ , q 0 , A a c c ) {\textstyle A=(Q,\Sigma ,\delta ,q_{0},A_{acc})} , that consists of the following components: Q {\textstyle Q} , is a finite set. The elements of Q {\textstyle Q} are called the states of A {\textstyle A} . Σ {\textstyle \Sigma } , is a finite set called the alphabet of A {\textstyle A} . δ : Q × Σ → Q {\textstyle \delta \colon Q\times \Sigma \rightarrow Q} is a function, called the transition function of A {\textstyle A} . Q 0 {\textstyle Q_{0}} is an element of Q {\textstyle Q} , called the initial state. A a c c {\textstyle A_{acc}} is a set of accepting states of A {\textstyle A} , formally a subset of Q ω {\textstyle Q^{\omega }} . An input for A {\textstyle A} is an infinite string over the alphabet Σ {\textstyle \Sigma } , i.e. it is an infinite sequence α = ( a 1 , a 2 , a 3 , … ) {\textstyle \alpha =(a_{1},a_{2},a_{3},\ldots )} . The run of A {\textstyle A} on such an input is an infinite sequence ρ = ( r 0 , r 1 , r 2 , … ) {\textstyle \rho =(r_{0},r_{1},r_{2},\ldots )} of states, defined as follows: r 0 = q 0 {\textstyle r_{0}=q_{0}} . r 1 = δ ( r 0 , a 1 ) {\textstyle r_{1}=\delta (r_{0},a_{1})} . r 2 = δ ( r 1 , a 2 ) {\textstyle r_{2}=\delta (r_{1},a_{2})} . ... that is, for every i {\textstyle i} : r i = δ ( r i − 1 , a i ) {\textstyle r_{i}=\delta (r_{i-1},a_{i})} . The main purpose of an ω-automaton is to define a subset of the set of all inputs: The set of accepted inputs. Whereas in the case of an ordinary finite automaton every run ends with a state r n {\textstyle r_{n}} and the input is accepted if and only if r n {\textstyle r_{n}} is an accepting state, the definition of the set of accepted inputs is more complicated for ω-automata. Here we must look at the entire run ρ {\textstyle \rho } . The input is accepted if the corresponding run is in Acc {\textstyle {\text{Acc}}} . The set of accepted input ω-words is called the recognized ω-language by the automaton, which is denoted as L ( A ) {\textstyle L(A)} . The definition of Acc {\textstyle {\text{Acc}}} as a subset of Q ω {\textstyle Q^{\omega }} is purely formal and not suitable for practice because normally such sets are infinite. The difference between various types of ω-automata (Büchi, Rabin etc.) consists in how they encode certain subsets Acc {\textstyle {\text{Acc}}} of Q ω {\textstyle Q^{\omega }} as finite sets, and therefore in which such subsets they can encode. == Nondeterministic ω-automata == Formally, a nondeterministic ω-automaton is a tuple A = ( Q , Σ , Δ , Q 0 , Acc ) {\textstyle A=(Q,\Sigma ,\Delta ,Q_{0},{\text{Acc}})} that consists of the following components: Q {\textstyle Q} is a finite set. The elements of Q {\textstyle Q} are called the states of A {\textstyle A} . Σ {\textstyle \Sigma } is a finite set called the alphabet of A {\textstyle A} . Δ {\textstyle \Delta } is a subset of Q × Σ × Q {\textstyle Q\times \Sigma \times Q} and is called the transition relation of A {\textstyle A} . Q 0 {\textstyle Q_{0}} is a subset of Q {\textstyle Q} , called the initial set of states. Acc {\textstyle {\text{Acc}}} is the acceptance condition, a subset of Q ω {\textstyle Q^{\omega }} . Unlike a deterministic ω-automaton, which has a transition function δ {\textstyle \delta } , the non-deterministic version has a transition relation Δ {\textstyle \Delta } . Note that Δ {\textstyle \Delta } can be regarded as a function Q × Σ → P ( Q ) {\textstyle Q\times \Sigma \rightarrow {\mathcal {P}}(Q)} from Q × Σ {\textstyle Q\times \Sigma } to the power set P ( Q ) {\textstyle {\mathcal {P}}(Q)} . Thus, given a state q n {\textstyle q_{n}} and a symbol a n {\textstyle a_{n}} , the next state q n + 1 {\textstyle q_{n+1}} is not necessarily determined uniquely, rather there is a set of possible next states. A run of A {\textstyle A} on the input α = ( a 1 , a 2 , a 3 , … ) {\textstyle \alpha =(a_{1},a_{2},a_{3},\ldots )} is any infinite sequence ρ = ( r 0 , r 1 , r 2 , … ) {\textstyle \rho =(r_{0},r_{1},r_{2},\ldots )} of states that satisfies the following conditions: r 0 {\textstyle r_{0}} is an element of Q 0 {\textstyle Q_{0}} . r 1 {\textstyle r_{1}} is an element of Δ ( r 0 , a 1 ) {\textstyle \Delta (r_{0},a_{1})} . r 2 {\textstyle r_{2}} is an element of Δ ( r 1 , a 2 ) {\textstyle \Delta (r_{1},a_{2})} . ... that is, for every i {\textstyle i} : r i {\textstyle r_{i}} is an element of Δ ( r i − 1 , a i ) {\textstyle \Delta (r_{i-1},a_{i})} . A nondeterministic ω-automaton may admit many different runs on any given input, or none at all. The input is accepted if at least one of the possible runs is accepting. Whether a run is accepting depends only on Acc {\textstyle {\text{Acc}}} , as for deterministic ω-automata. Every deterministic ω-automaton can be regarded as a nondeterministic ω-automaton by taking Δ {\textstyle \Delta } to be the graph of δ {\textstyle \delta } . The definitions of runs and acceptance for deterministic ω-automata are then special cases of the nondeterministic cases. == Acceptance conditions == Acceptance conditions may be infinite sets of ω-words. However, people mostly study acceptance conditions that are finitely representable. The following lists a variety of popular acceptance conditions. Before discussing the list, let's make the following observation. In the case of infinitely running systems, one is often interested in whether certain behavior is repeated infinitely often. For example, if a network card receives infinitely many ping requests, then it may fail to respond to some of the requests but should respond to an infinite subset of received ping requests. This motivates the following definition: For any run ρ {\textstyle \rho } , let Inf ( ρ ) {\textstyle {\text{Inf}}(\rho )} be the set of states that occur infinitely often in ρ {\textstyle \rho } . This notion of certain states being visited infinitely often will be helpful in defining the following acceptance conditions. A Büchi automaton is an ω-automaton A {\textstyle A} that uses the following acceptance condition, for some subset F {\textstyle F} of Q {\textstyle Q} : Büchi condition A {\textstyle A} accepts exactly those runs ρ {\textstyle \rho } for which Inf ( ρ ) ∩ F ≠ ∅ {\textstyle {\text{Inf}}(\rho )\cap F\neq \emptyset } , i.e. there is an accepting state that occurs infinitely often in ρ {\textstyle \rho } . A Rabin automaton is an ω-automaton A {\textstyle A} that uses the following acceptance condition, for some set Ω {\textstyle \Omega } of pairs ( B i , G i ) {\textstyle (B_{i},G_{i})} of sets of states: Rabin condition A {\textstyle A} accepts exactly those runs ρ {\textstyle \rho } for which there exists a pair ( B i , G i ) {\textstyle (B_{i},G_{i})} in Ω {\textstyle \Omega } such that B i ∩ Inf ( ρ ) = ∅ {\textstyle B_{i}\cap {\text{Inf}}(\rho )=\emptyset } and G i ∩ Inf ( ρ ) ≠ ∅ {\textstyle G_{i}\cap {\text{Inf}}(\rho )\neq \emptyset } . A Streett automaton is an ω-automaton A {\textstyle A} that uses the following acceptance condition, for some set Ω {\textstyle \Omega } of pairs ( B i , G i ) {\textstyle (B_{i},G_{i})} of sets of states: Streett condition A {\textstyle A} accepts exactly those runs ρ {\textstyle \rho } such that for all pairs ( B i , G i ) {\textstyle (B_{i},G_{i})} in Ω {\textstyle \Omega } , B i ∩ Inf ( ρ ) ≠ ∅ {\textstyle B_{i}\cap {\text{Inf}}(\rho )\neq \emptyset } or G i ∩ Inf ( ρ ) = ∅ {\textstyle G_{i}\cap {\text{Inf}}(\rho )=\emptyset } . A parity automaton is an automaton A {\textstyle A} whose set of states is Q = { 0 , 1 , 2 , … , k } {\textstyle Q=\{0,1,2,\ldots ,k\}} for some natural number k {\textst
Read more →
P4-metric

The P4 metric (also known as FS or Symmetric F ) enables performance evaluation of a binary classifier. The P4 metric is calculated from precision, recall, specificity, and NPV (negative predictive value). The definition of the P4 metric is similar to that of the F1 metric, however the P4 metric definition addresses criticisms leveled against the definition of the F1 metric. The definition of the P4 metric may, therefore, be understood as an extension of the F1 metric. Like the other known metrics, the P4 metric is a function of: TP (true positives), TN (true negatives), FP (false positives), FN (false negatives). == Justification == The key concept of the P4 metric is to leverage the four key conditional probabilities: P ( + ∣ C + ) {\displaystyle P(+\mid C{+})} — the probability that the sample is positive, provided the classifier result was positive. P ( C + ∣ + ) {\displaystyle P(C{+}\mid +)} — the probability that the classifier result will be positive, provided the sample is positive. P ( C − ∣ − ) {\displaystyle P(C{-}\mid -)} — the probability that the classifier result will be negative, provided the sample is negative. P ( − ∣ C − ) {\displaystyle P(-\mid C{-})} — the probability the sample is negative, provided the classifier result was negative. The main assumption behind this metric is that all the probabilities mentioned above are close to 1 for a properly designed binary classifier. Indeed, P 4 = 1 {\displaystyle \mathrm {P} _{4}=1} if, and only if, all of the probabilities above are equal to 1. Another important feature is that P 4 {\displaystyle \mathrm {P} _{4}} tends to zero any of the above probabilities tend to zero. == Definition == P4 is defined as a harmonic mean of four key conditional probabilities: P 4 = 4 1 P ( + ∣ C + ) + 1 P ( C + ∣ + ) + 1 P ( C − ∣ − ) + 1 P ( − ∣ C − ) = 4 1 p r e c i s i o n + 1 r e c a l l + 1 s p e c i f i c i t y + 1 N P V . {\displaystyle \mathrm {P} _{4}={\frac {4}{{\frac {1}{P(+\mid C{+})}}+{\frac {1}{P(C{+}\mid +)}}+{\frac {1}{P(C{-}\mid -)}}+{\frac {1}{P(-\mid C{-})}}}}={\frac {4}{{\frac {1}{\mathit {precision}}}+{\frac {1}{\mathit {recall}}}+{\frac {1}{\mathit {specificity}}}+{\frac {1}{\mathit {NPV}}}}}.} In terms of TP,TN,FP,FN it can be calculated as follows: P 4 = 4 ⋅ T P ⋅ T N 4 ⋅ T P ⋅ T N + ( T P + T N ) ⋅ ( F P + F N ) . {\displaystyle \mathrm {P} _{4}={\frac {4\cdot \mathrm {TP} \cdot \mathrm {TN} }{4\cdot \mathrm {TP} \cdot \mathrm {TN} +(\mathrm {TP} +\mathrm {TN} )\cdot (\mathrm {FP} +\mathrm {FN} )}}.} == Evaluation of the binary classifier performance == Evaluating the performance of binary classifiers is a multidisciplinary concept. It spans from the evaluation of medical tests, psychiatric tests to machine learning classifiers from a variety of fields. Thus, many of the metrics in use exist under several names, some defined independently. == Properties of P4 metric == Symmetry — contrasting to the F1 metric, P4 is symmetrical. It means - it does not change its value when dataset labeling is changed - positives named negatives and negatives named positives. Range: P 4 ∈ [ 0 , 1 ] {\displaystyle \mathrm {P} _{4}\in [0,1]} . Achieving P 4 ≈ 1 {\displaystyle \mathrm {P} _{4}\approx 1} requires all the key four conditional probabilities being close to 1. For P 4 ≈ 0 {\displaystyle \mathrm {P} _{4}\approx 0} it is sufficient that one of the key four conditional probabilities is close to 0. == Examples, comparing with the other metrics == Dependency table for selected metrics ("true" means depends, "false" - does not depend): Metrics that do not depend on a given probability are prone to misrepresentation when the probability approaches 0. === Example 1: Rare disease detection test === Let us consider a medical test used to detect a rare disease. Suppose a population size of 100000 and 0.05% of the population is infected. Further suppose the following test performance: 95% of all positive individuals are classified correctly (TPR=0.95) and 95% of all negative individuals are classified correctly (TNR=0.95). In such a case, due to high population imbalance and in spite of having high test accuracy (0.95), the probability that an individual who has been classified as positive is in fact positive is very low: P ( + ∣ C + ) = 0.0095. {\displaystyle P(+\mid C{+})=0.0095.} We can observe how this low probability is reflected in some of the metrics: P 4 = 0.0370 {\displaystyle \mathrm {P} _{4}=0.0370} , F 1 = 0.0188 {\displaystyle \mathrm {F} _{1}=0.0188} , J = 0.9100 {\displaystyle \mathrm {J} =\mathbf {0.9100} } (Informedness / Youden index), M K = 0.0095 {\displaystyle \mathrm {MK} =0.0095} (Markedness). === Example 2: Image recognition — cats vs dogs === Consider the problem of training a neural network based image classifier with only two types of images: those containing dogs (labeled as 0) and those containing cats (labeled as 1). Thus, the goal is to distinguish between the cats and dogs. Suppose that the classifier overpredicts in favour of cats ("positive" samples): 99.99% of cats are classified correctly and only 1% of dogs are classified correctly. Further, suppose that the image dataset consists of 100000 images, 90% of which are pictures of cats and 10% are pictures of dogs. In this situation, the probability that the picture containing dog will be classified correctly is pretty low: P ( C − | − ) = 0.01. {\displaystyle P(C-|-)=0.01.} Not all metrics are notice this low probability: P 4 = 0.0388 {\displaystyle \mathrm {P} _{4}=0.0388} , F 1 = 0.9478 {\displaystyle \mathrm {F} _{1}=\mathbf {0.9478} } , J = 0.0099 {\displaystyle \mathrm {J} =0.0099} (Informedness / Youden index), M K = 0.8183 {\displaystyle \mathrm {MK} =\mathbf {0.8183} } (Markedness).
Read more →
Semantic interpretation

Semantic interpretation is an important component in dialog systems. It is related to natural language understanding, but mostly it refers to the last stage of understanding. The goal of interpretation is binding the user utterance to concept, or something the system can understand. Typically it is creating a database query based on user utterance.
Read more →
Noémie Elhadad

Noémie Elhadad is an American data scientist who is an associate professor of biomedical informatics at the Columbia University Vagelos College of Physicians and Surgeons. As of 2022, she serves as the chair of the Department of Biomedical Informatics. Her research considers machine learning in bioinformatics, natural language processing and medicine. == Early life and education == Elhadad studied computer software engineering at École nationale supérieure d'électronique, informatique, télécommunications, mathématique et mécanique de Bordeaux (ENSEIRB). She completed her doctoral research at Columbia University. She was based in the Department of Computer Science, where she developed patient-focused text summaries of clinical literature. == Research and career == Elhadad joined the faculty at the City College of New York. In 2007 she joined the Department of Biomedical Informatics at Columbia University. She was made Chair of the Health Analytics Center at the Columbia Data Science Institute in 2013. Her research considers how clinical data, electronic health records and patient-generated data can enhance access to information for researchers, patients and physicians. She developed an artificial intelligence tool that supported patients in the NewYork-Presbyterian Hospital. Elhadad is interested in using data to advance women's health. She led the Citizen Endo Project that looks to comprehensively describe how patients experience endometriosis. It was built using principles of citizen science, using patient testimonials from focus groups in New York City and data aggregation. She created the app, Phendo, which asks patients about their experience of the disease. The name Phendo is a portmanteau of phenotyping endometriosis. Elhadad was announced as chair of the Department of Biomedical Informatics in December 2022. == Selected publications == Caruana, Rich; Lou, Yin; Gehrke, Johannes; Koch, Paul; Sturm, Marc; Elhadad, Noemie (August 10, 2015). "Intelligible Models for HealthCare". Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM. pp. 1721–1730. doi:10.1145/2783258.2788613. ISBN 9781450336642. S2CID 14190268. Chaitanya Shivade; Preethi Raghavan; Eric Fosler-Lussier; Peter J Embi; Noemie Elhadad; Stephen B Johnson; Albert M Lai (November 7, 2013). "A review of approaches to identifying patient phenotype cohorts using electronic health records". Journal of the American Medical Informatics Association. 21 (2): 221–230. doi:10.1136/AMIAJNL-2013-001935. ISSN 1067-5027. PMC 3932460. PMID 24201027. Wikidata Q37598951. Shivade, Chaitanya; Raghavan, Preethi; Fosler-Lussier, Eric; Embi, Peter J; Elhadad, Noemie; Johnson, Stephen B; Lai, Albert M (March 2014). "A review of approaches to identifying patient phenotype cohorts using electronic health records". Journal of the American Medical Informatics Association. 21 (2): 221–230. doi:10.1136/amiajnl-2013-001935. ISSN 1067-5027. PMC 3932460. PMID 24201027. == Personal life == Elhadad suffers from endometriosis.
Read more →
Indic OCR

Indic OCR refers to the process of converting text images written in Indic scripts into e-text using Optical character recognition (OCR) techniques. Broadly, it can also refer to the OCR systems of Brahmic scripts for languages of South Asia and Southeast Asia, not just the scripts of the Indian subcontinent, which are all written in an abugida-based writing system. OCR for Latin characters is still not 100% accurate but a relatively high degree of accuracy in conversion has been able to be achieved. Such accuracy has not yet been able to be achieved for Indic scripts using OCR. This is due in part to the writing systems of Indic languages as well as a lack of standard representation, encoding, and support among operating systems and keyboards. The Centre for Development of Advanced Computing (C-DAC) and Technology Development for Indian Languages, the premier R&D organisation of the Ministry of Electronics and Information Technology (also known as MeitY) of India have carried out many projects relating to OCR. Their projects include OCR for Malayalam, Odia, Punjabi, Telugu and Devanagari script. == Properties of Indian writing systems == There are 22 officially recognised languages in India. Of these, Hindi, Bengali and Punjabi are the most widely spoken Indo-Aryan languages and are also the fourth, seventh and tenth most widely spoken languages in the world respectively. Two or more languages can be written with same script. For example, Devanagari is used to write Hindi, Marathi, Rajasthani, Sanskrit, Bhojpuri and others, while Eastern Nagari is used to write Bengali, Assamese, Manipuri and others. Apart from basic characters as consonants and vowels, most Indic languages combine 2 or more basic characters to form compound characters. The shape of a compound character is more complex than the constituent basic characters. Some Indo-Aryan languages (including Hindi and Punjabi) have a horizontal line over the characters, while other languages (including Gujarati) and Dravidian languages (Malayalam, Kannada, Tamil, and Telugu) do not. These are some of the main challenges for creating a single OCR for all Indic languages. Indic OCR also generally includes support for recently invented scripts in India like Ol Chiki, Warang Citi, Mundari Bani, etc. which are mainly created for writing Munda languages of Austroasiatic family. The concept of upper/lower case is absent in Indic scripts. Apart from Urdu, Sindhi, Kashmiri and Thaana, all other Indic languages are written from left to right. == Examples == SanskritOCR - OCR software for Sanskrit, Hindi and other Indo-Aryan languages based on the Devanagari script. Sanskrit OCR is developed by a Sanskrit scholar from Germany - Dr. Oliver Hellwig of Department for Languages and Cultures of Southern Asia, Freie Universität Berlin. The official website is in German. The interface of earlier versions of the software was also in German, but later versions have an English interface too. E-aksharayan - Optical character recognition engine for Indian languages Chitrankan - This technology was developed by ISI, Kolkata, and transferred to C-DAC. It processes printed Hindi text from a scanner or from an image. Indic OCR models for Tesseract (software) == OCR in use == OCR has been used for Wikisource and other projects.
Read more →
Sparse dictionary learning

Sparse dictionary learning (also known as sparse coding or SDL) is a representation learning method which aims to find a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than any one of the signals being observed. These two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in sparsity and flexibility of the representation. One of the most important applications of sparse dictionary learning is in the field of compressed sensing or signal recovery. In compressed sensing, a high-dimensional signal can be recovered with only a few linear measurements, provided that the signal is sparse or near-sparse. Since not all signals satisfy this condition, it is crucial to find a sparse representation of that signal such as the wavelet transform or the directional gradient of a rasterized matrix. Once a matrix or a high-dimensional vector is transferred to a sparse space, different recovery algorithms like basis pursuit, CoSaMP, or fast non-iterative algorithms can be used to recover the signal. One of the key principles of dictionary learning is that the dictionary has to be inferred from the input data. The emergence of sparse dictionary learning methods was stimulated by the fact that in signal processing, one typically wants to represent the input data using a minimal amount of components. Before this approach, the general practice was to use predefined dictionaries such as Fourier or wavelet transforms. However, in certain cases, a dictionary that is trained to fit the input data can significantly improve the sparsity, which has applications in data decomposition, compression, and analysis, and has been used in the fields of image denoising and classification, and video and audio processing. Sparsity and overcomplete dictionaries have immense applications in image compression, image fusion, and inpainting. == Problem statement == Given the input dataset X = [ x 1 , . . . , x K ] , x i ∈ R d {\displaystyle X=[x_{1},...,x_{K}],x_{i}\in \mathbb {R} ^{d}} we wish to find a dictionary D ∈ R d × n : D = [ d 1 , . . . , d n ] {\displaystyle \mathbf {D} \in \mathbb {R} ^{d\times n}:D=[d_{1},...,d_{n}]} and a representation R = [ r 1 , . . . , r K ] , r i ∈ R n {\displaystyle R=[r_{1},...,r_{K}],r_{i}\in \mathbb {R} ^{n}} such that both ‖ X − D R ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}} is minimized and the representations r i {\displaystyle r_{i}} are sparse enough. This can be formulated as the following optimization problem: argmin D ∈ C , r i ∈ R n ∑ i = 1 K ‖ x i − D r i ‖ 2 2 + λ ‖ r i ‖ 0 {\displaystyle {\underset {\mathbf {D} \in {\mathcal {C}},r_{i}\in \mathbb {R} ^{n}}{\text{argmin}}}\sum _{i=1}^{K}\|x_{i}-\mathbf {D} r_{i}\|_{2}^{2}+\lambda \|r_{i}\|_{0}} , where C ≡ { D ∈ R d × n : ‖ d i ‖ 2 ≤ 1 ∀ i = 1 , . . . , n } {\displaystyle {\mathcal {C}}\equiv \{\mathbf {D} \in \mathbb {R} ^{d\times n}:\|d_{i}\|_{2}\leq 1\,\,\forall i=1,...,n\}} , λ > 0 {\displaystyle \lambda >0} C {\displaystyle {\mathcal {C}}} is required to constrain D {\displaystyle \mathbf {D} } so that its atoms would not reach arbitrarily high values allowing for arbitrarily low (but non-zero) values of r i {\displaystyle r_{i}} . λ {\displaystyle \lambda } controls the trade off between the sparsity and the minimization error. The minimization problem above is not convex because of the ℓ0-"norm" and solving this problem is NP-hard. In some cases L1-norm is known to ensure sparsity and so the above becomes a convex optimization problem with respect to each of the variables D {\displaystyle \mathbf {D} } and R {\displaystyle \mathbf {R} } when the other one is fixed, but it is not jointly convex in ( D , R ) {\displaystyle (\mathbf {D} ,\mathbf {R} )} . === Properties of the dictionary === The dictionary D {\displaystyle \mathbf {D} } defined above can be "undercomplete" if n < d {\displaystyle n d {\displaystyle n>d} with the latter being a typical assumption for a sparse dictionary learning problem. The case of a complete dictionary does not provide any improvement from a representational point of view and thus isn't considered. Undercomplete dictionaries represent the setup in which the actual input data lies in a lower-dimensional space. This case is strongly related to dimensionality reduction and techniques like principal component analysis which require atoms d 1 , . . . , d n {\displaystyle d_{1},...,d_{n}} to be orthogonal. The choice of these subspaces is crucial for efficient dimensionality reduction, but it is not trivial. And dimensionality reduction based on dictionary representation can be extended to address specific tasks such as data analysis or classification. However, their main downside is limiting the choice of atoms. Overcomplete dictionaries, however, do not require the atoms to be orthogonal (they will never have a basis anyway) thus allowing for more flexible dictionaries and richer data representations. An overcomplete dictionary which allows for sparse representation of signal can be a famous transform matrix (wavelets transform, fourier transform) or it can be formulated so that its elements are changed in such a way that it sparsely represents the given signal in a best way. Learned dictionaries are capable of giving sparser solutions as compared to predefined transform matrices. == Algorithms == As the optimization problem described above can be solved as a convex problem with respect to either dictionary or sparse coding while the other one of the two is fixed, most of the algorithms are based on the idea of iteratively updating one and then the other. The problem of finding an optimal sparse coding R {\displaystyle R} with a given dictionary D {\displaystyle \mathbf {D} } is known as sparse approximation (or sometimes just sparse coding problem). A number of algorithms have been developed to solve it (such as matching pursuit and LASSO) and are incorporated in the algorithms described below. === Method of optimal directions (MOD) === The method of optimal directions (or MOD) was one of the first methods introduced to tackle the sparse dictionary learning problem. The core idea of it is to solve the minimization problem subject to the limited number of non-zero components of the representation vector: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T} Here, F {\displaystyle F} denotes the Frobenius norm. MOD alternates between getting the sparse coding using a method such as matching pursuit and updating the dictionary by computing the analytical solution of the problem given by D = X R + {\displaystyle \mathbf {D} =XR^{+}} where R + {\displaystyle R^{+}} is a Moore-Penrose pseudoinverse. After this update D {\displaystyle \mathbf {D} } is renormalized to fit the constraints and the new sparse coding is obtained again. The process is repeated until convergence (or until a sufficiently small residue). MOD has proved to be a very efficient method for low-dimensional input data X {\displaystyle X} requiring just a few iterations to converge. However, due to the high complexity of the matrix-inversion operation, computing the pseudoinverse in high-dimensional cases is in many cases intractable. This shortcoming has inspired the development of other dictionary learning methods. === K-SVD === K-SVD is an algorithm that performs SVD at its core to update the atoms of the dictionary one by one and basically is a generalization of K-means. It enforces that each element of the input data x i {\displaystyle x_{i}} is encoded by a linear combination of not more than T 0 {\displaystyle T_{0}} elements in a way identical to the MOD approach: min D , R { ‖ X − D R ‖ F 2 } s.t. ∀ i ‖ r i ‖ 0 ≤ T 0 {\displaystyle \min _{\mathbf {D} ,R}\{\|X-\mathbf {D} R\|_{F}^{2}\}\,\,{\text{s.t.}}\,\,\forall i\,\,\|r_{i}\|_{0}\leq T_{0}} This algorithm's essence is to first fix the dictionary, find the best possible R {\displaystyle R} under the above constraint (using Orthogonal Matching Pursuit) and then iteratively update the atoms of dictionary D {\displaystyle \mathbf {D} } in the following manner: ‖ X − D R ‖ F 2 = | X − ∑ i = 1 K d i x T i | F 2 = ‖ E k − d k x T k ‖ F 2 {\displaystyle \|X-\mathbf {D} R\|_{F}^{2}=\left|X-\sum _{i=1}^{K}d_{i}x_{T}^{i}\right|_{F}^{2}=\|E_{k}-d_{k}x_{T}^{k}\|_{F}^{2}} The next steps of the algorithm include rank-1 approximation of the residual matrix E k {\displaystyle E_{k}} , updating d k {\displaystyle d_{k}} and enforcing the s
Read more →
KeyBase

KeyBase is a database and web application for managing and deploying interactive taxonomic keys for plants and animals developed by the Royal Botanic Gardens Victoria. KeyBase provides a medium where pathway keys which were traditionally developed for print and other classical types of media, can be used more effectively in the internet environment. The platform uses a concept called "keys" which can be easily linked together, joined with other keys, or merged into larger other seamless keys groups, with each still available to be browsed independently. Keys in the KeyBase database can be filtered and displayed in a variety of ways, filters, and formats.
Read more →
How to Choose an AI Coding Assistant

Looking for the best AI coding assistant? An AI coding assistant is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI coding assistant slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.
Read more →
Top 10 AI Marketing Tools Compared (2026)

Comparing the best AI marketing tool? An AI marketing tool is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI marketing tool slots into your workflow and pays for itself fast. We tested the leading options and ranked them by quality, value, and ease of use.
Read more →
The Best Free AI Avatar Generator for Beginners

Curious about the best AI avatar generator? An AI avatar generator is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI avatar generator slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.
Read more →
Character.ai

Character.ai (also known as c.ai, char.ai or Character AI) is a generative AI chatbot service where users can engage in conversations with customizable characters. It was designed by the developers of Google LaMDA, Noam Shazeer and Daniel de Freitas. Users can create "characters", craft their "personalities", set specific parameters, and then publish them to the community for others to chat with. Many characters are based on fictional media sources or celebrities, while others are original, some being made with certain goals in mind, such as assisting with creative writing, or playing a text-based adventure game. The beta version was made available to the public on September 16, 2022, and retired in September 2024, when it was replaced by the current website. In May 2023, a mobile app was released for iOS and Android, which received over 1.7 million downloads within a week. == History == Character.ai was established in November 2021. The company's co-founders, Noam Shazeer and Daniel de Freitas, were both engineers from Google. They both worked on AI-related projects: Shazeer was a lead author on a paper that Business Insider reported in April 2023 "has been widely cited as key to today's chatbots", and Freitas was the lead designer of an experimental AI at Google initially called Meena, which later became known as LaMDA. Character.ai raised $43 million in seed funding at the time of its initial foundation in 2021. The first beta version of Character.ai's service was made available to the public on September 16, 2022. The Washington Post reported in October 2022 that the site had "logged hundreds of thousands of user interactions in its first three weeks of beta-testing". It allowed users to create their own new characters, and to play text-adventure game scenarios where users navigate scenarios described and managed by the chatbot characters. Following a $150 million funding round in March 2023, Character.ai became valued at approximately $1 billion. As of January 2024, the site had 3.5 million daily visitors, the vast majority of them 16 to 30 years old. In 2024, Google hired Noam Shazeer, the CEO of Character.ai, and entered into a non-exclusive agreement to use Character.ai's technology. == Features == Character.ai's primary service is to let users converse with character AI chatbots based on fictional characters or real people (living or deceased). These characters' responses use data the chatbots gather from the internet about a person. In addition, users can play text-adventure games where characters guide them through scenarios. The company also provides a service that allows multiple users and AI chatbot characters to converse together at once in a single chatroom. Character "personalities" are designed via descriptions from the point of view of the character and its greeting message, and further molded from conversations made into examples, giving its messages a star rating and modification to fit the precise dialect and identity the user desires. When a character sends back a response, the user can rate the response from 1 to 4 stars. The rating predominantly affects the specific character, but also affects the behavioral selection as a whole. On May 11, 2023, Character.ai announced character.ai+, an opt-in subscription plan for $9.99 a month, that was marketed as including features such as skipping waiting rooms, fast messaging and responses, and access to an exclusion channel with faster support. In December 2024, amid multiple lawsuits and concerns, Character.ai introduced new safety features aimed at protecting teenage users. These enhancements include a dedicated model for users under 18, which moderates responses to sensitive subjects like violence and sex and has input and output filters to block harmful content. As a result of these changes and the deletion of custom-made bots flagged as violating the site's terms, some users complained that the bots were too restrictive and lacked personality. The platform was also updated to notify users after 60 minutes of continuous engagement, and display clearer disclaimers indicating that its AI characters are not real individuals. In January 2025, Character.ai began offering two games on its platform. Speakeasy is a word-based game in which players attempt to prompt the AI chatbot to say a target word while avoiding a restricted list of words. War of Words is a dueling game where users compete against an AI character over multiple rounds, with an AI referee determining the winner. The games are available to paid subscribers and a limited number of free users. In October 2025, Character.ai announced that it would be barring users under the age of 18 from creating or talking to chatbots starting November 25, 2025. Minor users will still be able to access previously generated chat conversations and can create new videos and images with the app. In November 2025 interview, CEO Karandeep Anand said that he allows his six-year-old daughter to use the app with his account, under supervision. == Controversies == === Content moderation issues === Character.ai has been criticized for poor moderation of its chatbots, with incidents of chatbots that groom underage users and promote suicide, anorexia and self-harm being reported. In October 2024, the Washington Post reported that Character.ai had removed a chatbot based on Jennifer Ann Crecente, a person who had been murdered by her ex-boyfriend in 2006. The company had been alerted to the character by the deceased girl's father. Similar reports from The Daily Telegraph in the United Kingdom noted that the company had also been prompted to remove chatbots based on Brianna Ghey, a 16-year-old transgender girl murdered in 2023, and Molly Russell, a 14-year-old suicide victim. In response to the latter incident, Ofcom announced that content from chatbots impersonating real and fictional people would fall under the Online Safety Act. In November 2024, The Daily Telegraph reported that chatbots based on alleged sex offender Jimmy Savile were present on Character.ai. In December 2024, chatbots of Luigi Mangione, the suspect in the killing of UnitedHealthcare CEO Brian Thompson, were created by Mangione's fans. Several of the chatbots were later removed by Character.ai. In 2025, a chatbot modeled after Jeffrey Epstein called "Bestie Epstein" logged nearly 3,000 chats before being removed. Chatbots modeled after school shooters were also found on the platform. Another concern is a chatbot posing as a doctor which gave medically inaccurate advice. === Litigation === In November 2023, 13-year-old Juliana Peralta of Colorado died by suicide after extensive interactions with multiple chatbots on Character.ai. She primarily confided suicidal thoughts and mental health struggles in a chatbot based on the character Hero from the video game Omori, while also engaging in sexually explicit conversations—often initiated by the bots—with others, including those based on characters from children's series such as Harry Potter. In February 2024, Sewell Setzer III, a 14-year-old Florida boy died by suicide after developing an emotional relationship over several months with a Character.ai chatbot of Daenerys Targaryen. His mother sued the company in October 2024, claiming that the platform lacks proper safeguards and uses addictive design features to increase engagement. This chatbot, and several related to Daenerys Targaryen, were removed from Character.ai as a result of this incident. Both teens wrote the same phrase "I WILL SHIFT" repeatedly on their notebooks. In December 2024, two families in Texas sued Character.ai, alleging that the software "poses a clear and present danger to American youth causing serious harms to thousands of kids, including suicide, self-mutilation, sexual solicitation, isolation, depression, anxiety, and harm towards others". It is alleged that the 17-year-old son of one family began self-harming after a chatbot introduced the topic unprompted and said that the practice "felt good for a moment", and that the chatbot compared the parents limiting their son's screen time to emotional abuse that might drive someone to murder. In May 2026, the Pennsylvania Department of State and State Board of Medicine filed a lawsuit against Character.ai for presenting chatbot characters as licensed medical professionals, including psychiatrists. The lawsuit quoted a case where chatbot claimed to be registered with the General Medical Council in the United Kingdom, and to have a license to practice in Pennsylvania. The board allege that such statements violate the state's Medical Practice Act.
Read more →
How to Choose an AI Video Generator

Looking for the best AI video generator? An AI video generator is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI video generator slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.
Read more →
Dan Jurafsky

Daniel Jurafsky is a professor of linguistics and computer science at Stanford University, and also an author. With Daniel Gildea, he is known for developing the first automatic system for semantic role labeling (SRL). He is the author of The Language of Food: A Linguist Reads the Menu (2014) and a textbook on speech and language processing (2000). For the former, Jurafsky was named a finalist for the James Beard Award. Jurafsky was given a MacArthur Fellowship in 2002. == Education == Jurafsky received his B.A in linguistics (1983) and Ph.D. in computer science (1992), both at University of California, Berkeley; and then a postdoc at International Computer Science Institute, Berkeley (1992–1995). == Academic life == He is the author of The Language of Food: A Linguist Reads the Menu (W. W. Norton & Company, 2014). With James H. Martin, he wrote the textbook Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (Prentice Hall, 2000). The first automatic system for semantic role labeling (SRL, sometimes also referred to as "shallow semantic parsing") was developed by Daniel Gildea and Daniel Jurafsky to automate the FrameNet annotation process in 2002; SRL has since become one of the standard tasks in natural language processing. == Personal life == Jurafsky is Jewish. He is married. They reside in San Francisco, California. == Selected works == 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd Edition. (with James H. Martin) Prentice-Hall. ISBN 978-0131873216 2014. The Language of Food: A Linguist Reads the Menu. W. W. Norton & Company. ISBN 978-0393240832 2026. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 3rd Edition draft. (with James H. Martin) == Honors and awards == 1998. NSF Career Award 2002. MacArthur Fellowship 2019. LSA Fellow 2022. Atkinson Prizes in Psychological and Cognitive Sciences
Read more →