NovelAI is an online cloud-based, SaaS model, and a paid subscription service for AI-assisted storywriting and text-to-image synthesis, originally launched in beta on June 15, 2021, with the image generation feature being implemented later on October 3, 2022. NovelAI is owned and operated by Anlatan, which is headquartered in Wilmington, Delaware. == Features == NovelAI uses GPT-based large language models (LLMs) to generate storywriting and prose. It has several models, such as Calliope, Sigurd, Euterpe, Krake, and Genji, with Genji being a Japanese-language model. The service also offers encrypted servers and customizable editors. For AI art generation, which generates images from text prompts, NovelAI uses a custom version of the source-available Stable Diffusion text-to-image diffusion model called NovelAI Diffusion, which is trained on a Danbooru-based dataset. NovelAI is also capable of generating a new image based on an existing image. The NovelAI terms of service states that all generated content belongs to the user, regardless if the user is an individual or a corporation. Anlatan states that generated images are not stored locally on their servers. == History == On April 28, 2021, Anlatan officially launched NovelAI. On June 15, 2021, Anlatan released their finetuned GPT-Neo-2.7B model from EleutherAI named Calliope, after the Greek Muses. A day later, they released their Opus-exclusive GPT-J-6B finetuned model named Sigurd, after the Norse/Germanic hero. On March 21, 2023, Nvidia and CoreWeave announced Anlatan being one of the first CoreWeave customers to deploy NVIDIA's H100 Tensor Core GPUs for new LLM model inferencing and training. On April 1, 2023, Anlatan added ControlNet features to their text-to-image NovelAI Diffusion model. On May 16, 2023, Anlatan announced that they named their H100 cluster Shoggy, a reference to H.P. Lovecraft's Shoggoths, which was used to pre-train an undisclosed 8192 token context LLM in-house model. == Reception and controversy == Following the implementation of image generation, NovelAI became a widely-discussed topic in Japan, with some online commentators noting that its image synthesis features are very adept at producing close impressions of anime characters, including lolicon and shotacon imagery, while others have expressed concern that it is a paid service reliant on a diffusion model, while the original machine learning training data consists of images used without the consent of the original artists. Attorney Kosuke Terauchi notes that, since a revision of the law in 2018, it is no longer illegal in Japan for machine learning models to scrape copyrighted content from the internet to use as training data; meanwhile, in the United States where NovelAI is based, there is no specific legal framework which regulates machine learning, and thus the fair use doctrine of US copyright law applies instead. Danbooru has posted an official statement in regards to NovelAI's use of the site's content for AI training, expressing that Danbooru is not affiliated with NovelAI, and does not endorse nor condone NovelAI's use of artists' artworks for machine learning. FayerWayer described NovelAI as a service capable of generating hentai. Manga artist Izumi Ū commented that while the manga style art generated by NovelAI is highly accurate, there are still imperfections in the output, although he views these as human-like in a favourable light nonetheless. In response to the topic of NovelAI, Narugami, founder of the Japanese freelance artist commissioning website Skeb, stated on October 5, 2022 that the use of AI image generation is prohibited on the platform since 2018. Illustrations using NovelAI have been posted on social media and illustration posting sites, and by October 13, 2,111 works tagged with #NovelAI were posted on Pixiv. Pixiv has stated that it is not considering a complete elimination of creations that use AI, though it requires AI-generated posts to be marked as such and allows users to filter them out. == Incidents == On October 6, 2022, NovelAI experienced a data breach where its software's source code was leaked.
Curve (tonality)
In image editing, a curve is a remapping of image tonality, specified as a function from input level to output level, used as a way to emphasize colours or other elements in a picture. Curves can usually be applied to all channels together in an image, or to each channel individually. Applying a curve to all channels typically changes the brightness in part of the spectrum. Light parts of a picture can be easily made lighter and dark parts darker to increase contrast. Applying a curve to individual channels can be used to stress a colour. This is particularly efficient in the Lab colour space due to the separation of luminance and chromaticity, but it can also be used in RGB, CMYK or whatever other colour models the software supports.
Q-learning
Q-learning is a reinforcement learning algorithm that trains an agent to assign values to its possible actions based on its current state, without requiring a model of the environment (model-free). It can handle problems with stochastic transitions and rewards without requiring adaptations. For example, in a grid maze, an agent learns to reach an exit worth 10 points. At a junction, Q-learning might assign a higher value to moving right than left if right gets to the exit faster, improving this choice by trying both directions over time. For any finite Markov decision process, Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state. Q-learning can identify an optimal action-selection policy for any given finite Markov decision process, given infinite exploration time and a partly random policy. "Q" refers to the function that the algorithm computes: the expected reward—that is, the quality—of an action taken in a given state. == Reinforcement learning == Reinforcement learning involves an agent, a set of states S {\displaystyle {\mathcal {S}}} , and a set A {\displaystyle {\mathcal {A}}} of actions per state. By performing an action a ∈ A {\displaystyle a\in {\mathcal {A}}} , the agent transitions from state to state. Executing an action in a specific state provides the agent with a reward (a numerical score). The goal of the agent is to maximize its total reward. It does this by adding the maximum reward attainable from future states to the reward for achieving its current state, effectively influencing the current action by the potential future reward. This potential reward is a weighted sum of expected values of the rewards of all future steps starting from the current state. As an example, consider the process of boarding a train, in which the reward is measured by the negative of the total time spent boarding (alternatively, the cost of boarding the train is equal to the boarding time). One strategy is to enter the train door as soon as they open, minimizing the initial wait time for yourself. If the train is crowded, however, then you will have a slow entry after the initial action of entering the door as people are fighting you to depart the train as you attempt to board. The total boarding time, or cost, is then: 0 seconds wait time + 15 seconds fight time On the next day, by random chance (exploration), you decide to wait and let other people depart first. This initially results in a longer wait time. However, less time is spent fighting the departing passengers. Overall, this path has a higher reward than that of the previous day, since the total boarding time is now: 5 second wait time + 0 second fight time Through exploration, despite the initial (patient) action resulting in a larger cost (or negative reward) than in the forceful strategy, the overall cost is lower, thus revealing a more rewarding strategy. == Algorithm == After Δ t {\displaystyle \Delta t} steps into the future the agent will decide some next step. The weight for this step is calculated as γ Δ t {\displaystyle \gamma ^{\Delta t}} , where γ {\displaystyle \gamma } (the discount factor) is a number between 0 and 1 ( 0 ≤ γ ≤ 1 {\displaystyle 0\leq \gamma \leq 1} ). Assuming γ < 1 {\displaystyle \gamma <1} , it has the effect of valuing rewards received earlier higher than those received later (reflecting the value of a "good start"). γ {\displaystyle \gamma } may also be interpreted as the probability to succeed (or survive) at every step Δ t {\displaystyle \Delta t} . The algorithm, therefore, has a function that calculates the quality of a state–action combination: Q : S × A → R {\displaystyle Q:{\mathcal {S}}\times {\mathcal {A}}\to \mathbb {R} } . Before learning begins, Q {\displaystyle Q} is initialized to a possibly arbitrary fixed value (chosen by the programmer). Then, at each time t {\displaystyle t} the agent selects an action A t {\displaystyle A_{t}} , observes a reward R t + 1 {\displaystyle R_{t+1}} , enters a new state S t + 1 {\displaystyle S_{t+1}} (that may depend on both the previous state S t {\displaystyle S_{t}} and the selected action), and Q {\displaystyle Q} is updated. The core of the algorithm is a Bellman equation as a simple value iteration update, using the weighted average of the current value and the new information: Q n e w ( S t , A t ) ← ( 1 − α ⏟ learning rate ) ⋅ Q ( S t , A t ) ⏟ current value + α ⏟ learning rate ⋅ ( R t + 1 ⏟ reward + γ ⏟ discount factor ⋅ max a Q ( S t + 1 , a ) ⏟ estimate of optimal future value ⏟ new value (temporal difference target) ) {\displaystyle Q^{new}(S_{t},A_{t})\leftarrow (1-\underbrace {\alpha } _{\text{learning rate}})\cdot \underbrace {Q(S_{t},A_{t})} _{\text{current value}}+\underbrace {\alpha } _{\text{learning rate}}\cdot {\bigg (}\underbrace {\underbrace {R_{t+1}} _{\text{reward}}+\underbrace {\gamma } _{\text{discount factor}}\cdot \underbrace {\max _{a}Q(S_{t+1},a)} _{\text{estimate of optimal future value}}} _{\text{new value (temporal difference target)}}{\bigg )}} where R t + 1 {\displaystyle R_{t+1}} is the reward received when moving from the state S t {\displaystyle S_{t}} to the state S t + 1 {\displaystyle S_{t+1}} , and α {\displaystyle \alpha } is the learning rate ( 0 < α ≤ 1 ) {\displaystyle (0<\alpha \leq 1)} . Note that Q n e w ( S t , A t ) {\displaystyle Q^{new}(S_{t},A_{t})} is the sum of three terms: ( 1 − α ) Q ( S t , A t ) {\displaystyle (1-\alpha )Q(S_{t},A_{t})} : the current value (weighted by one minus the learning rate) α R t + 1 {\displaystyle \alpha \,R_{t+1}} : the reward R t + 1 {\displaystyle R_{t+1}} to obtain if action A t {\displaystyle A_{t}} is taken when in state S t {\displaystyle S_{t}} (weighted by learning rate) α γ max a Q ( S t + 1 , a ) {\displaystyle \alpha \gamma \max _{a}Q(S_{t+1},a)} : the maximum reward that can be obtained from state S t + 1 {\displaystyle S_{t+1}} (weighted by learning rate and discount factor) An episode of the algorithm ends when state S t + 1 {\displaystyle S_{t+1}} is a final or terminal state. However, Q-learning can also learn in non-episodic tasks (as a result of the property of convergent infinite series). If the discount factor is lower than 1, the action values are finite even if the problem can contain infinite loops or paths. For all final states s f {\displaystyle s_{f}} , Q ( s f , a ) {\displaystyle Q(s_{f},a)} is never updated, but is set to the reward value r {\displaystyle r} observed for state s f {\displaystyle s_{f}} . In most cases, Q ( s f , a ) {\displaystyle Q(s_{f},a)} can be taken to equal zero. == Influence of variables == === Learning rate === The learning rate or step size determines to what extent newly acquired information overrides old information. A factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the agent consider only the most recent information (ignoring prior knowledge to explore possibilities). In fully deterministic environments, a learning rate of α t = 1 {\displaystyle \alpha _{t}=1} is optimal. When the problem is stochastic, the algorithm converges under some technical conditions on the learning rate that require it to decrease to zero. In practice, often a constant learning rate is used, such as α t = 0.1 {\displaystyle \alpha _{t}=0.1} for all t {\displaystyle t} . === Discount factor === The discount factor γ {\displaystyle \gamma } determines the importance of future rewards. A factor of 0 will make the agent "myopic" (or short-sighted) by only considering current rewards, i.e. r t {\displaystyle r_{t}} (in the update rule above), while a factor approaching 1 will make it strive for a long-term high reward. If the discount factor meets or exceeds 1, the action values may diverge. For γ = 1 {\displaystyle \gamma =1} , without a terminal state, or if the agent never reaches one, all environment histories become infinitely long, and utilities with additive, undiscounted rewards generally become infinite. Even with a discount factor only slightly lower than 1, Q-function learning leads to propagation of errors and instabilities when the value function is approximated with an artificial neural network. In that case, starting with a lower discount factor and increasing it towards its final value accelerates learning. === Initial conditions (Q0) === Since Q-learning is an iterative algorithm, it implicitly assumes an initial condition before the first update occurs. High initial values, also known as "optimistic initial conditions", can encourage exploration: no matter what action is selected, the update rule will cause it to have lower values than the other alternative, thus increasing their choice probability. The first reward r {\displaystyle r} can be used to reset the initial conditions. According to this idea, the first time an action is taken the reward is used to set the value
Elastic net regularization
In statistics and, in particular, in the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. Nevertheless, elastic net regularization is typically more accurate than both methods with regard to reconstruction. == Specification == The elastic net method overcomes the limitations of the LASSO (least absolute shrinkage and selection operator) method which uses a penalty function based on ‖ β ‖ 1 = ∑ j = 1 p | β j | . {\displaystyle \|\beta \|_{1}=\textstyle \sum _{j=1}^{p}|\beta _{j}|.} Use of this penalty function has several limitations. For example, in the "large p, small n" case (high-dimensional data with few examples), the LASSO selects at most n variables before it saturates. Also if there is a group of highly correlated variables, then the LASSO tends to select one variable from a group and ignore the others. To overcome these limitations, the elastic net adds a quadratic part ( ‖ β ‖ 2 {\displaystyle \|\beta \|^{2}} ) to the penalty, which when used alone is ridge regression (known also as Tikhonov regularization). The estimates from the elastic net method are defined by β ^ ≡ argmin β ( ‖ y − X β ‖ 2 + λ 2 ‖ β ‖ 2 + λ 1 ‖ β ‖ 1 ) . {\displaystyle {\hat {\beta }}\equiv {\underset {\beta }{\operatorname {argmin} }}(\|y-X\beta \|^{2}+\lambda _{2}\|\beta \|^{2}+\lambda _{1}\|\beta \|_{1}).} The quadratic penalty term makes the loss function strongly convex, and it therefore has a unique minimum. The elastic net method includes the LASSO and ridge regression: in other words, each of them is a special case where λ 1 = λ , λ 2 = 0 {\displaystyle \lambda _{1}=\lambda ,\lambda _{2}=0} or λ 1 = 0 , λ 2 = λ {\displaystyle \lambda _{1}=0,\lambda _{2}=\lambda } . Meanwhile, the naive version of elastic net method finds an estimator in a two-stage procedure : first for each fixed λ 2 {\displaystyle \lambda _{2}} it finds the ridge regression coefficients, and then does a LASSO type shrinkage. This kind of estimation incurs a double amount of shrinkage, which leads to increased bias and poor predictions. To improve the prediction performance, sometimes the coefficients of the naive version of elastic net is rescaled by multiplying the estimated coefficients by ( 1 + λ 2 ) {\displaystyle (1+\lambda _{2})} . Examples of where the elastic net method has been applied are: Support vector machine Metric learning Portfolio optimization Cancer prognosis == Reduction to support vector machine == It was proven in 2014 that the elastic net can be reduced to the linear support vector machine. A similar reduction was previously proven for the LASSO in 2014. The authors showed that for every instance of the elastic net, an artificial binary classification problem can be constructed such that the hyper-plane solution of a linear support vector machine (SVM) is identical to the solution β {\displaystyle \beta } (after re-scaling). The reduction immediately enables the use of highly optimized SVM solvers for elastic net problems. It also enables the use of GPU acceleration, which is often already used for large-scale SVM solvers. The reduction is a simple transformation of the original data and regularization constants X ∈ R n × p , y ∈ R n , λ 1 ≥ 0 , λ 2 ≥ 0 {\displaystyle X\in {\mathbb {R} }^{n\times p},y\in {\mathbb {R} }^{n},\lambda _{1}\geq 0,\lambda _{2}\geq 0} into new artificial data instances and a regularization constant that specify a binary classification problem and the SVM regularization constant X 2 ∈ R 2 p × n , y 2 ∈ { − 1 , 1 } 2 p , C ≥ 0. {\displaystyle X_{2}\in {\mathbb {R} }^{2p\times n},y_{2}\in \{-1,1\}^{2p},C\geq 0.} Here, y 2 {\displaystyle y_{2}} consists of binary labels − 1 , 1 {\displaystyle {-1,1}} . When 2 p > n {\displaystyle 2p>n} it is typically faster to solve the linear SVM in the primal, whereas otherwise the dual formulation is faster. Some authors have referred to the transformation as Support Vector Elastic Net (SVEN), and provided the following MATLAB pseudo-code: == Software == "Glmnet: Lasso and elastic-net regularized generalized linear models" is a software which is implemented as an R source package and as a MATLAB toolbox. This includes fast algorithms for estimation of generalized linear models with ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two penalties (the elastic net) using cyclical coordinate descent, computed along a regularization path. JMP Pro 11 includes elastic net regularization, using the Generalized Regression personality with Fit Model. "pensim: Simulation of high-dimensional data and parallelized repeated penalized regression" implements an alternate, parallelised "2D" tuning method of the ℓ parameters, a method claimed to result in improved prediction accuracy. scikit-learn includes linear regression and logistic regression with elastic net regularization. SVEN, a Matlab implementation of Support Vector Elastic Net. This solver reduces the Elastic Net problem to an instance of SVM binary classification and uses a Matlab SVM solver to find the solution. Because SVM is easily parallelizable, the code can be faster than Glmnet on modern hardware. SpaSM, a Matlab implementation of sparse regression, classification and principal component analysis, including elastic net regularized regression. Apache Spark provides support for Elastic Net Regression in its MLlib machine learning library. The method is available as a parameter of the more general LinearRegression class. SAS (software) The SAS procedure Glmselect and SAS Viya procedure Regselect support the use of elastic net regularization for model selection.
Dynamic time warping
In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications. In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restriction and rules: Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match) The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match) The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i {\displaystyle j>i} are indices from the first sequence, then there must not be two indices l > k {\displaystyle l>k} in the other sequence, such that index i {\displaystyle i} is matched with index l {\displaystyle l} and index j {\displaystyle j} is matched with index k {\displaystyle k} , and vice versa We can plot each match between the sequences 1 : M {\displaystyle 1:M} and 1 : N {\displaystyle 1:N} as a path in a M × N {\displaystyle M\times N} matrix from ( 1 , 1 ) {\displaystyle (1,1)} to ( M , N ) {\displaystyle (M,N)} , such that each step is one of ( 0 , 1 ) , ( 1 , 0 ) , ( 1 , 1 ) {\displaystyle (0,1),(1,0),(1,1)} . In this formulation, we see that the number of possible matches is the Delannoy number. The optimal match is denoted by the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn't guarantee the triangle inequality to hold. In addition to a similarity measure between the two sequences (a so called "warping path" is produced), by warping according to this path the two signals may be aligned in time. The signal with an original set of points X(original), Y(original) is transformed to X(warped), Y(warped). This finds applications in genetic sequence and audio synchronisation. In a related technique sequences of varying speed may be averaged using this technique see the average sequence section. This is conceptually very similar to the Needleman–Wunsch algorithm. == Implementation == This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d ( x , y ) {\displaystyle d(x,y)} is a distance between the symbols, e.g., d ( x , y ) = | x − y | {\displaystyle d(x,y)=|x-y|} . int DTWDistance(s: array [1..n], t: array [1..m]) { DTW := array [0..n, 0..m] for i := 0 to n for j := 0 to m DTW[i, j] := infinity DTW[0, 0] := 0 for i := 1 to n for j := 1 to m cost := d(s[i], t[j]) DTW[i, j] := cost + minimum(DTW[i-1, j ], // insertion DTW[i , j-1], // deletion DTW[i-1, j-1]) // match return DTW[n, m] } where DTW[i, j] is the distance between s[1:i] and t[1:j] with the best alignment. We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then | i − j | {\displaystyle |i-j|} is no larger than w, a window parameter. We can easily modify the above algorithm to add a locality constraint (differences marked). However, the above given modification works only if | n − m | {\displaystyle |n-m|} is no larger than w, i.e. the end point is within the window length from diagonal. In order to make the algorithm work, the window parameter w must be adapted so that | n − m | ≤ w {\displaystyle |n-m|\leq w} (see the line marked with () in the code). int DTWDistance(s: array [1..n], t: array [1..m], w: int) { DTW := array [0..n, 0..m] w := max(w, abs(n-m)) // adapt window size () for i := 0 to n for j:= 0 to m DTW[i, j] := infinity DTW[0, 0] := 0 for i := 1 to n for j := max(1, i-w) to min(m, i+w) DTW[i, j] := 0 for i := 1 to n for j := max(1, i-w) to min(m, i+w) cost := d(s[i], t[j]) DTW[i, j] := cost + minimum(DTW[i-1, j ], // insertion DTW[i , j-1], // deletion DTW[i-1, j-1]) // match return DTW[n, m] } == Warping properties == The DTW algorithm produces a discrete matching between existing elements of one series to another. In other words, it does not allow time-scaling of segments within the sequence. Other methods allow continuous warping. For example, Correlation Optimized Warping (COW) divides the sequence into uniform segments that are scaled in time using linear interpolation, to produce the best matching warping. The segment scaling causes potential creation of new elements, by time-scaling segments either down or up, and thus produces a more sensitive warping than DTW's discrete matching of raw elements. == Complexity == The time complexity of the DTW algorithm is O ( N M ) {\displaystyle O(NM)} , where N {\displaystyle N} and M {\displaystyle M} are the lengths of the two input sequences. The 50 years old quadratic time bound was broken in 2016: an algorithm due to Gold and Sharir enables computing DTW in O ( N 2 / log log N ) {\displaystyle O({N^{2}}/\log \log N)} time and space for two input sequences of length N {\displaystyle N} . This algorithm can also be adapted to sequences of different lengths. Despite this improvement, it was shown that a strongly subquadratic running time of the form O ( N 2 − ϵ ) {\displaystyle O(N^{2-\epsilon })} for some ϵ > 0 {\displaystyle \epsilon >0} cannot exist unless the Strong exponential time hypothesis fails. While the dynamic programming algorithm for DTW requires O ( N M ) {\displaystyle O(NM)} space in a naive implementation, the space consumption can be reduced to O ( min ( N , M ) ) {\displaystyle O(\min(N,M))} using Hirschberg's algorithm. == Fast computation == Fast techniques for computing DTW include PrunedDTW, SparseDTW, FastDTW, and the MultiscaleDTW. A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh, LB_Improved, or LB_Petitjean. However, the Early Abandon and Pruned DTW algorithm reduces the degree of acceleration that lower bounding provides and sometimes renders it ineffective. In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than the LB_Keogh bound, and found that other techniques were inefficient. Subsequent to this survey, the LB_Enhanced bound was developed that is always tighter than LB_Keogh while also being more efficient to compute. LB_Petitjean is the tightest known lower bound that can be computed in linear time. == Average sequence == Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. NLAAF is an exact method to average two sequences using DTW. For more than two sequences, the problem is related to that of multiple alignment and requires heuristics. DBA is currently a reference method to average a set of sequences consistently with DTW. COMASA efficiently randomizes the search for the average sequence, using DBA as a local optimization process. == Supervised learning == A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure. == Amerced Dynamic Time Warping == Amerced Dynamic Time Warping (ADTW) is a variant of DTW designed to better control DTW's permissiveness in the alignments that it allows. The windows that classical DTW uses to constrain alignments introduce a step function. Any warping of the path is allowed within the window and none beyond it. In contrast, ADTW employs an additive penalty that is incurred each time that the path is warped. Any amount of warping is allowed, but each warping action incurs a direct penalty. ADTW significantly outperforms DTW with windowing when applied as a nearest neighbor classifier on a set of benchmark time series classification tasks. == Alternative approaches == In functional data analysis, time series are regarde
Stripe, Inc.
Stripe, Inc. is an Irish and American multinational financial services and software as a service (SaaS) company dual-headquartered in South San Francisco, California, United States, and Dublin, Ireland. The company primarily offers payment-processing software and application programming interfaces for e-commerce websites and mobile applications. Stripe is the largest privately owned financial technology company with a valuation of about $159 billion and over $1.9 trillion in payment volume processed in 2025, processing transactions for 5 million businesses in that year. == History == Irish entrepreneur brothers John and Patrick Collison founded Stripe in Palo Alto, California, in 2010, and serve as the company's president and CEO, respectively. In 2011 the company received a $2 million investment, including contributions from Elon Musk, PayPal founder Peter Thiel, Irish entrepreneur Liam Casey, and venture capital firms Sequoia Capital, Andreessen Horowitz, and SV Angel. In March 2013, Stripe made its first acquisition, Kickoff, a chat and task-management application. In 2012 the company moved from Palo Alto to San Francisco. In October 2019, the company announced that it would be moving from the South of Market area to Oyster Point in the neighbouring city of South San Francisco in 2021. In February 2021, Mark Carney, former governor of the Bank of Canada and of the Bank of England, was appointed to the company's board. Carney stepped down from his role with the company in 2025 in order to run for the leadership of the Liberal Party. Stripe acquired accountancy platform Recko in October 2021 whose solution was to be added to Stripe's existing suite of financial tools. In January 2022, Stripe entered a five-year partnership with Ford Motor Company. Through the deal, Stripe would handle transactions for consumer vehicle orders and reservations. That same month, Stripe partnered with Spotify to help the company monetize subscriptions. In April 2022, Twitter announced that it would partner with Stripe, Inc. (digital payments processor) for piloting cryptocurrency pay-outs for limited users in the platform. In April 2022, Stripe announced its strategic partnership with UK-based financial technology company ION. The Wall Street Journal reported in July 2022 that the company's internal share price had fallen, causing its implied valuation to drop from $95 billion to $74 billion. In November 2022, the company announced it intended to initiate layoffs, terminating some 14% of its workforce. Throughout 2022 and 2023, the company announced a number of large enterprise customers, including Airbnb, Amazon, Microsoft, Uber, BMW, Maersk, Zara, Lotus, Alaska Airlines, Le Monde, and Toyota. The company also announced in March 2023 that OpenAI is working with Stripe to commercialize its generative AI technology. In January 2025, Stripe sent layoff notices to nearly 300 workers, primarily affecting roles in Product, Operations and Engineering. The company experienced controversy when the company sent a cartoon picture of a duck to the laid-off employees. Stripe's Chief People Officer Rob McIntosh later apologized for the mistake. After re-enabling cryptocurrency pay-ins in April 2024, starting with USDC, Stripe completed the acquisition of Bridge in February 2025. The acquisition of the two-year-old stablecoin platform company is valued at $1.1 billion. In June 2025, the company acquired Privy, which powers crypto wallets. In September 2025, Stripe announced it was powering Instant Checkout in ChatGPT and released Agentic Commerce Protocol for agentic commerce, which was co-developed with OpenAI. In October 2025, the company opened its second headquarters in Dublin, Ireland. In February 2026, Stripe was valued at $159 billion in a tender offer posted for employees and shareholders. The tender offer was about a 70% increase from Stripe's previous valuation published in February 2025, where it was valued at $91.5 billion. Stripe also announced that its total volume increased to $1.9 trillion USD in 2025, a 34% increase from 2024. == Technology company == === Payment processing === Stripe provides application programming interfaces that web developers can use to integrate payment processing into their websites and mobile applications. The company introduced Stripe Connect in 2012, a multiparty payments solution that lets software developers embed payments natively into their products. In April 2018, Stripe released antifraud tools, branded "Radar", that block fraudulent transactions. The same year, it expanded its services to include a billing product for online businesses, allowing businesses to manage subscription recurring revenue and invoicing. Stripe's point-of-sale service called Terminal was made available to US users on 11 June 2019. Terminal had previously been invitation-only. Terminal is currently available in Australia, Canada, France, Germany, Ireland, the Netherlands, New Zealand, Singapore, and the United Kingdom. The service offers physical credit-card readers designed to work with Stripe. On 5 September 2019, Stripe launched a merchant cash-advance scheme called Stripe Capital. The scheme allows Stripe merchants to request an advance on future payments they expect to process through their Stripe merchant account. In June 2021, the company launched Stripe Tax, a service to allow businesses to automatically calculate and collect sales tax, VAT, and GST, initially rolling out to 30 countries and all US states. As of 2025, it has been made available in 102 countries. In May that year, Stripe introduced Payment Links, a no-code product allowing businesses to create a link to a checkout page and begin accepting payments on social platforms or direct channels. In January 2022, Stripe agreed to acquire Terminal manufacturing partner BBPOS, allowing the company to bring the hardware development of Terminal readers in-house. In February, it was announced as Apple's first partner on in-person Tap to Pay, which enables businesses to accept contactless payments using an iPhone and a partner-enabled iOS app. In May, Stripe announced Data Pipeline, a tool for Stripe users who store data with Amazon Redshift or Snowflake Data Cloud. Data Pipeline syncs Stripe data and reports with Amazon Redshift or Snowflake Data Cloud, where they can be queried in combination with other business information. That month, the company also introduced Stripe Financial Connections, enabling businesses to establish direct connections with their customers’ bank accounts to verify accounts for payments and pay-outs, check balances to reduce payment failures, and cut fraud by confirming bank account ownership. In September 2023, Stripe announced that its optimized checkout suite allowed businesses to offer their customers more than 100 payment methods. In May 2025, Stripe announced a new AI foundational model for payments, and introduced stablecoin powered accounts. === Corporate finance === In July 2018, Stripe introduced Stripe Issuing, a product that allows online businesses and platforms to create their own physical and digital credit and debit cards. === Atlas === On 14 February 2016, the company launched the Atlas platform to help start-ups register as US corporations, targeting foreign entrepreneurs. The platform was originally invitation-only. In March 2016, Cuba was added to the list of countries covered under the program. Originally, companies registered using Atlas were set up as Delaware-based C corporations. As of 30 April 2018, the option to be registered as limited liability companies was added. Companies set up using Atlas automatically had a business bank account and Stripe merchant account set up. === Link === In May 2021, Stripe launched Link, a service for saving and auto-filling payment details when paying via Stripe. The service supported payments in over 185 countries and Stripe reported plans to make it available to platform businesses through its API. In September 2025, Patrick Collison announced that Link had surpassed 200 million users. === Other === In 2018, Stripe started a publishing company named Stripe Press to promote ideas that support businesses. In 2019, Stripe began offering loans and credit cards to businesses in the United States. The company stated that loans are approved automatically using machine-learning models, with no human intervention. The following year, the company introduced Stripe Treasury, which provides its platform users APIs to embed financial services, allowing their customers to send, receive, and store funds. In October 2020, Stripe announced Stripe Climate, a service for businesses to fund atmospheric carbon research and capture. In 2022, Stripe started a new subsidiary called Frontier that would direct spending on carbon removal. It announced $925 million in funding from major Silicon Valley companies to fund start up companies performing carbon capture to kick-start the industry. Stripe Identity, launched in Ju
Latent class model
In statistics, a latent class model (LCM) is a model for clustering multivariate discrete data. It assumes that the data arise from a mixture of discrete distributions, within each of which the variables are independent. It is called a latent class model because the class to which each data point belongs is unobserved (or latent). Latent class analysis (LCA) is a subset of structural equation modeling used to find groups or subtypes of cases in multivariate categorical data. These groups or subtypes of cases are called "latent classes". When faced with the following situation, a researcher might opt to use LCA to better understand the data: Symptoms a, b, c, and d have been recorded in a variety of patients diagnosed with diseases X, Y, and Z. Disease X is associated with symptoms a, b, and c; disease Y is linked to symptoms b, c, and d; and disease Z is connected to symptoms a, c, and d. In this context, the LCA would attempt to detect the presence of latent classes (i.e., the disease entities), thus creating patterns of association in the symptoms. As in factor analysis, LCA can also be used to classify cases according to their maximum likelihood class membership probability. The key criterion for resolving the LCA is identifying latent classes in which the observed symptom associations are effectively rendered null. This is because within each class, the diseases responsible for the symptoms create a structure of dependencies. As a result, the symptoms become conditionally independent, meaning that, given the class a case belongs to, the symptoms are no longer related to one another. == Model == Within each latent class, the observed variables are statistically independent—an essential aspect of latent class modeling. Usually, the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes, variables are independent (local independence). Therefore, the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987). In one form, the LCM is written as p i 1 , i 2 , … , i N ≈ ∑ t T p t ∏ n N p i n , t n , {\displaystyle p_{i_{1},i_{2},\ldots ,i_{N}}\approx \sum _{t}^{T}p_{t}\,\prod _{n}^{N}p_{i_{n},t}^{n},} where T {\displaystyle T} is the number of latent classes and p t {\displaystyle p_{t}} are the so-called recruitment or unconditional probabilities that should sum to one. p i n , t n {\displaystyle p_{i_{n},t}^{n}} are the marginal or conditional probabilities. For a two-way latent class model, the form is p i j ≈ ∑ t T p t p i t p j t . {\displaystyle p_{ij}\approx \sum _{t}^{T}p_{t}\,p_{it}\,p_{jt}.} This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization. The probability model used in LCA is closely related to the Naive Bayes classifier. The main difference is that in LCA, the class membership of an individual is a latent variable, whereas in Naive Bayes classifiers, the class membership is an observed label. == Related methods == There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data and assumes that such data arise from a mixture of distributions, such as a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, allocating members to classes based on that dimension. An example would be assigning cases to social classes based on ability or merit. In a practical instance, the variables could be multiple choice items of a political questionnaire. In this case, the data consists of an N-way contingency table with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion, and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen. == Application == LCA may be used in many fields, such as: collaborative filtering, Behavior Genetics and Evaluation of diagnostic tests.