AI Detector Similar To Turnitin

AI Detector Similar To Turnitin — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Google Messages

    Google Messages

    Google Messages (formerly known as Messenger, Android Messages, and Messages by Google) is a text messaging software application developed by Google for its Android and Wear OS mobile operating systems. It is also available as a web app. Google's official universal messaging platform for the Android ecosystem, Messages employs SMS, MMS, and Rich Communication Services (RCS). Starting in 2023, Google has RCS activated by default on participating Android devices, similar to the implementation of iMessage on Apple devices. Samsung Messages will be discontinued on July 6th 2026, with Samsung transitioning users to Google Messages as the default messaging application. == History == The original code for Android SMS messaging was released in 2009 integrated into the operating system. It was released as a standalone application independent of Android with the release of Android 5.0 Lollipop in 2014, replacing Google Hangouts as the default SMS app on Google's Nexus line of phones. In 2018, Messages adopted RCS messages and evolved to send larger data files, sync with other apps, and even create mass messages. This was in preparation for when Google launched Messages for web. In December 2019, Google began to introduce support for Rich Communication Services (RCS) messaging via an RCS service hosted by Google, referred to in the user interface as "chat features". This was followed by a wider global rollout throughout 2020. The app surpassed 1 billion installs in April 2020, doubling its number of installs in less than a year. Initially, RCS did not support end-to-end encryption. In June 2021, Google introduced end-to-end encryption in Messages by default using the Signal Protocol, for all one-to-one RCS-based conversations, for all RCS group chats in December 2022 for beta users, and for all RCS users by August 2023, as well as enabling RCS for all users by default to encourage encryption. In July 2023, Google announced it would build the Message Layer Security (MLS) end-to-end encryption protocol into Google Messages. Beginning with the Samsung Galaxy S21, Messages replaces Samsung's in-house Messages app as the default text messaging app for One UI for some regions and carriers. In April 2021, the app began to receive UI modifications on Samsung devices to follow aspects of One UI, including pushing the top of the message list towards the middle of the screen to improve ergonomics. In February 2023, Google began to replace references to "chat features" in the Messages user interface with "RCS". In August 2023, Google announced that Messages will use RCS by default for all users unless they opt out, to allow them to benefit from secure messaging. In December 2023, with the arrival of several new features, the app was renamed "Google Messages". In July 2024, Samsung announced it would no longer pre-install Samsung Messages on its Galaxy devices in some regions, starting with the Galaxy Z Fold 6 and Flip, favoring Google Messages instead. In April 2026, Samsung announced that Samsung Messages would be discontinued in July 2026. It encouraged users to switch to Google Messages. == Features == Some of the most important features in Google Messages are: Send instant text and voice messages in 1:1 or group chat conversations over mobile data and Wi-Fi, via Android, Wear OS or the web. End-to-end encryption for RCS chats. Typing, sent, delivered and read status Reply and react to specific messages Share files and high-resolution photos Voice message transcriptions Schedule messages In-app reminders for birthdays and messages you didn't respond to after some time with Nudges Tight integration with the Google ecosystem, e.g. Google Calendar, Meet, Maps, YouTube, Photos, Contacts, Assistant, Search, Safe Browsing etc. Web interface: Users can visit https://messages.google.com/web and either sign in with their Google account or scan the QR code that is shown with their smartphone to access a limited web version of the app that allows them to send and receive messages, provided the smartphone remains connected. Phone number recognition: The app shows the country and province of the caller. Additionally, it can show the company's name or a warning for spam calls if the number is registered in a data base. Access to the Gemini chatbot on select Pixel, Galaxy and Android devices.

    Read more →
  • Cross-validation (statistics)

    Cross-validation (statistics)

    Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance. == Motivation == Assume a model with one or more unknown parameters, and a data set to which the model can be fit (the training data set). The fitting process optimizes the model parameters to make the model fit the training data as well as possible. If an independent sample of validation data is taken from the same population as the training data, it will generally turn out that the model does not fit the validation data as well as it fits the training data. The size of this difference is likely to be large especially when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to estimate the size of this effect. === Example: linear regression === In linear regression, there exist real response values y 1 , … , y n {\textstyle y_{1},\ldots ,y_{n}} , and n p-dimensional vector covariates x1, ..., xn. The components of the vector xi are denoted xi1, ..., xip. If least squares is used to fit a function in the form of a hyperplane ŷ = a + βTx to the data (xi, yi) 1 ≤ i ≤ n, then the fit can be assessed using the mean squared error (MSE). The MSE for given estimated parameter values a and β on the training set (xi, yi) 1 ≤ i ≤ n is defined as: MSE = 1 n ∑ i = 1 n ( y i − y ^ i ) 2 = 1 n ∑ i = 1 n ( y i − a − β T x i ) 2 = 1 n ∑ i = 1 n ( y i − a − β 1 x i 1 − ⋯ − β p x i p ) 2 {\displaystyle {\begin{aligned}{\text{MSE}}&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-{\boldsymbol {\beta }}^{T}\mathbf {x} _{i})^{2}\\&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})^{2}\end{aligned}}} If the model is correctly specified, it can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets). Thus, a fitted model and computed MSE on the training set will result in an optimistically biased assessment of how well the model will fit an independent data set. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate. Since in linear regression it is possible to directly compute the factor (n − p − 1)/(n + p + 1) by which the training MSE underestimates the validation MSE under the assumption that the model specification is valid, cross-validation can be used for checking whether the model has been overfitted, in which case the MSE in the validation set will substantially exceed its anticipated value. (Cross-validation in the context of linear regression is also useful in that it can be used to select an optimally regularized cost function.) === General case === In most other regression procedures (e.g. logistic regression), there is no simple formula to compute the expected out-of-sample fit. Cross-validation is, thus, a generally applicable way to predict the performance of a model on unavailable data using numerical computation in place of theoretical analysis. == Types == Two types of cross-validation can be distinguished: exhaustive and non-exhaustive cross-validation. === Exhaustive cross-validation === Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set. ==== Leave-p-out cross-validation ==== Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set. LpO cross-validation require training and validating the model C p n {\displaystyle C_{p}^{n}} times, where n is the number of observations in the original sample, and where C p n {\displaystyle C_{p}^{n}} is the binomial coefficient. For p > 1 and for even moderately large n, LpO CV can become computationally infeasible. For example, with n = 100 and p = 30, C 30 100 ≈ 3 × 10 25 . {\displaystyle C_{30}^{100}\approx 3\times 10^{25}.} A variant of LpO cross-validation with p=2 known as leave-pair-out cross-validation has been recommended as a nearly unbiased method for estimating the area under ROC curve of binary classifiers. ==== Leave-one-out cross-validation ==== Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only. LOO cross-validation requires less computation time than LpO cross-validation because there are only C 1 n = n {\displaystyle C_{1}^{n}=n} passes rather than C p n {\displaystyle C_{p}^{n}} . However, n {\displaystyle n} passes may still require quite a large computation time, in which case other approaches such as k-fold cross validation may be more appropriate. Pseudo-code algorithm: Input: x, {vector of length N with x-values of incoming points} y, {vector of length N with y-values of the expected result} interpolate( x_in, y_in, x_out ), { returns the estimation for point x_out after the model is trained with x_in-y_in pairs} Output: err, {estimate for the prediction error} Steps: err ← 0 for i ← 1, ..., N do // define the cross-validation subsets x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N]) y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N]) x_out ← x[i] y_out ← interpolate(x_in, y_in, x_out) err ← err + (y[i] − y_out)^2 end for err ← err/N === Non-exhaustive cross-validation === Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. These methods are approximations of leave-p-out cross-validation. ==== k-fold cross-validation ==== In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, often referred to as "folds". Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation. The advantage of this method over repeated random sub-sampling (see below) is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. For example, setting k = 2 results in 2-fold cross-validation. In 2-fold cross-validation, the dataset is randomly shuffled into two sets d0 and d1, so that both sets are equal size (this is usually implemented by shuffling the data array and then splitting it in two). We then train on d0 and validate on d1, followed by training on d1 and validating on d0. When k = n (the number of observations), k-fold cross-validation is equivalent to leave-one-out cr

    Read more →
  • Learning to rank

    Learning to rank

    Learning to rank (LTR) or machine-learned ranking (MLR) is the application of machine learning, often supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval and recommender systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data. == Applications == === In information retrieval === Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure. Training data consists of queries and documents matching them together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine relevance of each result. It is not feasible to check the relevance of all documents, and so typically a technique called pooling is used — only the top few documents, retrieved by some existing ranking models are checked. This technique may introduce selection bias. Alternatively, training data may be derived automatically by analyzing clickthrough logs (i.e. search results which got clicks from users), query chains, or such search engines' features as Google's (since-replaced) SearchWiki. Clickthrough logs can be biased by the tendency of users to click on the top search results on the assumption that they are already well-ranked. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries. Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, and so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, Boolean model, weighted AND, or BM25. This phase is called top- k {\displaystyle k} document retrieval and many heuristics were proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents. === In other areas === Learning to rank algorithms have been applied in areas other than information retrieval: In machine translation for ranking a set of hypothesized translations; In computational biology for ranking candidate 3-D structures in protein structure prediction problems; In recommender systems for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article. == Feature vectors == For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called feature vectors. Such an approach is sometimes called bag of features and is analogous to the bag of words model and vector space model used in information retrieval for representation of documents. Components of such vectors are called features, factors or ranking signals. They may be divided into three groups (features from document retrieval are shown as examples): Query-independent or static features — those features, which depend only on the document, but not on the query. For example, PageRank or document's length. Such features can be precomputed in off-line mode during indexing. They may be used to compute document's static quality score (or static rank), which is often used to speed up search query evaluation. Query-dependent or dynamic features — those features, which depend both on the contents of the document and the query, such as TF-IDF score or other non-machine-learned ranking functions. Query-level features or query features, which depend only on the query. For example, the number of words in a query. Some examples of features, which were used in the well-known LETOR dataset: TF, TF-IDF, BM25, and language modeling scores of document's zones (title, body, anchors text, URL) for a given query; Lengths and IDF sums of document's zones; Document's PageRank, HITS ranks and their variants. Selecting and designing good features is an important area in machine learning, which is called feature engineering. == Evaluation measures == There are several measures (metrics) which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures: Mean average precision (MAP); DCG and NDCG; Precision@n, NDCG@n, where "@n" denotes that the metrics are evaluated only on top n documents; Mean reciprocal rank; Kendall's tau; Spearman's rho. DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics such as MAP, MRR and precision, are defined only for binary judgments. Recently, there have been proposed several new evaluation metrics which claim to model user's satisfaction with search results better than the DCG metric: Expected reciprocal rank (ERR); Yandex's pfound. Both of these metrics are based on the assumption that the user is more likely to stop looking at search results after examining a more relevant document, than after a less relevant document. == Approaches == Learning to Rank approaches are often categorized using one of three approaches: pointwise (where individual documents are ranked), pairwise (where pairs of documents are ranked into a relative order), and listwise (where an entire list of documents are ordered). Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning to rank problems in his book Learning to Rank for Information Retrieval. He categorized them into three groups by their input spaces, output spaces, hypothesis spaces (the core function of the model) and loss functions: the pointwise, pairwise, and listwise approach. In practice, listwise approaches often outperform pairwise approaches and pointwise approaches. This statement was further supported by a large scale experiment on the performance of different learning-to-rank methods on a large collection of benchmark data sets. In this section, without further notice, x {\displaystyle x} denotes an object to be evaluated, for example, a document or an image, f ( x ) {\displaystyle f(x)} denotes a single-value hypothesis, h ( ⋅ ) {\displaystyle h(\cdot )} denotes a bi-variate or multi-variate function and L ( ⋅ ) {\displaystyle L(\cdot )} denotes the loss function. === Pointwise approach === In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem — given a single query-document pair, predict its score. Formally speaking, the pointwise approach aims at learning a function f ( x ) {\displaystyle f(x)} predicting the real-value or ordinal score of a document x {\displaystyle x} using the loss function L ( f ; x j , y j ) {\displaystyle L(f;x_{j},y_{j})} . A number of existing supervised machine learning algorithms can be readily used for this purpose. Ordinal regression and classification algorithms can also be used in pointwise approach when they are used to predict the score of a single query-document pair, and it takes a small, finite number of values. === Pairwise approach === In this case, the learning-to-rank problem is approximated by a classification problem — learning a binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} that can tell which document is better in a given pair of documents. The classifier shall take two documents as its input and the goal is to minimize a loss function L ( h ; x u , x v , y u , v ) {\displaystyle L(h;x_{u},x_{v},y_{u,v})} . The loss function typically reflects the number and magnitude of inversions in the induced ranking. In many cases, the binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} is implemented with a scoring function f ( x ) {\displaystyle f(x)} . As an example, RankNet adapts a probability model and defines h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} as the estimated probability of the document x u {\displaystyle x_{u}} has higher quality than x v {\displaystyle x_{v}} : P u , v ( f ) = CDF ( f ( x u ) − f ( x v ) ) , {\displaystyle P_{u,v}(f)={\text{CDF}

    Read more →
  • Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference

    Solomonoff's theory of inductive inference proves that, under its common sense assumptions (axioms), the best possible scientific model is the shortest algorithm that generates the empirical data under consideration. In addition to the choice of data, other assumptions are that, to avoid the post-hoc fallacy, the programming language must be chosen prior to the data and that the environment being observed is generated by an unknown algorithm. This is also called a theory of induction. Due to its basis in the dynamical (state-space model) character of Algorithmic Information Theory, it encompasses statistical as well as dynamical information criteria for model selection. It was introduced by Ray Solomonoff, based on probability theory and theoretical computer science. In essence, Solomonoff's induction derives the posterior probability of any computable theory, given a sequence of observed data. This posterior probability is derived from Bayes' rule and some universal prior, that is, a prior that assigns a positive probability to any computable theory. Solomonoff proved that this induction is incomputable (or more precisely, lower semi-computable), but noted that "this incomputability is of a very benign kind", and that it "in no way inhibits its use for practical prediction" (as it can be approximated from below more accurately with more computational resources). It is only "incomputable" in the benign sense that no scientific consensus is able to prove that the best current scientific theory is the best of all possible theories. However, Solomonoff's theory does provide an objective criterion for deciding among the current scientific theories explaining a given set of observations. Solomonoff's induction naturally formalizes Occam's razor by assigning larger prior credences to theories that require a shorter algorithmic description. == Origin == === Philosophical === The theory is based in philosophical foundations, and was founded by Ray Solomonoff around 1960. It is a mathematically formalized combination of Occam's razor and the Principle of Multiple Explanations. All computable theories which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Marcus Hutter's universal artificial intelligence builds upon this to calculate the expected value of an action. === Principle === Solomonoff's induction has been argued to be the computational formalization of pure Bayesianism. To understand, recall that Bayesianism derives the posterior probability P [ T | D ] {\displaystyle \mathbb {P} [T|D]} of a theory T {\displaystyle T} given data D {\displaystyle D} by applying Bayes rule, which yields P [ T | D ] = P [ D | T ] P [ T ] P [ D | T ] P [ T ] + ∑ A ≠ T P [ D | A ] P [ A ] {\displaystyle \mathbb {P} [T|D]={\frac {\mathbb {P} [D|T]\mathbb {P} [T]}{\mathbb {P} [D|T]\mathbb {P} [T]+\sum _{A\neq T}\mathbb {P} [D|A]\mathbb {P} [A]}}} where theories A {\displaystyle A} are alternatives to theory T {\displaystyle T} . For this equation to make sense, the quantities P [ D | T ] {\displaystyle \mathbb {P} [D|T]} and P [ D | A ] {\displaystyle \mathbb {P} [D|A]} must be well-defined for all theories T {\displaystyle T} and A {\displaystyle A} . In other words, any theory must define a probability distribution over observable data D {\displaystyle D} . Solomonoff's induction essentially boils down to demanding that all such probability distributions be computable. Interestingly, the set of computable probability distributions is a subset of the set of all programs, which is countable. Similarly, the sets of observable data considered by Solomonoff were finite. Without loss of generality, we can thus consider that any observable data is a finite bit string. As a result, Solomonoff's induction can be defined by only invoking discrete probability distributions. Solomonoff's induction then allows to make probabilistic predictions of future data F {\displaystyle F} , by simply obeying the laws of probability. Namely, we have P [ F | D ] = E T [ P [ F | T , D ] ] = ∑ T P [ F | T , D ] P [ T | D ] {\displaystyle \mathbb {P} [F|D]=\mathbb {E} _{T}[\mathbb {P} [F|T,D]]=\sum _{T}\mathbb {P} [F|T,D]\mathbb {P} [T|D]} . This quantity can be interpreted as the average predictions P [ F | T , D ] {\displaystyle \mathbb {P} [F|T,D]} of all theories T {\displaystyle T} given past data D {\displaystyle D} , weighted by their posterior credences P [ T | D ] {\displaystyle \mathbb {P} [T|D]} . === Mathematical === The proof of the "razor" is based on the known mathematical properties of a probability distribution over a countable set. These properties are relevant because the infinite set of all programs is a denumerable set. The sum S of the probabilities of all programs must be exactly equal to one (as per the definition of probability) thus the probabilities must roughly decrease as we enumerate the infinite set of all programs, otherwise S will be strictly greater than one. To be more precise, for every ϵ {\displaystyle \epsilon } > 0, there is some length l such that the probability of all programs longer than l is at most ϵ {\displaystyle \epsilon } . This does not, however, preclude very long programs from having very high probability. Fundamental ingredients of the theory are the concepts of algorithmic probability and Kolmogorov complexity. The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion. == Mathematical guarantees == === Solomonoff's completeness === The remarkable property of Solomonoff's induction is its completeness. In essence, the completeness theorem guarantees that the expected cumulative errors made by the predictions based on Solomonoff's induction are upper-bounded by the Kolmogorov complexity of the (stochastic) data generating process. The errors can be measured using the Kullback–Leibler divergence or the square of the difference between the induction's prediction and the probability assigned by the (stochastic) data generating process. === Solomonoff's uncomputability === Unfortunately, Solomonoff also proved that Solomonoff's induction is uncomputable. In fact, he showed that computability and completeness are mutually exclusive: any complete theory must be uncomputable. The proof of this is derived from a game between the induction and the environment. Essentially, any computable induction can be tricked by a computable environment, by choosing the computable environment that negates the computable induction's prediction. This fact can be regarded as an instance of the no free lunch theorem. == Modern applications == === Artificial intelligence === Though Solomonoff's inductive inference is not computable, several AIXI-derived algorithms approximate it in order to make it run on a modern computer. The more computing power they are given, the closer their predictions are to the predictions of inductive inference (their mathematical limit is Solomonoff's inductive inference). Another direction of inductive inference is based on E. Mark Gold's model of learning in the limit from 1967 and has developed since then more and more models of learning. The general scenario is the following: Given a class S of computable functions, is there a learner (that is, recursive functional) which for any input of the form (f(0),f(1),...,f(n)) outputs a hypothesis (an index e with respect to a previously agreed on acceptable numbering of all computable functions; the indexed function may be required consistent with the given values of f). A learner M learns a function f if almost all its hypotheses are the same index e, which generates the function f; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered and also the learning of classes of recursively enumerable sets from positive data is a topic studied from Gold's pioneering paper in 1967 onwards. A far reaching extension of the Gold’s approach is developed by Schmidhuber's theory of generalized Kolmogorov complexities, which are kinds of super-recursive algorithms.

    Read more →
  • Matrix regularization

    Matrix regularization

    In the field of statistical learning theory, matrix regularization generalizes notions of vector regularization to cases where the object to be learned is a matrix. The purpose of regularization is to enforce conditions, for example sparsity or smoothness, that can produce stable predictive functions. For example, in the more common vector framework, Tikhonov regularization optimizes over min x ‖ A x − y ‖ 2 + λ ‖ x ‖ 2 {\displaystyle \min _{x}\left\|Ax-y\right\|^{2}+\lambda \left\|x\right\|^{2}} to find a vector x {\displaystyle x} that is a stable solution to the regression problem. When the system is described by a matrix rather than a vector, this problem can be written as min X ‖ A X − Y ‖ 2 + λ ‖ X ‖ 2 , {\displaystyle \min _{X}\left\|AX-Y\right\|^{2}+\lambda \left\|X\right\|^{2},} where the vector norm enforcing a regularization penalty on x {\displaystyle x} has been extended to a matrix norm on X {\displaystyle X} . Matrix regularization has applications in matrix completion, multivariate regression, and multi-task learning. Ideas of feature and group selection can also be extended to matrices, and these can be generalized to the nonparametric case of multiple kernel learning. == Basic definition == Consider a matrix W {\displaystyle W} to be learned from a set of examples, S = ( X i t , y i t ) {\displaystyle S=(X_{i}^{t},y_{i}^{t})} , where i {\displaystyle i} goes from 1 {\displaystyle 1} to n {\displaystyle n} , and t {\displaystyle t} goes from 1 {\displaystyle 1} to T {\displaystyle T} . Let each input matrix X i {\displaystyle X_{i}} be ∈ R D T {\displaystyle \in \mathbb {R} ^{DT}} , and let W {\displaystyle W} be of size D × T {\displaystyle D\times T} . A general model for the output y {\displaystyle y} can be posed as y i t = ⟨ W , X i t ⟩ F , {\displaystyle y_{i}^{t}=\left\langle W,X_{i}^{t}\right\rangle _{F},} where the inner product is the Frobenius inner product. For different applications the matrices X i {\displaystyle X_{i}} will have different forms, but for each of these the optimization problem to infer W {\displaystyle W} can be written as min W ∈ H E ( W ) + R ( W ) , {\displaystyle \min _{W\in {\mathcal {H}}}E(W)+R(W),} where E {\displaystyle E} defines the empirical error for a given W {\displaystyle W} , and R ( W ) {\displaystyle R(W)} is a matrix regularization penalty. The function R ( W ) {\displaystyle R(W)} is typically chosen to be convex and is often selected to enforce sparsity (using ℓ 1 {\displaystyle \ell ^{1}} -norms) and/or smoothness (using ℓ 2 {\displaystyle \ell ^{2}} -norms). Finally, W {\displaystyle W} is in the space of matrices H {\displaystyle {\mathcal {H}}} with Frobenius inner product ⟨ … ⟩ F {\displaystyle \langle \dots \rangle _{F}} . == General applications == === Matrix completion === In the problem of matrix completion, the matrix X i t {\displaystyle X_{i}^{t}} takes the form X i t = e t ⊗ e i ′ , {\displaystyle X_{i}^{t}=e_{t}\otimes e_{i}',} where ( e t ) t {\displaystyle (e_{t})_{t}} and ( e i ′ ) i {\displaystyle (e_{i}')_{i}} are the canonical basis in R T {\displaystyle \mathbb {R} ^{T}} and R D {\displaystyle \mathbb {R} ^{D}} . In this case the role of the Frobenius inner product is to select individual elements w i t {\displaystyle w_{i}^{t}} from the matrix W {\displaystyle W} . Thus, the output y {\displaystyle y} is a sampling of entries from the matrix W {\displaystyle W} . The problem of reconstructing W {\displaystyle W} from a small set of sampled entries is possible only under certain restrictions on the matrix, and these restrictions can be enforced by a regularization function. For example, it might be assumed that W {\displaystyle W} is low-rank, in which case the regularization penalty can take the form of a nuclear norm. R ( W ) = λ ‖ W ‖ ∗ = λ ∑ i | σ i | , {\displaystyle R(W)=\lambda \left\|W\right\|_{}=\lambda \sum _{i}\left|\sigma _{i}\right|,} where σ i {\displaystyle \sigma _{i}} , with i {\displaystyle i} from 1 {\displaystyle 1} to min D , T {\displaystyle \min D,T} , are the singular values of W {\displaystyle W} . === Multivariate regression === Models used in multivariate regression are parameterized by a matrix of coefficients. In the Frobenius inner product above, each matrix X {\displaystyle X} is X i t = e t ⊗ x i {\displaystyle X_{i}^{t}=e_{t}\otimes x_{i}} such that the output of the inner product is the dot product of one row of the input with one column of the coefficient matrix. The familiar form of such models is Y = X W + b {\displaystyle Y=XW+b} Many of the vector norms used in single variable regression can be extended to the multivariate case. One example is the squared Frobenius norm, which can be viewed as an ℓ 2 {\displaystyle \ell ^{2}} -norm acting either entrywise, or on the singular values of the matrix: R ( W ) = λ ‖ W ‖ F 2 = λ ∑ i ∑ j | w i j | 2 = λ Tr ⁡ ( W ∗ W ) = λ ∑ i σ i 2 . {\displaystyle R(W)=\lambda \left\|W\right\|_{F}^{2}=\lambda \sum _{i}\sum _{j}\left|w_{ij}\right|^{2}=\lambda \operatorname {Tr} \left(W^{}W\right)=\lambda \sum _{i}\sigma _{i}^{2}.} In the multivariate case the effect of regularizing with the Frobenius norm is the same as the vector case; very complex models will have larger norms, and, thus, will be penalized more. === Multi-task learning === The setup for multi-task learning is almost the same as the setup for multivariate regression. The primary difference is that the input variables are also indexed by task (columns of Y {\displaystyle Y} ). The representation with the Frobenius inner product is then X i t = e t ⊗ x i t . {\displaystyle X_{i}^{t}=e_{t}\otimes x_{i}^{t}.} The role of matrix regularization in this setting can be the same as in multivariate regression, but matrix norms can also be used to couple learning problems across tasks. In particular, note that for the optimization problem min W ‖ X W − Y ‖ 2 2 + λ ‖ W ‖ 2 2 {\displaystyle \min _{W}\left\|XW-Y\right\|_{2}^{2}+\lambda \left\|W\right\|_{2}^{2}} the solutions corresponding to each column of Y {\displaystyle Y} are decoupled. That is, the same solution can be found by solving the joint problem, or by solving an isolated regression problem for each column. The problems can be coupled by adding an additional regularization penalty on the covariance of solutions min W , Ω ‖ X W − Y ‖ 2 2 + λ 1 ‖ W ‖ 2 2 + λ 2 Tr ⁡ ( W T Ω − 1 W ) {\displaystyle \min _{W,\Omega }\left\|XW-Y\right\|_{2}^{2}+\lambda _{1}\left\|W\right\|_{2}^{2}+\lambda _{2}\operatorname {Tr} \left(W^{T}\Omega ^{-1}W\right)} where Ω {\displaystyle \Omega } models the relationship between tasks. This scheme can be used to both enforce similarity of solutions across tasks, and to learn the specific structure of task similarity by alternating between optimizations of W {\displaystyle W} and Ω {\displaystyle \Omega } . When the relationship between tasks is known to lie on a graph, the Laplacian matrix of the graph can be used to couple the learning problems. == Spectral regularization == Regularization by spectral filtering has been used to find stable solutions to problems such as those discussed above by addressing ill-posed matrix inversions (see for example Filter function for Tikhonov regularization). In many cases the regularization function acts on the input (or kernel) to ensure a bounded inverse by eliminating small singular values, but it can also be useful to have spectral norms that act on the matrix that is to be learned. There are a number of matrix norms that act on the singular values of the matrix. Frequently used examples include the Schatten p-norms, with p = 1 or 2. For example, matrix regularization with a Schatten 1-norm, also called the nuclear norm, can be used to enforce sparsity in the spectrum of a matrix. This has been used in the context of matrix completion when the matrix in question is believed to have a restricted rank. In this case the optimization problem becomes: min ‖ W ‖ ∗ subject to W i , j = Y i j . {\displaystyle \min \left\|W\right\|_{}~~{\text{ subject to }}~~W_{i,j}=Y_{ij}.} Spectral Regularization is also used to enforce a reduced rank coefficient matrix in multivariate regression. In this setting, a reduced rank coefficient matrix can be found by keeping just the top n {\displaystyle n} singular values, but this can be extended to keep any reduced set of singular values and vectors. == Structured sparsity == Sparse optimization has become the focus of much research interest as a way to find solutions that depend on a small number of variables (see e.g. the Lasso method). In principle, entry-wise sparsity can be enforced by penalizing the entry-wise ℓ 0 {\displaystyle \ell ^{0}} -norm of the matrix, but the ℓ 0 {\displaystyle \ell ^{0}} -norm is not convex. In practice this can be implemented by convex relaxation to the ℓ 1 {\displaystyle \ell ^{1}} -norm. While entry-wise regularization with an ℓ 1 {\displaystyle \ell ^{1}} -norm will find solutions with a small number of nonzero elements, applying an ℓ 1 {

    Read more →
  • Cognitive philology

    Cognitive philology

    Cognitive philology is the science that studies written and oral texts as the product of human mental processes. Studies in cognitive philology compare documentary evidence emerging from textual investigations with results of experimental research, especially in the fields of cognitive and ecological psychology, neurosciences and artificial intelligence. "The point is not the text, but the mind that made it". Cognitive Philology aims to foster communication between literary, textual, philological disciplines on the one hand and researches across the whole range of the cognitive, evolutionary, ecological and human sciences on the other. Cognitive philology: investigates transmission of oral and written text, and categorization processes which lead to classification of knowledge, mostly relying on the information theory; studies how narratives emerge in so called natural conversation and selective process which lead to the rise of literary standards for storytelling, mostly relying on embodied semantics; explores the evolutive and evolutionary role played by rhythm and metre in human ontogenetic and phylogenetic development and the pertinence of the semantic association during processing of cognitive maps; Provides the scientific ground for multimedia critical editions of literary texts. Among the founding thinkers and noteworthy scholars devoted to such investigations are: Alan Richardson: Studies Theory of Mind in early-modern and contemporary literature. Anatole Pierre Fuksas Benoît de Cornulier David Herman: Professor of English at North Carolina State University and an adjunct professor of linguistics at Duke University. He is the author of "Universal Grammar and Narrative Form" and the editor of "Narratologies: New Perspectives on Narrative Analysis". Domenico Fiormonte François Recanati Gilles Fauconnier, a professor in Cognitive science at the University of California, San Diego. He was one of the founders of cognitive linguistics in the 1970s through his work on pragmatic scales and mental spaces. His research explores the areas of conceptual integration and compressions of conceptual mappings in terms of the emergent structure in language. Julián Santano Moreno Luca Nobile Manfred Jahn in Germany Mark Turner Paolo Canettieri

    Read more →
  • Timeline of artificial intelligence risks in global finance

    Timeline of artificial intelligence risks in global finance

    The following article is a broad timeline of the course of events related to artificial intelligence risks in global finance. The AI boom has led to concerns including the existential risk from artificial intelligence, as the uptake on applications of artificial intelligence increases. By late 2025, global finance and artificial intelligence were "deeply intertwined". A June 2025 Menlo Ventures report raised concerns about the sustainability of future revenue and long-term profitability of AI, given the relatively low rate of consumer monetization. == 2017 == 30 NovemberThe New York Times said that new AI reports by McKinsey & Company, the National Bureau of Economic Research, and an AI Index created by university researchers, indicated an early AI boom. The Index built on a project—"The One Hundred Year Study on Artificial Intelligence" launched in 2014. == 2018 == 2018 was a year of incremental AI growth in finance. == 2022 == The release of ChatGPT by OpenAI became the catalyst for an artificial intelligence boom that continues to remake the global economy. According to a European Central Bank report, public interest in AI increased rapidly as evidenced with rising Google searches, AI jobs, models, patents, and innovations since late 2022. At that time Europe led the US in the size of its AI workforce. == 2023 == The regulatory body, the International Monetary Fund (IMF), published their report, "Generative Artificial Intelligence in Finance: Risk Considerations", drawing attention to oversight gaps and the need for regulations. The report explores the risks posed by using generative artificial intelligence (GenAI) systems in the financial sector including "broader risks to financial stability." == 2024 == January 12 In January 2024 Bloomberg's published its list of the "Magnificent Seven" Big Tech companies on the stock market based on their strength, size and market capitalization:Apple, Microsoft, Alphabet (Google), Amazon, Meta Platforms (Facebook), Nvidia, and Tesla. 21 June During the AI boom, Nvidia became the world's most valuable company, surpassing Microsoft, as its value increased to over US$4 trillion. In 2023 and 2024, the "Magnificent Seven" stocks were the primary drivers behind the increase in equity indexes, according to Reuters. == 2025 == === January === 23 January President Donald Trump's AI policy was announced calling for United States global leadership in artificial intelligence. The Economist noted that this politic shift in which the United States seeks "global dominance" in AI includes trimming regulations and assisting in expansion of infrastructure and increase in number of AI workers. Governments of Gulf nations were also investing trillions of dollars in AI. 27 January Against the backdrop of a tech war between China and the United States over AI dominance, within days of the launch of China's free DeepSeek App, it was the most downloaded app in the United States, rising to the first place in the Apple app store. President Trump responded immediately, saying this "sudden rise" should be a "wake-up" call to the United States, and called on US companies to be more competitive. === June === 26 June In their June 2025 report, Menlo Ventures estimated that only about 3% of consumers paid for artificial intelligence-related services, representing about $USD12 billion in annual spending. This is relatively low in contrast to the massive capital expenditure by AI infrastructure companies, which raises concerns about revenue sustainability and long-term profitability. === July === 23 July The Trump administration launched the US AI Action Plan, positioning the United States in a high-stakes technological race with China for global dominance in artificial intelligence, emphasizing that neither nation can afford to fall behind due to the exponential nature of AI advancement. The plan, a new government website and policy speech called for accelerated AI adoption across federal agencies, and a number of initiatives to make is easier for AI infrastructure expansion, and other measures to ensure American leadership in AI standards. Some leading experts warned that the administration failed to provide sufficient regulations and safeguards for AI safety. Concerns were raised about the negative impacts of cuts to research funding and tightened visa policies for scientists, potentially undermining public trust and America's ability to compete internationally. === September === 7 September The Economist cautioned that AI revenues are relatively modest compared to the high cost and investments in the creation of new data centers. Even Sam Altman, OpenAI CEO and one of the leading figures of the AI boom,, raised concerns about investors' outsized hopes for financial returns. At the same time, history has shown that new technologies, like railways and electricity, endured and spread after the initial hype faded. 12 September Economists warn that U.S. households' direct and indirect investments—mutual funds or retirement plans—in the stock market reached an unprecedented historically high level, now representing 45% of all financial assets, or about $USD51.2 trillion. Compared to the Dot-com bubble this represents a sharp increase in exposure. This makes U.S. households vulnerable to market downturns which in turn would result in decreasing consumer spending. U.S. household net worth rose to a record $176.3 trillion in the second quarter, an increase of $7.3 trillion since early 2025 and about $46 trillion higher than before the pandemic. Federal Reserve data attribute the surge primarily to gains in stock markets and housing values. However, the rise in wealth on paper coincided with increased household borrowing and growing government debt. 18 September Questions were being raised about how quickly the data centers, chips, servers, and GPUs assets of major AI companies will depreciate in value. Comparisons have been made to the Railway Mania in the aftermath of the stock market bubble where a valuable physical infrastructure remained standing, and the telecoms crash after the dot-com bubble which left fiber networks. 28 September There were warnings that record-high American stock ownership during the AI-fueled market boom is a red flag for systemic risk, as the current concentration in equities exceeds levels seen before the dot-com bubble burst in 2000, and could amplify the impact of any future stock market correction. === October === 3 October In 2025 alone, venture capitalists invested almost $USD200 billion in the artificial intelligence sector. 29 October Nvidia was the first company in the world to be valued at US$5 trillion, largely due to AI demand and strategic partnerships with leading technology and AI firms. Nvidia's increase in value was "meteoric". === November === 2 November Forbes reported that, since April, the 'Magnificent Seven' tech giants together contributed over 40% of the S&P 500's return, highlighting their outsized influence and the growing impact of AI on market valuations. CNN warned that while there is a current benefit to investors, with such a high concentration in the S&P 500, they are highly exposed to the fate of the Mag Seven. 2 November Globally there are 11,000 datacentres—huge campuses for AI infrastructure, including thousands of chips, GPUS, and servers. This represents a 500% increase over the last two decades. It is anticipated that $3USDtn more will be spent on increasing that number over the next two or three years. 5 November Concerns about the potential for a market bubble were raised as six of the AI-related Big Tech "Magnificent Seven"—that contribute to the AI boom—reported losing ground in the stock market. Global markets and artificial intelligence have become "deeply intertwined", according to a Reuters report. As of November 2025, more than 50% of the 20 largest S&P firms were deeply exposed to AI. In contrast, in 2000, the 20 S&P 500 firms represented 39% of its total value only 11 of these companies were exposed to the internet. If AI fails to deliver strong returns on their investments, these top S&P firms would be significantly impacted, according to the Economist. Analysts suggest that the AI market in 2025 may not behave like a traditional one, as investors are simultaneously aware of the risks and driven by the potential for outsized rewards. Leading AI labs may believe that the first company to achieve artificial general intelligence (AGI), when an AI system surpasses all human cognitive abilities and becomes capable of self-improvement—could dominate the future of technology and finance. While some have estimated that the potential value of such a breakthrough could be as high as $1.46 quadrillion, this figure is speculative and widely debated. 5 November Bloomberg described Nvidia's H100 Hopper-Blackwell AI chips as the "King of AI chips". Nvidia dominates the AI chip market with over 78% of the market share because of both speed and cost. According to B

    Read more →
  • Emergent algorithm

    Emergent algorithm

    An emergent algorithm is an algorithm that exhibits emergent behavior. In essence an emergent algorithm implements a set of simple building block behaviors that when combined exhibit more complex behaviors. One example of this is the implementation of fuzzy motion controllers used to adapt robot movement in response to environmental obstacles. An emergent algorithm has the following characteristics: it achieves predictable global effects it does not require global visibility it does not assume any kind of centralized control it is self-stabilizing Other examples of emergent algorithms and models include cellular automata, artificial neural networks and swarm intelligence systems (ant colony optimization, bees algorithm, etc.).

    Read more →
  • Deconvolution

    Deconvolution

    In mathematics, deconvolution is the inverse of convolution. Both operations are used in signal processing and image processing. For example, it may be possible to recover the original signal after a filter (convolution) by using a deconvolution method with a certain degree of accuracy. Due to the measurement error of the recorded signal or image, it can be demonstrated that the worse the signal-to-noise ratio (SNR), the worse the reversing of a filter will be; hence, inverting a filter is not always a good solution as the error amplifies. Deconvolution offers a solution to this problem. The foundations for deconvolution and time-series analysis were largely laid by Norbert Wiener of the Massachusetts Institute of Technology in his book Extrapolation, Interpolation, and Smoothing of Stationary Time Series (1949). The book was based on work Wiener had done during World War II but that had been classified at the time. Some of the early attempts to apply these theories were in the fields of weather forecasting and economics. == Description == In general, the objective of deconvolution is to find the solution f of a convolution equation of the form: f ∗ g = h {\displaystyle fg=h\,} Usually, h is some recorded signal, and f is some signal that we wish to recover, but has been convolved with a filter or distortion function g, before we recorded it. Usually, h is a distorted version of f and the shape of f can't be easily recognized by the eye or simpler time-domain operations. The function g represents the impulse response of an instrument or a driving force that was applied to a physical system. If we know g, or at least know the form of g, then we can perform deterministic deconvolution. However, if we do not know g in advance, then we need to estimate it. This can be done using methods of statistical estimation or building the physical principles of the underlying system, such as the electrical circuit equations or diffusion equations. There are several deconvolution techniques, depending on the choice of the measurement error and deconvolution parameters: === Raw deconvolution === When the measurement error is very low (ideal case), deconvolution collapses into a filter reversing. This kind of deconvolution can be performed in the Laplace domain. By computing the Fourier transform of the recorded signal h and the system response function g, you get H and G, with G as the transfer function. Using the convolution theorem, F = H / G {\displaystyle F=H/G\,} where F is the estimated Fourier transform of f. Finally, the inverse Fourier transform of the function F is taken to find the estimated deconvolved signal f. Note that G is at the denominator and could amplify elements of the error model if present. === Deconvolution with noise === In physical measurements, the situation is usually closer to ( f ∗ g ) + ε = h {\displaystyle (fg)+\varepsilon =h\,} In this case ε is noise that has entered our recorded signal. If a noisy signal or image is assumed to be noiseless, the statistical estimate of g will be incorrect. In turn, the estimate of ƒ will also be incorrect. The lower the signal-to-noise ratio, the worse the estimate of the deconvolved signal will be. That is the reason why inverse filtering the signal (as in the "raw deconvolution" above) is usually not a good solution. However, if at least some knowledge exists of the type of noise in the data (for example, white noise), the estimate of ƒ can be improved through techniques such as Wiener deconvolution. == Applications == === Seismology === The concept of deconvolution had an early application in reflection seismology. In 1950, Enders Robinson was a graduate student at MIT. He worked with others at MIT, such as Norbert Wiener, Norman Levinson, and economist Paul Samuelson, to develop the "convolutional model" of a reflection seismogram. This model assumes that the recorded seismogram s(t) is the convolution of an Earth-reflectivity function e(t) and a seismic wavelet w(t) from a point source, where t represents recording time. Thus, our convolution equation is s ( t ) = ( e ∗ w ) ( t ) . {\displaystyle s(t)=(ew)(t).\,} The seismologist is interested in e, which contains information about the Earth's structure. By the convolution theorem, this equation may be Fourier transformed to S ( ω ) = E ( ω ) W ( ω ) {\displaystyle S(\omega )=E(\omega )W(\omega )\,} in the frequency domain, where ω {\displaystyle \omega } is the frequency variable. By assuming that the reflectivity is white, we can assume that the power spectrum of the reflectivity is constant, and that the power spectrum of the seismogram is the spectrum of the wavelet multiplied by that constant. Thus, | S ( ω ) | ≈ k | W ( ω ) | . {\displaystyle |S(\omega )|\approx k|W(\omega )|.\,} If we assume that the wavelet is minimum phase, we can recover it by calculating the minimum phase equivalent of the power spectrum we just found. The reflectivity may be recovered by designing and applying a Wiener filter that shapes the estimated wavelet to a Dirac delta function (i.e., a spike). The result may be seen as a series of scaled, shifted delta functions (although this is not mathematically rigorous): e ( t ) = ∑ i = 1 N r i δ ( t − τ i ) , {\displaystyle e(t)=\sum _{i=1}^{N}r_{i}\delta (t-\tau _{i}),} where N is the number of reflection events, r i {\displaystyle r_{i}} are the reflection coefficients, t − τ i {\displaystyle t-\tau _{i}} are the reflection times of each event, and δ {\displaystyle \delta } is the Dirac delta function. In practice, since we are dealing with noisy, finite bandwidth, finite length, discretely sampled datasets, the above procedure only yields an approximation of the filter required to deconvolve the data. However, by formulating the problem as the solution of a Toeplitz matrix and using Levinson recursion, we can relatively quickly estimate a filter with the smallest mean squared error possible. We can also do deconvolution directly in the frequency domain and get similar results. The technique is closely related to linear prediction. === Optics and other imaging === In optics and imaging, the term "deconvolution" is specifically used to refer to the process of reversing the optical distortion that takes place in an optical microscope, electron microscope, telescope, or other imaging instrument, thus creating clearer images. It is usually done in the digital domain by a software algorithm, as part of a suite of microscope image processing techniques. Deconvolution is also practical to sharpen images that suffer from fast motion or jiggles during capturing. Early Hubble Space Telescope images were distorted by a flawed mirror and were sharpened by deconvolution. The usual method is to assume that the optical path through the instrument is optically perfect, convolved with a point spread function (PSF), that is, a mathematical function that describes the distortion in terms of the pathway a theoretical point source of light (or other waves) takes through the instrument. Usually, such a point source contributes a small area of fuzziness to the final image. If this function can be determined, it is then a matter of computing its inverse or complementary function, and convolving the acquired image with that. The result is the original, undistorted image. In practice, finding the true PSF is impossible, and usually an approximation of it is used, theoretically calculated or based on some experimental estimation by using known probes. Real optics may also have different PSFs at different focal and spatial locations, and the PSF may be non-linear. The accuracy of the approximation of the PSF will dictate the final result. Different algorithms can be employed to give better results, at the price of being more computationally intensive. Since the original convolution discards data, some algorithms use additional data acquired at nearby focal points to make up some of the lost information. Regularization in iterative algorithms (as in expectation-maximization algorithms) can be applied to avoid unrealistic solutions. When the PSF is unknown, it may be possible to deduce it by systematically trying different possible PSFs and assessing whether the image has improved. This procedure is called blind deconvolution. Blind deconvolution is a well-established image restoration technique in astronomy, where the point nature of the objects photographed exposes the PSF thus making it more feasible. It is also used in fluorescence microscopy for image restoration, and in fluorescence spectral imaging for spectral separation of multiple unknown fluorophores. The most common iterative algorithm for the purpose is the Richardson–Lucy deconvolution algorithm; the Wiener deconvolution (and approximations) are the most common non-iterative algorithms. For some specific imaging systems such as laser pulsed terahertz systems, PSF can be modeled mathematically. As a result, as shown in the figure, deconvolution of the modeled PS

    Read more →
  • Artificial intelligence of things

    Artificial intelligence of things

    Artificial Intelligence of Things (AIoT) is the combination of artificial intelligence (AI) technologies with the Internet of things (IoT) infrastructure to create systems capable of sensing, learning, and acting on data without continuous human intervention. While IoT focuses on connectivity and sensor data collection, AI enables IoT devices to analyse data in real time and produce actionable outputs, including automated decisions at the edge. == Applications == === Manufacturing and predictive maintenance === Manufacturing accounts for the largest share of AIoT adoption by industry vertical. A common application is predictive maintenance, where sensors measuring vibration, temperature, current draw, and acoustic emissions feed machine learning models trained to detect signatures that precede equipment failure. These systems can flag developing faults weeks or months in advance, and in more advanced deployments can autonomously adjust machine parameters such as motor speed or cooling cycles to delay or prevent failure. === Other industries === In healthcare, AIoT enables remote patient monitoring through wearable devices that collect vital signs and apply AI models to detect anomalies or predict deterioration. In logistics, GPS and telematics sensors combined with AI models support real-time route optimisation, vehicle maintenance prediction, and fuel cost forecasting. Smart building systems use occupancy, temperature, and energy sensors with AI to dynamically adjust HVAC and lighting, reducing energy consumption. == Architecture == AIoT systems typically operate across three layers: a device layer of sensors and actuators that collect data, a connectivity layer that transmits data via protocols such as MQTT or HTTP, and a compute layer where AI models process the data either in the cloud or at the edge. The trend toward edge-based processing, where inference runs on low-cost processors near the data source rather than in a centralised cloud, has accelerated as hardware costs have fallen and applications increasingly require sub-second response times. == Market == Market sizing estimates for AIoT vary significantly depending on scope and definition. Fortune Business Insights valued the AIoT market at USD 35.65 billion in 2023, projecting growth to USD 253.86 billion by 2030 at a compound annual growth rate of 32.4%. Grand View Research estimated the broader market at USD 171.4 billion in 2024 with a CAGR of 31.7% through 2030, reflecting a wider definition that includes AI-integrated hardware components. North America accounted for approximately 40% of global market share in 2024, with the Asia-Pacific region projected as the fastest-growing market.

    Read more →
  • Smart object

    Smart object

    A smart object is an object that enhances the interaction with not only people but also with other smart objects. Also known as smart connected products or smart connected things (SCoT), they are products, assets and other things embedded with processors, sensors, software and connectivity that allow data to be exchanged between the product and its environment, manufacturer, operator/user, and other products and systems. Connectivity also enables some capabilities of the product to exist outside the physical device, in what is known as the product cloud. The data collected from these products can be then analysed to inform decision-making, enable operational efficiencies and continuously improve the performance of the product. It can not only refer to interaction with physical world objects but also to interaction with virtual (computing environment) objects. A smart physical object may be created either as an artifact or manufactured product or by embedding electronic tags such as RFID tags or sensors into non-smart physical objects. Smart virtual objects are created as software objects that are intrinsic when creating and operating a virtual or cyber world simulation or game. The concept of a smart object has several origins and uses, see History. There are also several overlapping terms, see also smart device, tangible object or tangible user interface and Thing as in the Internet of things. == History == In the early 1990s, Mark Weiser, from whom the term ubiquitous computing originated, referred to a vision "When almost every object either contains a computer or can have a tab attached to it, obtaining information will be trivial", Although Weiser did not specifically refer to an object as being smart, his early work did imply that smart physical objects are smart in the sense that they act as digital information sources. Hiroshi Ishii and Brygg Ullmer refer to tangible objects in terms of tangibles bits or tangible user interfaces that enable users to "grasp & manipulate" bits in the center of users' attention by coupling the bits with everyday physical objects and architectural surfaces. The smart object concept was introduced by Marcelo Kallman and Daniel Thalmann as an object that can describe its own possible interactions. The main focus here is to model interactions of smart virtual objects with virtual humans, agents, in virtual worlds. The opposite approach to smart objects is 'plain' objects that do not provide this information. The additional information provided by this concept enables far more general interaction schemes, and can greatly simplify the planner of an artificial intelligence agent. In contrast to smart virtual objects used in virtual worlds, Lev Manovich focuses on physical space filled with electronic and visual information. Here, "smart objects" are described as "objects connected to the Net; objects that can sense their users and display smart behaviour". More recently in the early 2010s, smart objects are being proposed as a key enabler for the vision of the Internet of things. The combination of the Internet and emerging technologies such as near field communications, real-time localization, and embedded sensors enables everyday objects to be transformed into smart objects that can understand and react to their environment. Such objects are building blocks for the Internet of things and enable novel computing applications. In 2018, one of the world's first smart houses was built in Klaukkala, Finland in the form of a five-floor apartment block, using the Kone Residential Flow solution created by KONE, allowing even a smartphone to act as a home key. == Characteristics == Although we can view interaction with physical smart object in the physical world as distinct from interaction with virtual smart objects in a virtual simulated world, these can be related. Poslad considers the progression of: how humans use models of smart objects situated in the physical world to enhance human to physical world interaction; versus how smart physical objects situated in the physical world can model human interaction in order to lessen the need for human to physical world interaction; versus how virtual smart objects by modelling both physical world objects and modelling humans as objects and their subsequent interactions can form a predominantly smart virtual object environment. === Smart physical objects === The concept smart for a smart physical object simply means that it is active, digital, networked, can operate to some extent autonomously, is reconfigurable and has local control of the resources it needs such as energy, data storage, etc. Note, a smart object does not necessarily need to be intelligent as in exhibiting a strong essence of artificial intelligence—although it can be designed to also be intelligent. Physical world smart objects can be described in terms of three properties: Awareness: is a smart object's ability to understand (that is, sense, interpret, and react to) events and human activities occurring in the physical world. Representation: refers to a smart object's application and programming model—in particular, programming abstractions. Interaction: denotes the object's ability to converse with the user in terms of input, output, control, and feedback. Based upon these properties, these have been classified into three types: Activity-Aware Smart Objects: Are objects that can record information about work activities and its own use. Policy-Aware Smart Objects: Are objects that are activity-aware Objects can interpret events and activities with respect to predefined organizational policies. Process-Aware Smart Objects: Processes play a fundamental role in industrial work management and operation. A process is a collection of related activities or tasks that are ordered according to their position in time and space. === Smart virtual objects === For the virtual object in a virtual world case, an object is called smart when it has the ability to describe its possible interactions. This focuses on constructing a virtual world using only virtual objects that contain their own interaction information. There are four basic elements to constructing such a smart virtual object framework. Object properties: physical properties and a text description Interaction information: position of handles, buttons, grips, and the like Object behavior: different behaviors based on state variables Agent behaviors: description of the behavior an agent should follow when using the object Some versions of smart objects also include animation information in the object information, but this is not considered to be an efficient approach, since this can make objects inappropriately oversized. === Categorization === The terms smart, connected product or smart product can be confusing as it is used to cover a broad range of different products, ranging from smart home appliances (e.g., smart bathroom scales or smart light bulbs) to smart cars (e.g., Tesla). While these products share certain similarities, they often differ substantially in their capabilities. Raff et al. developed a conceptual framework that distinguishes different smart products based on their capabilities, which features 4 types of smart product archetypes (in ascending order of "smartness"). Digital Connected Responsive Intelligent == Advantages == Smart, connected products have three primary components: Physical – made up of the product's mechanical and electrical parts. Smart – made up of sensors, microprocessors, data storage, controls, software, and an embedded operating system with enhanced user interface. Connectivity – made up of ports, antennae, and protocols enabling wired/wireless connections that serve two purposes, it allows data to be exchanged with the product and enables some functions of the product to exist outside the physical device. Each component expands the capabilities of one another resulting in "a virtuous cycle of value improvement". First, the smart components of a product amplify the value and capabilities of the physical components. Then, connectivity amplifies the value and capabilities of the smart components. These improvements include: Monitoring of the product's conditions, its external environment, and its operations and usage. Control of various product functions to better respond to changes in its environment, as well as to personalize the user experience. Optimization of the product's overall operations based on actual performance data, and reduction of downtimes through predictive maintenance and remote service. Autonomous product operation, including learning from their environment, adapting to users' preferences and self-diagnosing and service. === The Internet of things (IoT) === The Internet of things is the network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment. The phrase "Internet of things" reflects the gro

    Read more →
  • EfficientNet

    EfficientNet

    EfficientNet is a family of convolutional neural networks (CNNs) for computer vision published by researchers at Google AI in 2019. Its key innovation is compound scaling, which uniformly scales all dimensions of depth, width, and resolution using a single parameter. EfficientNet models have been adopted in various computer vision tasks, including image classification, object detection, and segmentation. == Compound scaling == EfficientNet introduces compound scaling, which, instead of scaling one dimension of the network at a time, such as depth (number of layers), width (number of channels), or resolution (input image size), uses a compound coefficient ϕ {\displaystyle \phi } to scale all three dimensions simultaneously. Specifically, given a baseline network, the depth, width, and resolution are scaled according to the following equations: depth multiplier: d = α ϕ width multiplier: w = β ϕ resolution multiplier: r = γ ϕ {\displaystyle {\begin{aligned}{\text{depth multiplier: }}d&=\alpha ^{\phi }\\{\text{width multiplier: }}w&=\beta ^{\phi }\\{\text{resolution multiplier: }}r&=\gamma ^{\phi }\end{aligned}}} subject to α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} and α ≥ 1 , β ≥ 1 , γ ≥ 1 {\displaystyle \alpha \geq 1,\beta \geq 1,\gamma \geq 1} . The α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} condition is such that increasing ϕ {\displaystyle \phi } by a factor of ϕ 0 {\displaystyle \phi _{0}} would increase the total FLOPs of running the network on an image approximately 2 ϕ 0 {\displaystyle 2^{\phi _{0}}} times. The hyperparameters α {\displaystyle \alpha } , β {\displaystyle \beta } , and γ {\displaystyle \gamma } are determined by a small grid search. The original paper suggested 1.2, 1.1, and 1.15, respectively. Architecturally, they optimized the choice of modules by neural architecture search (NAS), and found that the inverted bottleneck convolution (which they called MBConv) used in MobileNet worked well. The EfficientNet family is a stack of MBConv layers, with shapes determined by the compound scaling. The original publication consisted of 8 models, from EfficientNet-B0 to EfficientNet-B7, with increasing model size and accuracy. EfficientNet-B0 is the baseline network, and subsequent models are obtained by scaling the baseline network by increasing ϕ {\displaystyle \phi } . == Variants == EfficientNet has been adapted for fast inference on edge TPUs and centralized TPU or GPU clusters by NAS. EfficientNet V2 was published in June 2021. The architecture was improved by further NAS search with more types of convolutional layers. It also introduced a training method, which progressively increases image size during training, and uses regularization techniques like dropout, RandAugment, and Mixup. The authors claim this approach mitigates accuracy drops often associated with progressive resizing.

    Read more →
  • Accelerated Linear Algebra

    Accelerated Linear Algebra

    XLA (Accelerated Linear Algebra) is an open-source compiler for machine learning developed by the OpenXLA project. XLA is designed to improve the performance of machine learning models by optimizing the computation graphs at a lower level, making it particularly useful for large-scale computations and high-performance machine learning models. Key features of XLA include: Compilation of Computation Graphs: Compiles computation graphs into efficient machine code. Optimization Techniques: Applies operation fusion, memory optimization, and other techniques. Hardware Support: Optimizes models for various hardware, including CPUs, GPUs, and NPUs. Improved Model Execution Time: Aims to reduce machine learning models' execution time for both training and inference. Seamless Integration: Can be used with existing machine learning code with minimal changes. XLA represents a significant step in optimizing machine learning models, providing developers with tools to enhance computational efficiency and performance. == OpenXLA Project == OpenXLA Project is an open-source machine learning compiler and infrastructure initiative intended to provide a common set of tools for compiling and deploying machine learning models across different frameworks and hardware platforms. It provides a modular compilation stack that can be used by major deep learning frameworks like JAX, PyTorch, and TensorFlow. The project focuses on supplying shared components for optimization, portability, and execution across CPUs, GPUs, and specialized accelerators. Its design emphasizes interoperability between frameworks and a standardized set of representations for model computation. == Components == The OpenXLA ecosystem includes several core components: XLA – A deep learning compiler that optimizes computational graphs for multiple hardware targets. PJRT – A runtime interface that allows different back-ends to connect to XLA through a consistent API. StableHLO – A high-level operator set intended to serve as a stable, portable representation for ML models across compilers and frameworks. Shardy – An MLIR-based system for describing and transforming models that run in distributed or multi-device environments. Additional profiling, testing, and integration tools maintained under the OpenXLA organization. == Users and adopters == Several machine learning frameworks can use or interoperate with OpenXLA components, including JAX, TensorFlow, and parts of the PyTorch ecosystem. The project is developed with participation from multiple hardware and software organizations that contribute back-end integrations, testing, or specifications for their devices. This includes Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive. == Supported target devices == x86-64 ARM64 NVIDIA GPU AMD GPU Intel GPU Apple GPU Google TPU AWS Trainium, Inferentia Cerebras Graphcore IPU == Governance == OpenXLA is developed as a community project with its work carried out in public repositories, discussion forums, and design meetings. Some components, such as StableHLO, began with stewardship from specific organizations and have outlined plans for more formal and distributed governance models as the project matures. == History == The project was announced in 2022 as an effort to coordinate development of ML compiler technologies across major AI companies, notably: Alibaba, Amazon Web Services, AMD, Anyscale, Apple, Arm, Cerebras, Google, Graphcore, Hugging Face, Intel, Meta, NVIDIA and SiFive.. It consolidated the XLA compiler, introduced StableHLO as a portable operator set, and created a unified structure for additional tools. Development continues within multiple repositories under the OpenXLA umbrella. It was founded by Eugene Burmako, James Rubin, Magnus Hyttsten, Mehdi Amini, Navid Khajouei, and Thea Lamkin from Google's Machine Learning organization.

    Read more →
  • Instance-based learning

    Instance-based learning

    In machine learning, instance-based learning (sometimes called memory-based learning) is a family of learning algorithms that, instead of performing explicit generalization, compare new problem instances with instances seen in training, which have been stored in memory. Because computation is postponed until a new instance is observed, these algorithms are sometimes referred to as "lazy." It is called instance-based because it constructs hypotheses directly from the training instances themselves. This means that the hypothesis complexity can grow with the data: in the worst case, a hypothesis is a list of n training items and the computational complexity of classifying a single new instance is O(n). One advantage that instance-based learning has over other methods of machine learning is its ability to adapt its model to previously unseen data. Instance-based learners may simply store a new instance or throw an old instance away. Examples of instance-based learning algorithms are the k-nearest neighbors algorithm, kernel machines and RBF networks. These store (a subset of) their training set; when predicting a value/class for a new instance, they compute distances or similarities between this instance and the training instances to make a decision. To battle the memory complexity of storing all training instances, as well as the risk of overfitting to noise in the training set, instance reduction algorithms have been proposed.

    Read more →
  • Self-supervised learning

    Self-supervised learning

    Self-supervised learning (SSL) is a paradigm in machine learning where a model is trained on a task using the data itself to generate supervisory signals, rather than relying on externally-provided labels. In the context of neural networks, self-supervised learning aims to leverage inherent structures or relationships within the input data to create meaningful training signals. SSL tasks are designed so that solving them requires capturing essential features or relationships in the data. The input data is typically augmented or transformed in a way that creates pairs of related samples, where one sample serves as the input, and the other is used to formulate the supervisory signal. This augmentation can involve introducing noise, cropping, rotation, or other transformations. Self-supervised learning more closely imitates the way humans learn to classify objects. During SSL, the model learns in two steps. First, the task is solved based on an auxiliary or pretext classification task using pseudo-labels, which help to initialize the model parameters. Next, the actual task is performed with supervised or unsupervised learning. Self-supervised learning has produced promising results in recent years, and has found practical application in fields such as audio processing, and is being used by Facebook and others for speech recognition. == Pseudo-labels == Pseudo-labels are automatically generated labels that a model assigns to unlabeled data based on its own predictions. They are widely used in self-supervised and semi-supervised learning, where ground-truth annotations are limited or unavailable. By treating predicted labels as surrogate ground truth, learning algorithms can make use of large quantities of unlabeled data in the training process. Pseudo-labeling also plays an important role in systems that must adapt to concept drift, where the statistical properties of the data change over time. In these scenarios, the model may detect that an incoming instance deviates from previously learned behavior. The system then generates a classification result for that instance, and this predicted class is used as a pseudo-label for updating or retraining model components that are becoming outdated. This approach enables continuous adaptation in dynamic environments without requiring manual annotation. In many adaptive learning pipelines, pseudo-labels are chosen when the classifier produces sufficiently confident predictions, reducing the risk of propagating errors. These pseudo-labeled instances are then incorporated into training to refresh or evolve the model's understanding of emerging data patterns, particularly when existing components show signs of “aging” due to drift or distributional shifts. This strategy reduces reliance on manual labeling while helping maintain long-term model performance. == Types == === Autoassociative self-supervised learning === Autoassociative self-supervised learning is a specific category of self-supervised learning where a neural network is trained to reproduce or reconstruct its own input data. In other words, the model is tasked with learning a representation of the data that captures its essential features or structure, allowing it to regenerate the original input. The term "autoassociative" comes from the fact that the model is essentially associating the input data with itself. This is often achieved using autoencoders, which are a type of neural network architecture used for representation learning. Autoencoders consist of an encoder network that maps the input data to a lower-dimensional representation (latent space), and a decoder network that reconstructs the input from this representation. The training process involves presenting the model with input data and requiring it to reconstruct the same data as closely as possible. The loss function used during training typically penalizes the difference between the original input and the reconstructed output (e.g. mean squared error). By minimizing this reconstruction error, the autoencoder learns a meaningful representation of the data in its latent space. === Contrastive self-supervised learning === For a binary classification task, training data can be divided into positive examples and negative examples. Positive examples are those that match the target. For example, if training a classifier to identify birds, the positive training data would include images that contain birds. Negative examples would be images that do not. Contrastive self-supervised learning uses both positive and negative examples. The loss function in contrastive learning is used to minimize the distance between positive sample pairs, while maximizing the distance between negative sample pairs. An early example uses a pair of 1-dimensional convolutional neural networks to process a pair of images and maximize their agreement. Contrastive Language-Image Pre-training (CLIP) allows joint pretraining of a text encoder and an image encoder, such that a matching image-text pair have image encoding vector and text encoding vector that span a small angle (having a large cosine similarity). InfoNCE (Noise-Contrastive Estimation) is a method to optimize two models jointly, based on Noise Contrastive Estimation (NCE). Given a set X = { x 1 , … x N } {\displaystyle X=\left\{x_{1},\ldots x_{N}\right\}} of N {\displaystyle N} random samples containing one positive sample from p ( x t + k ∣ c t ) {\displaystyle p\left(x_{t+k}\mid c_{t}\right)} and N − 1 {\displaystyle N-1} negative samples from the 'proposal' distribution p ( x t + k ) {\displaystyle p\left(x_{t+k}\right)} , it minimizes the following loss function: L N = − E X [ log ⁡ f k ( x t + k , c t ) ∑ x j ∈ X f k ( x j , c t ) ] {\displaystyle {\mathcal {L}}_{\mathrm {N} }=-\mathbb {E} _{X}\left[\log {\frac {f_{k}\left(x_{t+k},c_{t}\right)}{\sum _{x_{j}\in X}f_{k}\left(x_{j},c_{t}\right)}}\right]} === Non-contrastive self-supervised learning === Non-contrastive self-supervised learning (NCSSL) uses only positive examples. Counterintuitively, NCSSL converges on a useful local minimum rather than reaching a trivial solution, with zero loss. For the example of binary classification, it would trivially learn to classify each example as positive. Effective NCSSL requires an extra predictor on the online side that does not back-propagate on the target side. === Joint-Embedding and Predictive Architectures === A major class of self-supervised learning moves beyond contrastive pairs, instead maximizing the agreement between views while preventing collapse through statistical constraints. Rooted in Deep Canonical Correlation Analysis (Deep CCA), this approach includes Joint-Embedding Architectures (JEA) like Barlow Twins and VICReg, which enforce covariance constraints to learn invariant representations without negative sampling. Deep Latent Variable Path Modelling (DLVPM) generalizes this to multimodal systems, using path models to enforce correlation and orthogonality across diverse data types. In 2022 Yann LeCun introduced Joint-Embedding Predictive Architectures (JEPA) as a step towards decision making, reasoning, and autonomous human intelligence in machines, including self-improvement through autonomous learning. Founded in representation learning, LeCun included the concept of a “world model” in JEPA which aims to enable machines to replicate human intellect by providing machines with a concept for the world in which they exist. Unlike autoencoders, JEPAs operate entirely in latent space, avoiding pixel-level noise to focus on semantic structure. Rather than just learning invariance, JEPAs learn by predicting masked latent representations from visible context. JEPA has been applied to domains such as image analysis, audio processing, and motion in images and video. == Comparison with other forms of machine learning == SSL belongs to supervised learning methods insofar as the goal is to generate a classified output from the input. At the same time, however, it does not require the explicit use of labeled input-output pairs. Instead, correlations, metadata embedded in the data, or domain knowledge present in the input are implicitly and autonomously extracted from the data. These supervisory signals, extracted from the data, can then be used for training. SSL is similar to unsupervised learning in that it does not require labels in the sample data. Unlike unsupervised learning, however, learning is not done using inherent data structures. Semi-supervised learning combines supervised and unsupervised learning, requiring only a small portion of the learning data be labeled. In transfer learning, a model designed for one task is reused on a different task. Training an autoencoder intrinsically constitutes a self-supervised process, because the output pattern needs to become an optimal reconstruction of the input pattern itself. However, in current jargon, the term 'self-supervised' often refers to tasks based on a pretext-task training setup

    Read more →