AI Code Editor Online Free

AI Code Editor Online Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Grammar systems theory

    Grammar systems theory

    Grammar systems theory is a field of theoretical computer science that studies systems of finite collections of formal grammars generating a formal language. Each grammar works on a string, a so-called sequential form that represents an environment. Grammar systems can thus be used as a formalization of decentralized or distributed systems of agents in artificial intelligence. Let A {\displaystyle \mathbb {A} } be a simple reactive agent moving on the table and trying not to fall down from the table with two reactions, t for turning and ƒ for moving forward. The set of possible behaviors of A {\displaystyle \mathbb {A} } can then be described as formal language L A = { ( f m t n f r ) + : 1 ≤ m ≤ k ; 1 ≤ n ≤ ℓ ; 1 ≤ r ≤ k } , {\displaystyle \mathbb {L_{A}} =\{(f^{m}t^{n}f^{r})^{+}:1\leq m\leq k;1\leq n\leq \ell ;1\leq r\leq k\},} where ƒ can be done maximally k times and t can be done maximally ℓ times considering the dimensions of the table. Let G A {\displaystyle \mathbb {G_{A}} } be a formal grammar which generates language L A {\displaystyle \mathbb {L_{A}} } . The behavior of A {\displaystyle \mathbb {A} } is then described by this grammar. Suppose the A {\displaystyle \mathbb {A} } has a subsumption architecture; each component of this architecture can be then represented as a formal grammar, too, and the final behavior of the agent is then described by this system of grammars. The schema on the right describes such a system of grammars which shares a common string representing an environment. The shared sequential form is sequentially rewritten by each grammar, which can represent either a component or generally an agent. If grammars communicate together and work on a shared sequential form, it is called a Cooperating Distributed (DC) grammar system. Shared sequential form is a similar concept to the blackboard approach in AI, which is inspired by an idea of experts solving some problem together while they share their proposals and ideas on a shared blackboard. Each grammar in a grammar system can also work on its own string and communicate with other grammars in a system by sending their sequential forms on request. Such a grammar system is then called a Parallel Communicating (PC) grammar system. PC and DC are inspired by distributed AI. If there is no communication between grammars, the system is close to the decentralized approaches in AI. These kinds of grammar systems are sometimes called colonies or Eco-Grammar systems, depending (besides others) on whether the environment is changing on its own (Eco-Grammar system) or not (colonies).

    Read more →
  • Legal information retrieval

    Legal information retrieval

    Legal information retrieval is the science of information retrieval applied to legal text, including legislation, case law, and scholarly works. Accurate legal information retrieval is important to provide access to the law to laymen and legal professionals. Its importance has increased because of the vast and quickly increasing amount of legal documents available through electronic means. Legal information retrieval is a part of the growing field of legal informatics. In a legal setting, it is frequently important to retrieve all information related to a specific query. However, commonly used boolean search methods (exact matches of specified terms) on full text legal documents have been shown to have an average recall rate as low as 20 percent, meaning that only 1 in 5 relevant documents are actually retrieved. In that case, researchers believed that they had retrieved over 75% of relevant documents. This may result in failing to retrieve important or precedential cases. In some jurisdictions this may be especially problematic, as legal professionals are ethically obligated to be reasonably informed as to relevant legal documents. Legal Information Retrieval attempts to increase the effectiveness of legal searches by increasing the number of relevant documents (providing a high recall rate) and reducing the number of irrelevant documents (a high precision rate). This is a difficult task, as the legal field is prone to jargon, polysemes (words that have different meanings when used in a legal context), and constant change. Techniques used to achieve these goals generally fall into three categories: boolean retrieval, manual classification of legal text, and natural language processing of legal text. == Problems == Application of standard information retrieval techniques to legal text can be more difficult than application in other subjects. One key problem is that the law rarely has an inherent taxonomy. Instead, the law is generally filled with open-ended terms, which may change over time. This can be especially true in common law countries, where each decided case can subtly change the meaning of a certain word or phrase. Legal information systems must also be programmed to deal with law-specific words and phrases. Though this is less problematic in the context of words which exist solely in law, legal texts also frequently use polysemes, words may have different meanings when used in a legal or common-speech manner, potentially both within the same document. The legal meanings may be dependent on the area of law in which it is applied. For example, in the context of European Union legislation, the term "worker" has four different meanings: Any worker as defined in Article 3(a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work. Any person employed by an employer, including trainees and apprentices but excluding domestic servants; Any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside; Any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice; It also has the common meaning: A person who works at a specific occupation. Though the terms may be similar, correct information retrieval must differentiate between the intended use and irrelevant uses in order to return the correct results. Even if a system overcomes the language problems inherent in law, it must still determine the relevancy of each result. In the context of judicial decisions, this requires determining the precedential value of the case. Case decisions from senior or superior courts may be more relevant than those from lower courts, even where the lower court's decision contains more discussion of the relevant facts. The opposite may be true, however, if the senior court has only a minor discussion of the topic (for example, if it is a secondary consideration in the case). An information retrieval system must also be aware of the authority of the jurisdiction. A case from a binding authority is most likely of more value than one from a non-binding authority. Additionally, the intentions of the user may determine which cases they find valuable. For instance, where a legal professional is attempting to argue a specific interpretation of law, he might find a minor court's decision which supports his position more valuable than a senior courts position which does not. He may also value similar positions from different areas of law, different jurisdictions, or dissenting opinions. Overcoming these problems can be made more difficult because of the large number of cases available. The number of legal cases available via electronic means is constantly increasing (in 2003, US appellate courts handed down approximately 500 new cases per day), meaning that an accurate legal information retrieval system must incorporate methods of both sorting past data and managing new data. == Techniques == === Boolean searches === Boolean searches, where a user may specify terms such as use of specific words or judgments by a specific court, are the most common type of search available via legal information retrieval systems. They are widely implemented but overcome few of the problems discussed above. The recall and precision rates of these searches vary depending on the implementation and searches analyzed. One study found a basic boolean search's recall rate to be roughly 20%, and its precision rate to be roughly 79%. Another study implemented a generic search (that is, not designed for legal uses) and found a recall rate of 56% and a precision rate of 72% among legal professionals. Both numbers increased when searches were run by non-legal professionals, to a 68% recall rate and 77% precision rate. This is likely explained because of the use of complex legal terms by the legal professionals. === Manual classification === In order to overcome the limits of basic boolean searches, information systems have attempted to classify case laws and statutes into more computer friendly structures. Usually, this results in the creation of an ontology to classify the texts, based on the way a legal professional might think about them. These attempt to link texts on the basis of their type, their value, and/or their topic areas. Most major legal search providers now implement some sort of classification search, such as Westlaw's “Natural Language” or LexisNexis' Headnote searches. Additionally, both of these services allow browsing of their classifications, via Westlaw's West Key Numbers or Lexis' Headnotes. Though these two search algorithms are proprietary and secret, it is known that they employ manual classification of text (though this may be computer-assisted). These systems can help overcome the majority of problems inherent in legal information retrieval systems, in that manual classification has the greatest chances of identifying landmark cases and understanding the issues that arise in the text. In one study, ontological searching resulted in a precision rate of 82% and a recall rate of 97% among legal professionals. The legal texts included, however, were carefully controlled to just a few areas of law in a specific jurisdiction. The major drawback to this approach is the requirement of using highly skilled legal professionals and large amounts of time to classify texts. As the amount of text available continues to increase, some have stated their belief that manual classification is unsustainable. === Natural language processing === In order to reduce the reliance on legal professionals and the amount of time needed, efforts have been made to create a system to automatically classify legal text and queries. Adequate translation of both would allow accurate information retrieval without the high cost of human classification. These automatic systems generally employ Natural Language Processing (NLP) techniques that are adapted to the legal domain, and also require the creation of a legal ontology. Though multiple systems have been postulated, few have reported results. One system, “SMILE,” which attempted to automatically extract classifications from case texts, resulted in an f-measure (which is a calculation of both recall rate and precision) of under 0.3 (compared to perfect f-measure of 1.0). This is probably much lower than an acceptable rate for general usage. Despite the limited results, many theorists predict that the evolution of such systems will eventually replace manual classification systems. === Citation-Based ranking === In the mid-90s the Room 5 case law retrieval project used citation mining for summaries and ranked its search results based on citation type and count. This slightly pre-dated the PageRank algorithm at Stanford which was also a citation-based ranking. Ranking of results was based

    Read more →
  • Weight initialization

    Weight initialization

    In deep learning, weight initialization or parameter initialization describes the initial step in creating a neural network. A neural network contains trainable parameters that are modified during training: weight initialization is the pre-training step of assigning initial values to these parameters. The choice of weight initialization method affects the speed of convergence, the scale of neural activation within the network, the scale of gradient signals during backpropagation, and the quality of the final model. Proper initialization is necessary for avoiding issues such as vanishing and exploding gradients and activation function saturation. Note that even though this article is titled "weight initialization", both weights and biases are used in a neural network as trainable parameters, so this article describes how both of these are initialized. Similarly, trainable parameters in convolutional neural networks (CNNs) are called kernels and biases, and this article also describes these. == Constant initialization == We discuss the main methods of initialization in the context of a multilayer perceptron (MLP). Specific strategies for initializing other network architectures are discussed in later sections. For an MLP, there are only two kinds of trainable parameters, called weights and biases. Each layer l {\displaystyle l} contains a weight matrix W ( l ) ∈ R n l − 1 × n l {\displaystyle W^{(l)}\in \mathbb {R} ^{n_{l-1}\times n_{l}}} and a bias vector b ( l ) ∈ R n l {\displaystyle b^{(l)}\in \mathbb {R} ^{n_{l}}} , where n l {\displaystyle n_{l}} is the number of neurons in that layer. A weight initialization method is an algorithm for setting the initial values for W ( l ) , b ( l ) {\displaystyle W^{(l)},b^{(l)}} for each layer l {\displaystyle l} . The simplest form is zero initialization: W ( l ) = 0 , b ( l ) = 0 {\displaystyle W^{(l)}=0,b^{(l)}=0} Zero initialization is usually used for initializing biases, but it is not used for initializing weights, as it leads to symmetry in the network, causing all neurons to learn the same features. In this page, we assume b = 0 {\displaystyle b=0} unless otherwise stated. Recurrent neural networks typically use activation functions with bounded range, such as sigmoid and tanh, since unbounded activation may cause exploding values. (Le, Jaitly, Hinton, 2015) suggested initializing weights in the recurrent parts of the network to identity and zero bias, similar to the idea of residual connections and LSTM with no forget gate. In most cases, the biases are initialized to zero, though some situations can use a nonzero initialization. For example, in multiplicative units, such as the forget gate of LSTM, the bias can be initialized to 1 to allow good gradient signal through the gate. For neurons with ReLU activation, one can initialize the bias to a small positive value like 0.1, so that the gradient is likely nonzero at initialization, avoiding the dying ReLU problem. == Random initialization == Random initialization means sampling the weights from a normal distribution or a uniform distribution, usually independently. === LeCun initialization === LeCun initialization, popularized in (LeCun et al., 1998), is designed to preserve the variance of neural activations during the forward pass. It samples each entry in W ( l ) {\displaystyle W^{(l)}} independently from a distribution with mean 0 and variance 1 / n l − 1 {\displaystyle 1/n_{l-1}} . For example, if the distribution is a continuous uniform distribution, then the distribution is U ( ± 3 / n l − 1 ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {3/n_{l-1}}})} . === Glorot initialization === Glorot initialization (or Xavier initialization) was proposed by Xavier Glorot and Yoshua Bengio. It was designed as a compromise between two goals: to preserve activation variance during the forward pass and to preserve gradient variance during the backward pass. For uniform initialization, it samples each entry in W ( l ) {\displaystyle W^{(l)}} independently and identically from U ( ± 6 / ( n l + 1 + n l − 1 ) ) {\displaystyle {\mathcal {U}}(\pm {\sqrt {6/(n_{l+1}+n_{l-1})}})} . In the context, n l − 1 {\displaystyle n_{l-1}} is also called the "fan-in", and n l + 1 {\displaystyle n_{l+1}} the "fan-out". When the fan-in and fan-out are equal, then Glorot initialization is the same as LeCun initialization. === He initialization === As Glorot initialization performs poorly for ReLU activation, He initialization (or Kaiming initialization) was proposed by Kaiming He et al. for networks with ReLU activation. It samples each entry in W ( l ) {\displaystyle W^{(l)}} from N ( 0 , 2 / n l − 1 ) {\displaystyle {\mathcal {N}}(0,2/n_{l-1})} . === Orthogonal initialization === (Saxe et al. 2013) proposed orthogonal initialization: initializing weight matrices as uniformly random (according to the Haar measure) semi-orthogonal matrices, multiplied by a factor that depends on the activation function of the layer. It was designed so that if one initializes a deep linear network this way, then its training time until convergence is independent of depth. Sampling a uniformly random semi-orthogonal matrix can be done by initializing X {\displaystyle X} by IID sampling its entries from a standard normal distribution, then calculate ( X X ⊤ ) − 1 / 2 X {\displaystyle \left(XX^{\top }\right)^{-1/2}X} or its transpose, depending on whether X {\displaystyle X} is tall or wide. For CNN kernels with odd widths and heights, orthogonal initialization is done this way: initialize the central point by a semi-orthogonal matrix, and fill the other entries with zero. As an illustration, a kernel K {\displaystyle K} of shape 3 × 3 × c × c ′ {\displaystyle 3\times 3\times c\times c'} is initialized by filling K [ 2 , 2 , : , : ] {\displaystyle K[2,2,:,:]} with the entries of a random semi-orthogonal matrix of shape c × c ′ {\displaystyle c\times c'} , and the other entries with zero. (Balduzzi et al., 2017) used it with stride 1 and zero-padding. This is sometimes called the Orthogonal Delta initialization. Related to this approach, unitary initialization proposes to parameterize the weight matrices to be unitary matrices, with the result that at initialization they are random unitary matrices (and throughout training, they remain unitary). This is found to improve long-sequence modelling in LSTM. Orthogonal initialization has been generalized to layer-sequential unit-variance (LSUV) initialization. It is a data-dependent initialization method, and can be used in convolutional neural networks. It first initializes weights of each convolution or fully connected layer with orthonormal matrices. Then, proceeding from the first to the last layer, it runs a forward pass on a random minibatch, and divides the layer's weights by the standard deviation of its output, so that its output has variance approximately 1. === Fixup initialization === In 2015, the introduction of residual connections allowed very deep neural networks to be trained, much deeper than the ~20 layers of the previous state of the art (such as the VGG-19). Residual connections gave rise to their own weight initialization problems and strategies. These are sometimes called "normalization-free" methods, since using residual connection could stabilize the training of a deep neural network so much that normalizations become unnecessary. Fixup initialization is designed specifically for networks with residual connections and without batch normalization, as follows: Initialize the classification layer and the last layer of each residual branch to 0. Initialize every other layer using a standard method (such as He initialization), and scale only the weight layers inside residual branches by L − 1 2 m − 2 {\displaystyle L^{-{\frac {1}{2m-2}}}} . Add a scalar multiplier (initialized at 1) in every branch and a scalar bias (initialized at 0) before each convolution, linear, and element-wise activation layer. Similarly, T-Fixup initialization is designed for Transformers without layer normalization. === Others === Instead of initializing all weights with random values on the order of O ( 1 / n ) {\displaystyle O(1/{\sqrt {n}})} , sparse initialization initialized only a small subset of the weights with larger random values, and the other weights zero, so that the total variance is still on the order of O ( 1 ) {\displaystyle O(1)} . Random walk initialization was designed for MLP so that during backpropagation, the L2 norm of gradient at each layer performs an unbiased random walk as one moves from the last layer to the first. Looks linear initialization was designed to allow the neural network to behave like a deep linear network at initialization, since W R e L U ( x ) − W R e L U ( − x ) = W x {\displaystyle W\;\mathrm {ReLU} (x)-W\;\mathrm {ReLU} (-x)=Wx} . It initializes a matrix W {\displaystyle W} of shape R n 2 × m {\displaystyle \mathbb {R} ^{{\frac {n}{2}}\times m}} by any method, such as orthogonal initialization, t

    Read more →
  • Eigenmoments

    Eigenmoments

    EigenMoments is a set of orthogonal, noise robust, invariant to rotation, scaling and translation and distribution sensitive moments. Their application can be found in signal processing and computer vision as descriptors of the signal or image. The descriptors can later be used for classification purposes. It is obtained by performing orthogonalization, via eigen analysis on geometric moments. == Framework summary == EigenMoments are computed by performing eigen analysis on the moment space of an image by maximizing signal-to-noise ratio in the feature space in form of Rayleigh quotient. This approach has several benefits in Image processing applications: Dependency of moments in the moment space on the distribution of the images being transformed, ensures decorrelation of the final feature space after eigen analysis on the moment space. The ability of EigenMoments to take into account distribution of the image makes it more versatile and adaptable for different genres. Generated moment kernels are orthogonal and therefore analysis on the moment space becomes easier. Transformation with orthogonal moment kernels into moment space is analogous to projection of the image onto a number of orthogonal axes. Nosiy components can be removed. This makes EigenMoments robust for classification applications. Optimal information compaction can be obtained and therefore a few number of moments are needed to characterize the images. == Problem formulation == Assume that a signal vector s ∈ R n {\displaystyle s\in {\mathcal {R}}^{n}} is taken from a certain distribution having correlation C ∈ R n × n {\displaystyle C\in {\mathcal {R}}^{n\times n}} , i.e. C = E [ s s T ] {\displaystyle C=E[ss^{T}]} where E[.] denotes expected value. Dimension of signal space, n, is often too large to be useful for practical application such as pattern classification, we need to transform the signal space into a space with lower dimensionality. This is performed by a two-step linear transformation: q = W T X T s , {\displaystyle q=W^{T}X^{T}s,} where q = [ q 1 , . . . , q n ] T ∈ R k {\displaystyle q=[q_{1},...,q_{n}]^{T}\in {\mathcal {R}}^{k}} is the transformed signal, X = [ x 1 , . . . , x n ] T ∈ R n × m {\displaystyle X=[x_{1},...,x_{n}]^{T}\in {\mathcal {R}}^{n\times m}} a fixed transformation matrix which transforms the signal into the moment space, and W = [ w 1 , . . . , w n ] T ∈ R m × k {\displaystyle W=[w_{1},...,w_{n}]^{T}\in {\mathcal {R}}^{m\times k}} the transformation matrix which we are going to determine by maximizing the SNR of the feature space resided by q {\displaystyle q} . For the case of Geometric Moments, X would be the monomials. If m = k = n {\displaystyle m=k=n} , a full rank transformation would result, however usually we have m ≤ n {\displaystyle m\leq n} and k ≤ m {\displaystyle k\leq m} . This is specially the case when n {\displaystyle n} is of high dimensions. Finding W {\displaystyle W} that maximizes the SNR of the feature space: S N R t r a n s f o r m = w T X T C X w w T X T N X w , {\displaystyle SNR_{transform}={\frac {w^{T}X^{T}CXw}{w^{T}X^{T}NXw}},} where N is the correlation matrix of the noise signal. The problem can thus be formulated as w 1 , . . . , w k = a r g m a x w w T X T C X w w T X T N X w {\displaystyle {w_{1},...,w_{k}}=argmax_{w}{\frac {w^{T}X^{T}CXw}{w^{T}X^{T}NXw}}} subject to constraints: w i T X T N X w j = δ i j , {\displaystyle w_{i}^{T}X^{T}NXw_{j}=\delta _{ij},} where δ i j {\displaystyle \delta _{ij}} is the Kronecker delta. It can be observed that this maximization is Rayleigh quotient by letting A = X T C X {\displaystyle A=X^{T}CX} and B = X T N X {\displaystyle B=X^{T}NX} and therefore can be written as: w 1 , . . . , w k = a r g m a x x w T A w w T B w {\displaystyle {w_{1},...,w_{k}}={\underset {x}{\operatorname {arg\,max} }}{\frac {w^{T}Aw}{w^{T}Bw}}} , w i T B w j = δ i j {\displaystyle w_{i}^{T}Bw_{j}=\delta _{ij}} === Rayleigh quotient === Optimization of Rayleigh quotient has the form: max w R ( w ) = max w w T A w w T B w {\displaystyle \max _{w}R(w)=\max _{w}{\frac {w^{T}Aw}{w^{T}Bw}}} and A {\displaystyle A} and B {\displaystyle B} , both are symmetric and B {\displaystyle B} is positive definite and therefore invertible. Scaling w {\displaystyle w} does not change the value of the object function and hence and additional scalar constraint w T B w = 1 {\displaystyle w^{T}Bw=1} can be imposed on w {\displaystyle w} and no solution would be lost when the objective function is optimized. This constraint optimization problem can be solved using Lagrangian multiplier: max w w T A w {\displaystyle \max _{w}{w^{T}Aw}} subject to w T B w = 1 {\displaystyle {w^{T}Bw}=1} max w L ( w ) = max w ( w T A w − λ w T B w ) {\displaystyle \max _{w}{\mathcal {L}}(w)=\max _{w}(w{T}Aw-\lambda w^{T}Bw)} equating first derivative to zero and we will have: A w = λ B w {\displaystyle Aw=\lambda Bw} which is an instance of Generalized Eigenvalue Problem (GEP). The GEP has the form: A w = λ B w {\displaystyle Aw=\lambda Bw} for any pair ( w , λ ) {\displaystyle (w,\lambda )} that is a solution to above equation, w {\displaystyle w} is called a generalized eigenvector and λ {\displaystyle \lambda } is called a generalized eigenvalue. Finding w {\displaystyle w} and λ {\displaystyle \lambda } that satisfies this equations would produce the result which optimizes Rayleigh quotient. One way of maximizing Rayleigh quotient is through solving the Generalized Eigen Problem. Dimension reduction can be performed by simply choosing the first components w i {\displaystyle w_{i}} , i = 1 , . . . , k {\displaystyle i=1,...,k} , with the highest values for R ( w ) {\displaystyle R(w)} out of the m {\displaystyle m} components, and discard the rest. Interpretation of this transformation is rotating and scaling the moment space, transforming it into a feature space with maximized SNR and therefore, the first k {\displaystyle k} components are the components with highest k {\displaystyle k} SNR values. The other method to look at this solution is to use the concept of simultaneous diagonalization instead of Generalized Eigen Problem. === Simultaneous diagonalization === Let A = X T C X {\displaystyle A=X^{T}CX} and B = X T N X {\displaystyle B=X^{T}NX} as mentioned earlier. We can write W {\displaystyle W} as two separate transformation matrices: W = W 1 W 2 . {\displaystyle W=W_{1}W_{2}.} W 1 {\displaystyle W_{1}} can be found by first diagonalize B: P T B P = D B {\displaystyle P^{T}BP=D_{B}} . Where D B {\displaystyle D_{B}} is a diagonal matrix sorted in increasing order. Since B {\displaystyle B} is positive definite, thus D B > 0 {\displaystyle D_{B}>0} . We can discard those eigenvalues that large and retain those close to 0, since this means the energy of the noise is close to 0 in this space, at this stage it is also possible to discard those eigenvectors that have large eigenvalues. Let P ^ {\displaystyle {\hat {P}}} be the first k {\displaystyle k} columns of P {\displaystyle P} , now P T ^ B P ^ = D B ^ {\displaystyle {\hat {P^{T}}}B{\hat {P}}={\hat {D_{B}}}} where D B ^ {\displaystyle {\hat {D_{B}}}} is the k × k {\displaystyle k\times k} principal submatrix of D B {\displaystyle D_{B}} . Let W 1 = P ^ D B ^ − 1 / 2 {\displaystyle W_{1}={\hat {P}}{\hat {D_{B}}}^{-1/2}} and hence: W 1 T B W 1 = ( P ^ D B ^ − 1 / 2 ) T B ( P ^ D B ^ − 1 / 2 ) = I {\displaystyle W_{1}^{T}BW_{1}=({\hat {P}}{\hat {D_{B}}}^{-1/2})^{T}B({\hat {P}}{\hat {D_{B}}}^{-1/2})=I} . W 1 {\displaystyle W_{1}} whiten B {\displaystyle B} and reduces the dimensionality from m {\displaystyle m} to k {\displaystyle k} . The transformed space resided by q ′ = W 1 T X T s {\displaystyle q'=W_{1}^{T}X^{T}s} is called the noise space. Then, we diagonalize W 1 T A W 1 {\displaystyle W_{1}^{T}AW_{1}} : W 2 T W 1 T A W 1 W 2 = D A {\displaystyle W_{2}^{T}W_{1}^{T}AW_{1}W_{2}=D_{A}} , where W 2 T W 2 = I {\displaystyle W_{2}^{T}W_{2}=I} . D A {\displaystyle D_{A}} is the matrix with eigenvalues of W 1 T A W 1 {\displaystyle W_{1}^{T}AW_{1}} on its diagonal. We may retain all the eigenvalues and their corresponding eigenvectors since most of the noise are already discarded in previous step. Finally the transformation is given by: W = W 1 W 2 {\displaystyle W=W_{1}W_{2}} where W {\displaystyle W} diagonalizes both the numerator and denominator of the SNR, W T A W = D A {\displaystyle W^{T}AW=D_{A}} , W T B W = I {\displaystyle W^{T}BW=I} and the transformation of signal s {\displaystyle s} is defined as q = W T X T s = W 2 T W 1 T X T s {\displaystyle q=W^{T}X^{T}s=W_{2}^{T}W_{1}^{T}X^{T}s} . === Information loss === To find the information loss when we discard some of the eigenvalues and eigenvectors we can perform following analysis: η = 1 − t r a c e ( W 1 T A W 1 ) t r a c e ( D B − 1 / 2 P T A P D B − 1 / 2 ) = 1 − t r a c e ( D B ^ − 1 / 2 P ^ T A P ^ D B ^ − 1 / 2 ) t r a c e ( D B − 1 / 2 P T A P D B − 1 / 2 ) {\displaystyle {\begin{array}{lll}\eta &=&

    Read more →
  • Variable data publishing

    Variable data publishing

    Variable-data publishing (VDP) (also known as database publishing) is a term referring to the output of a variable composition system. While these systems can produce both electronically viewable and hard-copy (print) output, the "variable-data publishing" term today often distinguishes output destined for electronic viewing, rather than that which is destined for hard-copy print (e.g. variable data printing). Essentially the same techniques are employed to perform variable-data publishing, as those utilized with variable data printing. The difference is in the interpretation for output. While variable-data printing may be interpreted to produce various print streams or page-description files (e.g. AFP/IPDS, PostScript, PCL), variable-data publishing produces electronically viewable files, most commonly seen in the forms of PDF, HTML, or XML. Variable-data composition involves the use of data to conditionally: exhibit text (static blocks and/or variable content) exhibit images select fonts select colors format page layouts & flows Variable-data may be as simple as an address block or salutation. However, it can be any or all of the document's textual content—including words, sentences, paragraphs, pages, or the entire document. In other words, it can make up as little or as much of the document as the composer desires. Variable data may also be used to exhibit various images, such as logos, products, or membership photos. Further, variable-data can be used to build rule-based design schemes, including fonts, colors, and page formats. The possibilities are vast. The variable-data tools available today, make it possible to perform variable-data composition at nearly every stage of document production. However, the level of control that can be achieved varies, based upon how far into the document production process a variable-data tool is deployed. For example, if variable-data insertion occurs just prior to output...it's not likely that the text flow or layout can be altered with nearly as much control as would be available at the time of initial document composition. Many organizations will produce multiple forms of output (aka: multi-channel output), for the same document. This ensures that the published content is available to recipients via any form of access method they might require. When multi-channel output is utilized, integrity between those output channels often becomes important. Variable-data publishing may be performed on everything from a personal computer to a mainframe system. However, the speed and practical output volumes which can be achieved are directly affected by the computer power utilized. == Origin of the concept == The term variable-data publishing was likely an offshoot of the term "variable-data printing", first introduced to the printing industry by Frank Romano, Professor Emeritus, School of Print Media, at the College of Imaging Arts and Sciences at Rochester Institute of Technology. However, the concept of merging static document elements and variable document elements predates the term and has seen various implementations ranging from simple desktop 'mail merge', to complex mainframe applications in the financial and banking industry. In the past, the term VDP has been most closely associated with digital printing machines. However, in the past 3 years the application of this technology has spread to web pages, emails, and mobile messaging.

    Read more →
  • Averbis

    Averbis

    Averbis has a focus on healthcare, pharma, automotive and intellectual property analytics. Averbis is involved in various research projects of the German Federal Ministry of Economics and Energy and the European Union such as DebugIT, EUCases, Mantra and SEMCARE. In addition to these projects, Averbis was also involved in the following projects: Greenpilot is a virtual library, which provides technical information in the fields of nutrition, environment and agriculture. Medpilot is a virtual library, which provides information about medicine and related sciences. In 2013, Averbis has been nominated for the German Founder Prize 2013. Averbis GmbH provides text analytics and text mining software to transform unstructured text into actionable information. It was founded in 2007 by IT experts after years of relevant scientific experience in the field of text mining and multilingual information retrieval. Averbis works in the field of terminology management, natural language processing, machine learning and semantic search. Its text mining software is embedded into the text mining framework UIMA.

    Read more →
  • Spell checker

    Spell checker

    In software, a spell checker (or spelling checker or spell check) is a software feature that checks for misspellings in a text. Spell-checking features are often embedded in software or services, such as a word processor, email client, electronic dictionary, or search engine. == Design == A basic spell checker carries out the following processes: It scans the text and extracts the words contained in it. It then compares each word with a known list of correctly spelled words (i.e. a dictionary). This might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes. An additional step is a language-dependent algorithm for handling morphology. Even for a lightly inflected language like English, the spell checker will need to consider different forms of the same word, such as plurals, verbal forms, contractions, and possessives. For many other languages, such as those featuring agglutination and more complex declension and conjugation, this part of the process is more complicated. It is unclear whether morphological analysis—allowing for many forms of a word depending on its grammatical role—provides a significant benefit for English, though its benefits for highly synthetic languages such as German, Hungarian, or Turkish are clear. As an adjunct to these components, the program's user interface allows users to approve or reject replacements and modify the program's operation. Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly-spelled words. This approach usually requires a lot of effort to obtain sufficient statistical information. Key advantages include needing less runtime storage and the ability to correct errors in words that are not included in a dictionary. In some cases, spell checkers use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the see also entries of encyclopedias. Clustering algorithms have also been used for spell checking combined with phonetic information. == History == === Pre-PC === In 1961, Les Earnest, who headed the research on this budding technology, saw it necessary to include the first spell checker that accessed a list of 10,000 acceptable words. Ralph Gorin, a graduate student under Earnest at the time, created the first true spelling checker program written as an applications program (rather than research) for general English text: SPELL for the DEC PDP-10 at Stanford University's Artificial Intelligence Laboratory, in February 1971. Gorin wrote SPELL in assembly language, for faster action; he made the first spelling corrector by searching the word list for plausible correct spellings that differ by a single letter or adjacent letter transpositions and presenting them to the user. Gorin made SPELL publicly accessible, as was done with most SAIL (Stanford Artificial Intelligence Laboratory) programs, and it soon spread around the world via the new ARPAnet, about ten years before personal computers came into general use. SPELL, its algorithms and data structures inspired the Unix ispell program. The first spell checkers were widely available on mainframe computers in the late 1970s. A group of six linguists from Georgetown University developed the first spell-check system for the IBM corporation. Henry Kučera invented one for the VAX machines of Digital Equipment Corp in 1981. === Unix === The International Ispell program commonly used in Unix is based on R. E. Gorin's SPELL. It was converted to C by Pace Willisson at MIT. The GNU project has its spell checker GNU Aspell. Aspell's main improvement is that it can more accurately suggest correct alternatives for misspelled English words. Due to the inability of traditional spell checkers to check words in complex inflected languages, Hungarian László Németh developed Hunspell, a spell checker that supports agglutinative languages and complex compound words. Hunspell also uses Unicode in its dictionaries. Hunspell replaced the previous MySpell in OpenOffice.org in version 2.0.2. Enchant is another general spell checker, derived from AbiWord. Its goal is to combine programs supporting different languages such as Aspell, Hunspell, Nuspell, Hspell (Hebrew), Voikko (Finnish), Zemberek (Turkish) and AppleSpell under one interface. === PCs === The first spell checkers for personal computers appeared in 1980, such as "WordCheck" for Commodore systems which was released in late 1980 in time for advertisements to go to print in January 1981. Developers such as Maria Mariani and Random House rushed OEM packages or end-user products into the rapidly expanding software market. On the pre-Windows PCs, these spell checkers were standalone programs, many of which could be run in terminate-and-stay-resident mode from within word-processing packages on PCs with sufficient memory. However, the market for standalone packages was short-lived, as by the mid-1980s developers of popular word-processing packages like WordStar and WordPerfect had incorporated spell checkers in their packages, mostly licensed from the above companies, who quickly expanded support from just English to many European and eventually even Asian languages. However, this required increasing sophistication in the morphology routines of the software, particularly with regard to heavily-agglutinative languages like Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy. When Apple developed "a system-wide spelling checker" for Mac OS X so that "the operating system took over spelling fixes," it was a first: one "didn't have to maintain a separate spelling checker for each" program. Mac OS X's spellcheck coverage includes virtually all bundled and third party applications. Visual Tools' VT Speller, introduced in 1994, was "designed for developers of applications that support Windows." It came with a dictionary but had the ability to build and incorporate use of secondary dictionaries. === Browsers === Web browsers such as Firefox and Google Chrome offer spell checking support, using Hunspell. Prior to using Hunspell, Firefox and Chrome used MySpell and GNU Aspell, respectively. === Specialties === Some spell checkers have separate support for medical dictionaries to help prevent medical errors. == Functionality == The first spell checkers were "verifiers" instead of "correctors." They offered no suggestions for incorrectly spelled words. This was helpful for typos but it was not so helpful for logical or phonetic errors. The challenge the developers faced was the difficulty in offering useful suggestions for misspelled words. This requires reducing words to a skeletal form and applying pattern-matching algorithms. It might seem logical that where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. If there are more than this, incorrectly spelled words may be skipped because they are mistaken for others. For example, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would typically be more useful if a few people who write about Thai currency were slightly inconvenienced than if the spelling errors of the many more people who discuss baths were overlooked. The first MS-DOS spell checkers were mostly used in proofing mode from within word processing packages. After preparing a document, a user scanned the text looking for misspellings. Later, however, batch processing was offered in such packages as Oracle's short-lived CoAuthor and allowed a user to view the results after a document was processed and correct only the words that were known to be wrong. When memory and processing power became abundant, spell checking was performed in the background in an interactive way, such as has been the case with the Sector Software produced Spellbound program released in 1987 and Microsoft Word since Word 95. Spell checkers became increasingly sophisticated; now capable of recognizing grammatical errors. However, even at their best, they rarely catch all the errors in a text (such as homophone errors) and will flag neologisms and foreign words as misspellings. Nonetheless, spell checkers can be considered as a type of foreign language writing aid that non-native language lea

    Read more →
  • Ernie Bot

    Ernie Bot

    Ernie Bot (Chinese: 文心一言, Pinyin: wénxīn yīyán), full name Enhanced Representation through Knowledge Integration, is an artificial intelligence chatbot developed by the Chinese technology company Baidu. Ernie Bot rivals GPT models in Chinese NLP tasks. It is built on the company's ERNIE series of large language models, which have been in development since 2019. The service was first launched for invited testing on March 16, 2023, and was released to the general public on August 31, 2023, after receiving approval from Chinese regulators. Since its public launch, Ernie Bot has undergone several updates, with newer versions like ERNIE 4.0 and 4.5 released to improve its capabilities. The service has seen rapid user adoption, reportedly reaching over 200 million users by April 2024. It has been integrated into various products, notably powering AI features for the Chinese release of Samsung's Galaxy S24 smartphones. As a product operating in China, Ernie Bot is subject to the country's censorship regulations. It has been observed to refuse answers to politically sensitive questions, such as those regarding CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, and other topics deemed taboo by the government. == History == Ernie Bot was initially released for invited testing on March 16, 2023. The live release demo was reported to have been prerecorded, which caused Baidu's stock to drop 10 percent on the day of the launch. The company's stock gained 14 percent the following day after analysts from Citigroup and Bank of America tested Ernie Bot and gave it positive preliminary reviews. On August 31, 2023, Ernie Bot was released to the public after receiving approval from Chinese regulatory authorities. By December 2023, Baidu announced the service had surpassed 100 million users. In January 2024, Hong Kong newspaper South China Morning Post reported that a university research lab linked to the People's Liberation Army (PLA) had tested Ernie Bot for military response scenarios. Baidu denied the allegations, stating it had no connection with the academic paper. That same month, Ernie was integrated into Samsung's Galaxy S24 lineup for its launch in China. The user base reportedly grew to 200 million by April 2024 and 300 million by June 2024. In September 2024, Baidu changed the chatbot's Chinese name from "Wenxin Yiyan" (文心一言) to "Wenxiaoyan" (文小言) to position it as a search assistant. On March 16, 2025, Baidu announced version 4.5 and the reasoning model ERNIE X1. The following month, at the Create2025 Baidu AI Developer Conference, the company released the Wenxin 4.5 Turbo and Wenxin X1 Turbo models, designed to be faster and less expensive to operate. == Development == Ernie Bot is based on Baidu's ERNIE (Enhanced Representation through Knowledge Integration) series of foundation models. The general training process begins with pre-training on large datasets, followed by refinement using techniques like supervised fine-tuning, reinforcement learning with human feedback, and prompt engineering. === Foundation models === ==== Ernie 3.0 ==== The model powering the initial launch of Ernie Bot. It was trained with 10 billion parameters on a 4-terabyte corpus consisting of plain text and a large-scale knowledge graph. ==== Ernie 3.5 ==== Released in June 2023. At the time of release, its performance was reported as "slightly inferior" to OpenAI's GPT-4. ==== Ernie 4.0 ==== Unveiled in October 2023 and released to paying subscribers in November. According to Baidu, this version featured improved performance over its predecessor, with information updated to April 2023. ==== Ernie X1 ==== Announced in March 2025, with Ernie X1 positioned as a specialized reasoning model. Baidu stated that performance improvements were achieved through new technologies such as "FlashMask" dynamic attention masking and a heterogeneous multimodal mixture-of-experts architecture. === Turbo Models === In June 2024, Baidu announced Ernie 4.0 Turbo. In April 2025, Ernie 4.5 Turbo and X1 Turbo were released. These models are optimized for faster response times and lower operational costs. == Service == In its subscription options, the professional plan gives users access to Ernie 4.0 with a payment either for a month or with reduced payment for auto-renewal per month. Meanwhile, Ernie 3.5 is free of charge. Ernie 4.0, the language model for Ernie bot, has information updated to April 2023. == Censorship == Ernie Bot is subject to the Chinese government's censorship regime. In public tests with journalists, Ernie Bot refused to answer questions about CCP general secretary Xi Jinping, the 1989 Tiananmen Square protests and massacre, the persecution of Uyghurs in China in Xinjiang, and the 2019–2020 Hong Kong protests. When queried about the origin of SARS-CoV-2, Ernie Bot stated that it originated among American vape users.

    Read more →
  • Circle Hough Transform

    Circle Hough Transform

    The circle Hough Transform (CHT) is a basic feature extraction technique used in digital image processing for detecting circles in imperfect images. The circle candidates are produced by “voting” in the Hough parameter space and then selecting local maxima in an accumulator matrix. It is a specialization of the Hough transform. == Theory == In a two-dimensional space, a circle can be described by: ( x − a ) 2 + ( y − b ) 2 = r 2 ( 1 ) {\displaystyle \left(x-a\right)^{2}+\left(y-b\right)^{2}=r^{2}\ \ \ \ \ (1)} where (a,b) is the center of the circle, and r is the radius. If a 2D point (x,y) is fixed, then the parameters can be found according to (1). The parameter space would be three dimensional, (a, b, r). And all the parameters that satisfy (x, y) would lie on the surface of an inverted right-angled cone whose apex is at (x, y, 0). In the 3D space, the circle parameters can be identified by the intersection of many conic surfaces that are defined by points on the 2D circle. This process can be divided into two stages. The first stage is fixing radius then find the optimal center of circles in a 2D parameter space. The second stage is to find the optimal radius in a one dimensional parameter space. === Find parameters with known radius R === If the radius is fixed, then the parameter space would be reduced to 2D (the position of the circle center). For each point (x, y) on the original circle, it can define a circle centered at (x, y) with radius R according to (1). The intersection point of all such circles in the parameter space would be corresponding to the center point of the original circle. Consider 4 points on a circle in the original image (left). The circle Hough transform is shown in the right. Note that the radius is assumed to be known. For each (x,y) of the four points (white points) in the original image, it can define a circle in the Hough parameter space centered at (x, y) with radius r. An accumulator matrix is used for tracking the intersection point. In the parameter space, the voting number of those points that have a newly defined circle passing through them would be increased by one for every circle. Then the local maxima point (the red point in the center in the right figure) can be found. The position (a, b) of the maxima would be the center of the original circle. === Multiple circles with known radius R === Multiple circles with same radius can be found with the same technique. Note that, in the accumulator matrix (right fig), there would be at least 3 local maxima points. === Accumulator matrix and voting === In practice, an accumulator matrix is introduced to find the intersection point in the parameter space. First, we need to divide the parameter space into “buckets” using a grid and produce an accumulator matrix according to the grid. The element in the accumulator matrix denotes the number of “circles” in the parameter space that are passing through the corresponding grid cell in the parameter space. The number is also called “voting number”. Initially, every element in the matrix is zeros. Then for each “edge” point in the original space, we can formulate a circle in the parameter space and increase the voting number of the grid cell which the circle passes through. This process is called “voting”. After voting, we can find local maxima in the accumulator matrix. The positions of the local maxima are corresponding to the circle centers in the original space. === Find circle parameter with unknown radius === Since the parameter space is 3D, the accumulator matrix would be 3D, too. We can iterate through possible radii; for each radius, we use the previous technique. Finally, find the local maxima in the 3D accumulator matrix. Accumulator array should be A[x,y,r] in the 3D space. Voting should be for each pixels, radius and theta A[x,y,r] += 1 The algorithm : For each A[a,b,r] = 0; Process the filtering algorithm on image Gaussian Blurring, convert the image to grayscale ( grayScaling), make Canny operator, The Canny operator gives the edges on image. Vote on all possible circles in accumulator. The local maximum voted circles of Accumulator A gives the circle Hough space. The maximum voted circle of Accumulator gives the circle. The Incrementing for Best Candidate : For each A[a,b,r] = 0; // fill with zeroes initially, instantiate 3D matrix For each cell(x,y) For each theta t = 0 to 360 // the possible theta 0 to 360 b = y – r sin(t PI / 180); //polar coordinate for center (convert to radians) a = x – r cos(t PI / 180); //polar coordinate for center (convert to radians) A[a,b,r] +=1; //voting end end == Examples == === Find circles in a shoe-print === The original picture (right) is first turned into a binary image (left) using a threshold and Gaussian filter. Then edges (mid) are found from it using canny edge detection. After this, all the edge points are used by the Circle Hough Transform to find underlying circle structure. == Limitations == Since the parameter space of the CHT is three dimensional, it may require lots of storage and computation. Choosing a bigger grid size can ameliorate this problem. However, choosing an appropriate grid size is difficult. Since too coarse a grid can lead to large values of the vote being obtained falsely because many quite different structures correspond to a single bucket. Too fine a grid can lead to structures not being found because votes resulting from tokens that are not exactly aligned end up in different buckets, and no bucket has a large vote. Also, the CHT is not very robust to noise. == Extensions == === Adaptive Hough Transform === J. Illingworth and J. Kittler introduced this method for implementing Hough Transform efficiently. The AHT uses a small accumulator array and the idea of a flexible iterative "coarse to fine" accumulation and search strategy to identify significant peaks in the Hough parameter spaces. This method is substantially superior to the standard Hough Transform implementation in both storage and computational requirements. == Application == === People Counting === Since the head would be similar to a circle in an image, CHT can be used for detecting heads in a picture, so as to count the number of persons in the image. === Brain Aneurysm Detection === Modified Hough Circle Transform (MHCT) is used on the image extracted from Digital Subtraction Angiogram (DSA) to detect and classify aneurysms type. == Implementation code == Circle Detection via Standard Hough Transform, by Amin Sarafraz, Mathworks (File Exchange) Hough Circle Transform, OpenCV-Python Tutorials (archived version on archive.org)

    Read more →
  • Photometric stereo

    Photometric stereo

    Photometric stereo is a technique in computer vision for estimating the surface normals of objects by observing that object under different lighting conditions (photometry). It is based on the fact that the amount of light reflected by a surface is dependent on the orientation of the surface in relation to the light source and the observer. By measuring the amount of light reflected into a camera, the space of possible surface orientations is limited. Given enough light sources from different angles, the surface orientation may be constrained to a single orientation or even overconstrained. The technique was originally introduced by Woodham in 1980. The special case where the data is a single image is known as shape from shading, and was analyzed by B. K. P. Horn in 1989. Photometric stereo has since been generalized to many other situations, including extended light sources and non-Lambertian surface finishes. Current research aims to make the method work in the presence of projected shadows, highlights, and non-uniform lighting. Photometric stereo is widely used in various fields, including archaeology, cultural heritage conservation, and quality control. It is now integrated into widely used open-source software, such as Meshroom. == Basic method == Under Woodham's original assumptions — Lambertian reflectance, known point-like distant light sources, and uniform albedo — the problem can be solved by inverting the linear equation I = L ⋅ n {\displaystyle I=L\cdot n} , where I {\displaystyle I} is a (known) vector of m {\displaystyle m} observed intensities, n {\displaystyle n} is the (unknown) surface normal, and L {\displaystyle L} is a (known) 3 × m {\displaystyle 3\times m} matrix of normalized light directions. This model can easily be extended to surfaces with non-uniform albedo, while keeping the problem linear. Taking an albedo reflectivity of k {\displaystyle k} , the formula for the reflected light intensity becomes I = k ( L ⋅ n ) . {\displaystyle I=k(L\cdot n).} If L {\displaystyle L} is square (there are exactly 3 lights) and non-singular, it can be inverted, giving L − 1 I = k n . {\displaystyle L^{-1}I=kn.} Since the normal vector is known to have length 1, k {\displaystyle k} must be the length of the vector k n {\displaystyle kn} , and n {\displaystyle n} is the normalised direction of that vector. If L {\displaystyle L} is not square (there are more than 3 lights), a generalisation of the inverse can be obtained using the Moore–Penrose pseudoinverse, by simply multiplying both sides with L T {\displaystyle L^{T}} , giving L T I = L T k ( L ⋅ n ) , {\displaystyle L^{T}I=L^{T}k(L\cdot n),} ( L T L ) − 1 L T I = k n , {\displaystyle (L^{T}L)^{-1}L^{T}I=kn,} after which the normal vector and albedo can be solved as described above. == Non-Lambertian surfaces == The classical photometric stereo problem concerns itself only with Lambertian surfaces, with perfectly diffuse reflection. This is unrealistic for many types of materials, especially metals, glass and smooth plastics, and will lead to aberrations in the resulting normal vectors. Many methods have been developed to lift this assumption. In this section, a few of these are listed. === Specular reflections === Historically, in computer graphics, the commonly used model to render surfaces started with Lambertian surfaces and progressed first to include simple specular reflections. Computer vision followed a similar course with photometric stereo. Specular reflections were among the first deviations from the Lambertian model. These are a few adaptations that have been developed. Many techniques ultimately rely on modelling the reflectance function of the surface, that is, how much light is reflected in each direction. This reflectance function has to be invertible. The reflected light intensities towards the camera is measured, and the inverse reflectance function is fit onto the measured intensities, resulting in a unique solution for the normal vector. === General BRDFs and beyond === According to the Bidirectional reflectance distribution function (BRDF) model, a surface may distribute the amount of light it receives in any outward direction. This is the most general known model for opaque surfaces. Some techniques have been developed to model (almost) general BRDFs. In practice, all of these require many light sources to obtain reliable data. These are methods in which surfaces with general BRDFs can be measured. Determine the explicit BRDF prior to scanning. To do this, a different surface is required that has the same or a very similar BRDF, of which the actual geometry (or at least the normal vectors for many points on the surface) is already known. The lights are then individually shone upon the known surface, and the amount of reflection into the camera is measured. Using this information, a look-up table can be created that maps reflected intensities for each light source to a list of possible normal vectors. This puts constraints on the possible normal vectors the surface may have, and reduces the photometric stereo problem to an interpolation between measurements. Typical known surfaces to calibrate the look-up table with are spheres for their wide variety of surface orientations. Restricting the BRDF to be symmetrical. If the BRDF is symmetrical, the direction of the light can be restricted to a cone about the direction to the camera. Which cone this is depends on the BRDF itself, the normal vector of the surface, and the measured intensity. Given enough measured intensities and the resulting light directions, these cones can be approximated and therefore the normal vectors of the surface. Some progress has been made towards modelling an even more general surfaces, such as Spatially Varying Bidirectional Distribution Functions (SVBRDF), Bidirectional surface scattering reflectance distribution functions (BSSRDF), and accounting for interreflections. However, such methods are still fairly restrictive in photometric stereo. Better results have been achieved with structured light. == Uncalibrated photometric stereo == Uncalibrated Photometric Stereo is an approach in photometric stereo that aims to reconstruct the 3D shape of an object from images captured under unknown lighting conditions. Unlike classical methods, which often assume controlled or known lighting setups, this approach removes these constraints, making it adaptable to diverse and real-world environments. The advent of deep learning has revolutionized universal PS by replacing handcrafted assumptions with data-driven models. Recent approaches leverage Transformer-based architectures and multi-scale encoder–decoder networks to directly estimate surface normals from input images. Uncalibrated Photometric Stereo is inherently an ill-posed problem, as it attempts to recover 3D shape and lighting conditions simultaneously from images alone. This leads to fundamental ambiguities in the reconstruction process, which manifest as systematic errors in the recovered geometry, including global distortions in the object's overall shape, and misinterpretation of surface orientation, where concave regions may appear convex and vice versa. To address the challenges of uncalibrated photometric stereo, hybrid methods have emerged that combine multi-view stereo and photometric stereo. These approaches leverage the strengths of both techniques, including geometric reliability and resolution.

    Read more →
  • LaMDA

    LaMDA

    LaMDA (Language Model for Dialogue Applications) is a family of conversational large language models developed by Google. Originally developed and introduced as Meena in 2020, the first-generation LaMDA was announced during the 2021 Google I/O keynote, while the second generation was announced the following year. In June 2022, LaMDA gained widespread attention when Google engineer Blake Lemoine made claims that the chatbot had become sentient. The scientific community has largely rejected Lemoine's claims, though it has led to conversations about the efficacy of the Turing test, which measures whether a computer can pass for a human. In February 2023, Google announced Gemini (then Bard), a conversational artificial intelligence chatbot powered by LaMDA, to counter the rise of OpenAI's ChatGPT. == History == === Background === On January 28, 2020, Google unveiled Meena, a neural network-powered chatbot with 2.6 billion parameters, which Google claimed to be superior to all other existing chatbots. The company previously hired computer scientist Ray Kurzweil in 2012 to develop multiple chatbots for the company, including one named Danielle. The Google Brain research team, who developed Meena, hoped to release the chatbot to the public in a limited capacity, but corporate executives refused on the grounds that Meena violated Google's "AI principles around safety and fairness". Meena was later renamed LaMDA as its data and computing power increased, and the Google Brain team again sought to deploy the software to the Google Assistant, the company's virtual assistant software, in addition to opening it up to a public demo. Both requests were once again denied by company leadership. LaMDA's two lead researchers, Daniel de Freitas and Noam Shazeer, eventually left the company in frustration. === First generation === Google announced the LaMDA conversational large language model during the Google I/O keynote on May 18, 2021, powered by artificial intelligence. The acronym stands for "Language Model for Dialogue Applications". Built on the seq2seq architecture, transformer-based neural networks developed by Google Research in 2017, LaMDA was trained on human dialogue and stories, allowing it to engage in open-ended conversations. Google states that responses generated by LaMDA have been ensured to be "sensible, interesting, and specific to the context". LaMDA has access to multiple symbolic text processing systems, including a database, a real-time clock and calendar, a mathematical calculator, and a natural language translation system, giving it superior accuracy in tasks supported by those systems, and making it among the first dual process chatbots. LaMDA is also not stateless because its "sensibleness" metric is fine-tuned by "pre-conditioning" each dialog turn by prepending many of the most recent dialog interactions, on a user-by-user basis. LaMDA is tuned on nine unique performance metrics: sensibleness, specificity, interestingness, safety, groundedness, informativeness, citation accuracy, helpfulness, and role consistency. Tests by Google indicated that LaMDA surpassed human responses in the area of interestingness. The pre-training dataset consists of 2.97B documents, 1.12B dialogs, and 13.39B utterances, for a total of 1.56T words. The largest LaMDA model has 137B non-embedding parameters. === Second generation === On May 11, 2022, Google unveiled LaMDA 2, the successor to LaMDA, during the 2022 Google I/O keynote. The new incarnation of the model draws examples of text from numerous sources, using it to formulate unique "natural conversations" on topics that it may not have been trained to respond to. === Sentience claims === On June 11, 2022, The Washington Post reported that Google engineer Blake Lemoine had been placed on paid administrative leave after Lemoine told company executives Blaise Agüera y Arcas and Jen Gennai that LaMDA had become sentient. Lemoine came to this conclusion after the chatbot made questionable responses to questions regarding self-identity, moral values, religion, and Isaac Asimov's Three Laws of Robotics. Google refuted these claims, insisting that there was substantial evidence to indicate that LaMDA was not sentient. In an interview with Wired, Lemoine reiterated his claims that LaMDA was "a person" as dictated by the Thirteenth Amendment to the U.S. Constitution, comparing it to an "alien intelligence of terrestrial origin". He further revealed that he had been dismissed by Google after he hired an attorney on LaMDA's behalf after the chatbot requested that Lemoine do so. On July 22, Google fired Lemoine, asserting that Blake had violated their policies "to safeguard product information" and rejected his claims as "wholly unfounded". Internal controversy instigated by the incident prompted Google executives to decide against releasing LaMDA to the public, which it had previously been considering. Lemoine's claims were widely pushed back by the scientific community. Many experts rejected the idea that LaMDA was sentient, including former New York University psychology professor Gary Marcus, David Pfau of Google sister company DeepMind, Erik Brynjolfsson of the Institute for Human-Centered Artificial Intelligence at Stanford University, and University of Surrey professor Adrian Hilton. Yann LeCun, who leads Meta Platforms' AI research team, stated that neural networks such as LaMDA were "not powerful enough to attain true intelligence". University of California, Santa Cruz professor Max Kreminski noted that LaMDA's architecture did not "support some key capabilities of human-like consciousness" and that its neural network weights were "frozen", assuming it was a typical large language model. Philosopher Nick Bostrom noted, however, that the lack of precise and consensual criteria for determining whether a system is conscious warrants some uncertainty. IBM Watson lead developer David Ferrucci compared how LaMDA appeared to be human in the same way Watson did when it was first introduced. Former Google AI ethicist Timnit Gebru called Lemoine a victim of a "hype cycle" initiated by researchers and the media. Lemoine's claims have also generated discussion on whether the Turing test remained useful to determine researchers' progress toward achieving artificial general intelligence, with Will Omerus of the Post opining that the test actually measured whether machine intelligence systems were capable of deceiving humans, while Brian Christian of The Atlantic said that the controversy was an instance of the ELIZA effect. == Products == === AI Test Kitchen === With the unveiling of LaMDA 2 in May 2022, Google also launched the AI Test Kitchen, a mobile application for the Android operating system powered by LaMDA capable of providing lists of suggestions on-demand based on a complex goal. Originally open only to Google employees, the app was set to be made available to "select academics, researchers, and policymakers" by invitation sometime in the year. In August, the company began allowing users in the U.S. to sign up for early access. In November, Google released a "season 2" update to the app, integrating a limited form of Google Brain's Imagen text-to-image model. A third iteration of the AI Test Kitchen was in development by January 2023, expected to launch at I/O later that year. Following the 2023 I/O keynote in May, Google added MusicLM, an AI-powered music generator first previewed in January, to the AI Test Kitchen app. In August, the app was delisted from Google Play and the Apple App Store, instead moving completely online. === Bard === On February 6, 2023, Google announced Bard, a conversational AI chatbot powered by LaMDA, in response to the unexpected popularity of OpenAI's ChatGPT chatbot. Google positions the chatbot as a "collaborative AI service" rather than a search engine. Bard became available for early access on March 21. === Other products === In addition to Bard, Pichai also unveiled the company's Generative Language API, an application programming interface also based on LaMDA, which he announced would be opened up to third-party developers in March 2023. == Architecture == LaMDA is a decoder-only Transformer language model. It is pre-trained on a text corpus that includes both documents and dialogs consisting of 1.56 trillion words, and is then trained with fine-tuning data generated by manually annotated responses for "sensibleness, interestingness, and safety". LaMDA was retrieval-augmented to improve the accuracy of facts provided to the user. Three different models were tested, with the largest having 137 billion non-embedding parameters:

    Read more →
  • Brave Leo

    Brave Leo

    Brave Leo is a large language model-based chatbot developed by Brave Software and included with the Brave browser. == History == In November 2023, the company said versions for iOS and Android would be available "in the coming months". == Features == Since January 2024, Leo has used the open-source Mixtral 8x7B from Mistral AI as its default large language model, in addition to LLaMA 2 from Meta Platforms and Claude from Anthropic, both of which have been used previously. Leo can suggest follow-up questions, and summarize webpages, PDFs, and videos. Leo has a $15 (US) per month premium version that enables more requests and uses larger LLMs. == Privacy == The answers given by Leo are not saved. Brave uses the slogan Love Privacy to emphasize its focus on user privacy and data protection. The phrase has been featured in Brave's official marketing campaigns and has been cited in media coverage of the browser's privacy-first approach. == Controversies == In 2023, PC World reported that Leo evades questions about US elections.

    Read more →
  • Content Security Policy

    Content Security Policy

    Content Security Policy (CSP) is a computer security standard introduced to prevent cross-site scripting (XSS), clickjacking and other code injection attacks resulting from execution of malicious content in the trusted web page context. It is a Candidate Recommendation of the W3C working group on Web Application Security, widely supported by modern web browsers. CSP provides a standard method for website owners to declare approved origins of content that browsers should be allowed to load on that website—covered types are JavaScript, CSS, HTML frames, web workers, fonts, images, embeddable objects such as Java applets, ActiveX, audio and video files, and other HTML5 features. == Status == The standard, originally named Content Restrictions, was proposed by Robert Hansen in 2004, first implemented in Firefox 4 and quickly picked up by other browsers. Version 1 of the standard was published in 2012 as W3C candidate recommendation and quickly with further versions (Level 2) published in 2014. As of 2023, the draft of Level 3 is being developed with the new features being quickly adopted by the web browsers. The following header names are in use as part of experimental CSP implementations: Content-Security-Policy – standard header name proposed by the W3C document. Google Chrome supports this as of version 25. Firefox supports this as of version 23, released on 6 August 2013. WebKit supports this as of version 528 (nightly build). Chromium-based Microsoft Edge support is similar to Chrome's. X-WebKit-CSP – deprecated, experimental header introduced into Google Chrome, Safari and other WebKit-based web browsers in 2011. X-Content-Security-Policy – deprecated, experimental header introduced in Gecko 2 based browsers (Firefox 4 to Firefox 22, Thunderbird 3.3, SeaMonkey 2.1). A website can declare multiple CSP headers, also mixing enforcement and report-only ones. Each header will be processed separately by the browser. CSP can also be delivered within the HTML code using a meta tag, although in this case its effectiveness will be limited. Internet Explorer 10 and Internet Explorer 11 also support CSP, but only sandbox directive, using the experimental X-Content-Security-Policy header. A number of web application frameworks support CSP, for example AngularJS (natively) and Django (middleware). Instructions for Ruby on Rails have been posted by GitHub. Web framework support is however only required if the CSP contents somehow depend on the web application's state—such as usage of the nonce origin. Otherwise, the CSP is rather static and can be delivered from web application tiers above the application, for example on load balancer or web server. === Bypasses === In December 2015 and December 2016, a few methods of bypassing 'nonce' allowlisting origins were published. In January 2016, another method was published, which leverages server-wide CSP allowlisting to exploit old and vulnerable versions of JavaScript libraries hosted at the same server (frequent case with CDN servers). In May 2017 one more method was published to bypass CSP using web application frameworks code. == Mode of operation == If the Content-Security-Policy header is present in the server response, a compliant client enforces the declarative allowlist policy. One example goal of a policy is a stricter execution mode for JavaScript in order to prevent certain cross-site scripting attacks. In practice this means that a number of features are disabled by default: Inline JavaScript code