AI App Builder Free Unlimited

AI App Builder Free Unlimited — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Ware report

    Ware report

    Security Controls for Computer Systems, commonly called the Ware report, is a 1970 text by Willis Ware that was foundational in the field of computer security. == Development == A defense contractor in St. Louis, Missouri, had bought an IBM mainframe computer, which it was using for classified work on a fighter aircraft. To provide additional income, the contractor asked the Department of Defense (DoD) for permission to sell computer time on the mainframe to local businesses via remote terminals, while the classified work continued. At the time, the DoD did not have a policy to cover this. The DoD's Advanced Research Projects Agency (DARPA) asked Ware - a RAND employee - to chair a committee to examine and report on the feasibility of security controls for computer systems. The committee's report was a classified document given in January 1970 to the Defense Science Board (DSB), which had taken over the project from ARPA. After declassification, the report was published by RAND in October 1979. == Influence == The IEEE Computer Society said the report was widely circulated, and the IEEE Annals of the History of Computing said that it, together with Ware's 1967 Spring Joint Computer Conference session, marked the start of the field of computer security. The report influenced security certification standards and processes, especially in the banking and defense industries, where the report was instrumental in creating the Orange Book.

    Read more →
  • Constellation model

    Constellation model

    The constellation model is a probabilistic, generative model for category-level object recognition in computer vision. Like other part-based models, the constellation model attempts to represent an object class by a set of N parts under mutual geometric constraints. Because it considers the geometric relationship between different parts, the constellation model differs significantly from appearance-only, or "bag-of-words" representation models, which explicitly disregard the location of image features. The problem of defining a generative model for object recognition is difficult. The task becomes significantly complicated by factors such as background clutter, occlusion, and variations in viewpoint, illumination, and scale. Ideally, we would like the particular representation we choose to be robust to as many of these factors as possible. In category-level recognition, the problem is even more challenging because of the fundamental problem of intra-class variation. Even if two objects belong to the same visual category, their appearances may be significantly different. However, for structured objects such as cars, bicycles, and people, separate instances of objects from the same category are subject to similar geometric constraints. For this reason, particular parts of an object such as the headlights or tires of a car still have consistent appearances and relative positions. The Constellation Model takes advantage of this fact by explicitly modeling the relative location, relative scale, and appearance of these parts for a particular object category. Model parameters are estimated using an unsupervised learning algorithm, meaning that the visual concept of an object class can be extracted from an unlabeled set of training images, even if that set contains "junk" images or instances of objects from multiple categories. It can also account for the absence of model parts due to appearance variability, occlusion, clutter, or detector error. == History == The idea for a "parts and structure" model was originally introduced by Fischler and Elschlager in 1973. This model has since been built upon and extended in many directions. The Constellation Model, as introduced by Dr. Perona and his colleagues, was a probabilistic adaptation of this approach. In the late '90s, Burl et al. revisited the Fischler and Elschlager model for the purpose of face recognition. In their work, Burl et al. used manual selection of constellation parts in training images to construct a statistical model for a set of detectors and the relative locations at which they should be applied. In 2000, Weber et al. made the significant step of training the model using a more unsupervised learning process, which precluded the necessity for tedious hand-labeling of parts. Their algorithm was particularly remarkable because it performed well even on cluttered and occluded image data. Fergus et al. then improved upon this model by making the learning step fully unsupervised, having both shape and appearance learned simultaneously, and accounting explicitly for the relative scale of parts. == The method of Weber and Welling et al. == In the first step, a standard interest point detection method, such as Harris corner detection, is used to generate interest points. Image features generated from the vicinity of these points are then clustered using k-means or another appropriate algorithm. In this process of vector quantization, one can think of the centroids of these clusters as being representative of the appearance of distinctive object parts. Appropriate feature detectors are then trained using these clusters, which can be used to obtain a set of candidate parts from images. As a result of this process, each image can now be represented as a set of parts. Each part has a type, corresponding to one of the aforementioned appearance clusters, as well as a location in the image space. === Basic generative model === Weber & Welling here introduce the concept of foreground and background. Foreground parts correspond to an instance of a target object class, whereas background parts correspond to background clutter or false detections. Let T be the number of different types of parts. The positions of all parts extracted from an image can then be represented in the following "matrix," X o = ( x 11 , x 12 , ⋯ , x 1 N 1 x 21 , x 22 , ⋯ , x 2 N 2 ⋮ x T 1 , x T 2 , ⋯ , x T N T ) {\displaystyle X^{o}={\begin{pmatrix}x_{11},x_{12},{\cdots },x_{1N_{1}}\\x_{21},x_{22},{\cdots },x_{2N_{2}}\\\vdots \\x_{T1},x_{T2},{\cdots },x_{TN_{T}}\end{pmatrix}}} where N i {\displaystyle N_{i}\,} represents the number of parts of type i ∈ { 1 , … , T } {\displaystyle i\in \{1,\dots ,T\}} observed in the image. The superscript o indicates that these positions are observable, as opposed to missing. The positions of unobserved object parts can be represented by the vector x m {\displaystyle x^{m}\,} . Suppose that the object will be composed of F {\displaystyle F\,} distinct foreground parts. For notational simplicity, we assume here that F = T {\displaystyle F=T\,} , though the model can be generalized to F > T {\displaystyle F>T\,} . A hypothesis h {\displaystyle h\,} is then defined as a set of indices, with h i = j {\displaystyle h_{i}=j\,} , indicating that point x i j {\displaystyle x_{ij}\,} is a foreground point in X o {\displaystyle X^{o}\,} . The generative probabilistic model is defined through the joint probability density p ( X o , x m , h ) {\displaystyle p(X^{o},x^{m},h)\,} . === Model details === The rest of this section summarizes the details of Weber & Welling's model for a single component model. The formulas for multiple component models are extensions of those described here. To parametrize the joint probability density, Weber & Welling introduce the auxiliary variables b {\displaystyle b\,} and n {\displaystyle n\,} , where b {\displaystyle b\,} is a binary vector encoding the presence/absence of parts in detection ( b i = 1 {\displaystyle b_{i}=1\,} if h i > 0 {\displaystyle h_{i}>0\,} , otherwise b i = 0 {\displaystyle b_{i}=0\,} ), and n {\displaystyle n\,} is a vector where n i {\displaystyle n_{i}\,} denotes the number of background candidates included in the i t h {\displaystyle i^{th}} row of X o {\displaystyle X^{o}\,} . Since b {\displaystyle b\,} and n {\displaystyle n\,} are completely determined by h {\displaystyle h\,} and the size of X o {\displaystyle X^{o}\,} , we have p ( X o , x m , h ) = p ( X o , x m , h , n , b ) {\displaystyle p(X^{o},x^{m},h)=p(X^{o},x^{m},h,n,b)\,} . By decomposition, p ( X o , x m , h , n , b ) = p ( X o , x m | h , n , b ) p ( h | n , b ) p ( n ) p ( b ) {\displaystyle p(X^{o},x^{m},h,n,b)=p(X^{o},x^{m}|h,n,b)p(h|n,b)p(n)p(b)\,} The probability density over the number of background detections can be modeled by a Poisson distribution, p ( n ) = ∏ i = 1 T 1 n i ! ( M i ) n i e − M i {\displaystyle p(n)=\prod _{i=1}^{T}{\frac {1}{n_{i}!}}(M_{i})^{n_{i}}e^{-M_{i}}} where M i {\displaystyle M_{i}\,} is the average number of background detections of type i {\displaystyle i\,} per image. Depending on the number of parts F {\displaystyle F\,} , the probability p ( b ) {\displaystyle p(b)\,} can be modeled either as an explicit table of length 2 F {\displaystyle 2^{F}\,} , or, if F {\displaystyle F\,} is large, as F {\displaystyle F\,} independent probabilities, each governing the presence of an individual part. The density p ( h | n , b ) {\displaystyle p(h|n,b)\,} is modeled by p ( h | n , b ) = { 1 ∏ f = 1 F N f b f , if h ∈ H ( b , n ) 0 , for other h {\displaystyle p(h|n,b)={\begin{cases}{\frac {1}{\textstyle \prod _{f=1}^{F}N_{f}^{b_{f}}}},&{\mbox{if }}h\in H(b,n)\\0,&{\mbox{for other }}h\end{cases}}} where H ( b , n ) {\displaystyle H(b,n)\,} denotes the set of all hypotheses consistent with b {\displaystyle b\,} and n {\displaystyle n\,} , and N f {\displaystyle N_{f}\,} denotes the total number of detections of parts of type f {\displaystyle f\,} . This expresses the fact that all consistent hypotheses, of which there are ∏ f = 1 F N f b f {\displaystyle \textstyle \prod _{f=1}^{F}N_{f}^{b_{f}}} , are equally likely in the absence of information on part locations. And finally, p ( X o , x m | h , n ) = p f g ( z ) p b g ( x b g ) {\displaystyle p(X^{o},x^{m}|h,n)=p_{fg}(z)p_{bg}(x_{bg})\,} where z = ( x o x m ) {\displaystyle z=(x^{o}x^{m})\,} are the coordinates of all foreground detections, observed and missing, and x b g {\displaystyle x_{bg}\,} represents the coordinates of the background detections. Note that foreground detections are assumed to be independent of the background. p f g ( z ) {\displaystyle p_{fg}(z)\,} is modeled as a joint Gaussian with mean μ {\displaystyle \mu \,} and covariance Σ {\displaystyle \Sigma \,} . === Classification === The ultimate objective of this model is to classify images into classes "object present" (class C 1 {\displaystyle C_{1}\,} ) and "object absent" (class C 0 {\displaystyle C_{0}\,} ) given t

    Read more →
  • Consensus clustering

    Consensus clustering

    Consensus clustering is a method of aggregating (potentially conflicting) results from multiple clustering algorithms. Also called cluster ensembles or aggregation of clustering (or partitions), it refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete, even when the number of input clusterings is three. Consensus clustering for unsupervised learning is analogous to ensemble learning in supervised learning. == Issues with existing clustering techniques == Current clustering techniques do not address all the requirements adequately. Dealing with large number of dimensions and large number of data items can be problematic because of time complexity; Effectiveness of the method depends on the definition of "distance" (for distance-based clustering) If an obvious distance measure doesn't exist, we must "define" it, which is not always easy, especially in multidimensional spaces. The result of the clustering algorithm (that, in many cases, can be arbitrary itself) can be interpreted in different ways. == Justification for using consensus clustering == There are potential shortcomings for all existing clustering techniques. This may cause interpretation of results to become difficult, especially when there is no knowledge about the number of clusters. Clustering methods are also very sensitive to the initial clustering settings, which can cause non-significant data to be amplified in non-reiterative methods. An extremely important issue in cluster analysis is the validation of the clustering results, that is, how to gain confidence about the significance of the clusters provided by the clustering technique (cluster numbers and cluster assignments). Lacking an external objective criterion (the equivalent of a known class label in supervised analysis), this validation becomes somewhat elusive. Iterative descent clustering methods, such as the SOM and k-means clustering circumvent some of the shortcomings of hierarchical clustering by providing for univocally defined clusters and cluster boundaries. Consensus clustering provides a method that represents the consensus across multiple runs of a clustering algorithm, to determine the number of clusters in the data, and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. It can provide data for a visualization tool to inspect cluster number, membership, and boundaries. However, they lack the intuitive and visual appeal of hierarchical clustering dendrograms, and the number of clusters must be chosen a priori. == The Monti consensus clustering algorithm == The Monti consensus clustering algorithm is one of the most popular consensus clustering algorithms and is used to determine the number of clusters, K {\displaystyle K} . Given a dataset of N {\displaystyle N} total number of points to cluster, this algorithm works by resampling and clustering the data, for each K {\displaystyle K} and a N × N {\displaystyle N\times N} consensus matrix is calculated, where each element represents the fraction of times two samples clustered together. A perfectly stable matrix would consist entirely of zeros and ones, representing all sample pairs always clustering together or not together over all resampling iterations. The relative stability of the consensus matrices can be used to infer the optimal K {\displaystyle K} . More specifically, given a set of points to cluster, D = { e 1 , e 2 , . . . e N } {\displaystyle D=\{e_{1},e_{2},...e_{N}\}} , let D 1 , D 2 , . . . , D H {\displaystyle D^{1},D^{2},...,D^{H}} be the list of H {\displaystyle H} perturbed (resampled) datasets of the original dataset D {\displaystyle D} , and let M h {\displaystyle M^{h}} denote the N × N {\displaystyle N\times N} connectivity matrix resulting from applying a clustering algorithm to the dataset D h {\displaystyle D^{h}} . The entries of M h {\displaystyle M^{h}} are defined as follows: M h ( i , j ) = { 1 , if points i and j belong to the same cluster 0 , otherwise {\displaystyle M^{h}(i,j)={\begin{cases}1,&{\text{if}}{\text{ points i and j belong to the same cluster}}\\0,&{\text{otherwise}}\end{cases}}} Let I h {\displaystyle I^{h}} be the N × N {\displaystyle N\times N} identicator matrix where the ( i , j ) {\displaystyle (i,j)} -th entry is equal to 1 if points i {\displaystyle i} and j {\displaystyle j} are in the same perturbed dataset D h {\displaystyle D^{h}} , and 0 otherwise. The indicator matrix is used to keep track of which samples were selected during each resampling iteration for the normalisation step. The consensus matrix C {\displaystyle C} is defined as the normalised sum of all connectivity matrices of all the perturbed datasets and a different one is calculated for every K {\displaystyle K} . C ( i , j ) = ( ∑ h = 1 H M h ( i , j ) ∑ h = 1 H I h ( i , j ) ) {\displaystyle C(i,j)=\left({\frac {\textstyle \sum _{h=1}^{H}M^{h}(i,j)\displaystyle }{\sum _{h=1}^{H}I^{h}(i,j)}}\right)} That is the entry ( i , j ) {\displaystyle (i,j)} in the consensus matrix is the number of times points i {\displaystyle i} and j {\displaystyle j} were clustered together divided by the total number of times they were selected together. The matrix is symmetric and each element is defined within the range [ 0 , 1 ] {\displaystyle [0,1]} . A consensus matrix is calculated for each K {\displaystyle K} to be tested, and the stability of each matrix, that is how far the matrix is towards a matrix of perfect stability (just zeros and ones) is used to determine the optimal K {\displaystyle K} . One way of quantifying the stability of the K {\displaystyle K} th consensus matrix is examining its CDF curve (see below). == Over-interpretation potential of the Monti consensus clustering algorithm == Monti consensus clustering can be a powerful tool for identifying clusters, but it needs to be applied with caution as shown by Şenbabaoğlu et al. It has been shown that the Monti consensus clustering algorithm is able to claim apparent stability of chance partitioning of null datasets drawn from a unimodal distribution, and thus has the potential to lead to over-interpretation of cluster stability in a real study. If clusters are not well separated, consensus clustering could lead one to conclude apparent structure when there is none, or declare cluster stability when it is subtle. Identifying false positive clusters is a common problem throughout cluster research, and has been addressed by methods such as SigClust and the GAP-statistic. However, these methods rely on certain assumptions for the null model that may not always be appropriate. Şenbabaoğlu et al demonstrated the original delta K metric to decide K {\displaystyle K} in the Monti algorithm performed poorly, and proposed a new superior metric for measuring the stability of consensus matrices using their CDF curves. In the CDF curve of a consensus matrix, the lower left portion represents sample pairs rarely clustered together, the upper right portion represents those almost always clustered together, whereas the middle segment represent those with ambiguous assignments in different clustering runs. The proportion of ambiguous clustering (PAC) score measure quantifies this middle segment; and is defined as the fraction of sample pairs with consensus indices falling in the interval (u1, u2) ∈ [0, 1] where u1 is a value close to 0 and u2 is a value close to 1 (for instance u1=0.1 and u2=0.9). A low value of PAC indicates a flat middle segment, and a low rate of discordant assignments across permuted clustering runs. One can therefore infer the optimal number of clusters by the K {\displaystyle K} value having the lowest PAC. == Related work == Clustering ensemble (Strehl and Ghosh): They considered various formulations for the problem, most of which reduce the problem to a hyper-graph partitioning problem. In one of their formulations they considered the same graph as in the correlation clustering problem. The solution they proposed is to compute the best k-partition of the graph, which does not take into account the penalty for merging two nodes that are far apart. Clustering aggregation (Fern and Brodley): They applied the clustering aggregation idea to a collection of soft clusterings they obtained by random projections. They used an agglomerative algorithm

    Read more →
  • Mutation (evolutionary algorithm)

    Mutation (evolutionary algorithm)

    Mutation is a genetic operator used to maintain genetic diversity of the chromosomes of a population of an evolutionary algorithm (EA), including genetic algorithms in particular. It is analogous to biological mutation. The classic example of a mutation operator of a binary coded genetic algorithm (GA) involves a probability that an arbitrary bit in a genetic sequence will be flipped from its original state. A common method of implementing the mutation operator involves generating a random variable for each bit in a sequence. This random variable tells whether or not a particular bit will be flipped. This mutation procedure, based on the biological point mutation, is called single point mutation. Other types of mutation operators are commonly used for representations other than binary, such as floating-point encodings or representations for combinatorial problems. The purpose of mutation in EAs is to introduce diversity into the sampled population. Mutation operators are used in an attempt to avoid local minima by preventing the population of chromosomes from becoming too similar to each other, thus slowing or even stopping convergence to the global optimum. This reasoning also leads most EAs to avoid only taking the fittest of the population in generating the next generation, but rather selecting a random (or semi-random) set with a weighting toward those that are fitter. The following requirements apply to all mutation operators used in an EA: every point in the search space must be reachable by one or more mutations. there must be no preference for parts or directions in the search space (no drift). small mutations should be more probable than large ones. For different genome types, different mutation types are suitable. Some mutations are Gaussian, Uniform, Zigzag, Scramble, Insertion, Inversion, Swap, and so on. An overview and more operators than those presented below can be found in the introductory book by Eiben and Smith or in. == Bit string mutation == The mutation of bit strings ensue through bit flips at random positions. Example: The probability of a mutation of a bit is 1 l {\displaystyle {\frac {1}{l}}} , where l {\displaystyle l} is the length of the binary vector. Thus, a mutation rate of 1 {\displaystyle 1} per mutation and individual selected for mutation is reached. == Mutation of real numbers == Many EAs, such as the evolution strategy or the real-coded genetic algorithms, work with real numbers instead of bit strings. This is due to the good experiences that have been made with this type of coding. The value of a real-valued gene can either be changed or redetermined. A mutation that implements the latter should only ever be used in conjunction with the value-changing mutations and then only with comparatively low probability, as it can lead to large changes. In practical applications, the respective value range of the decision variables to be changed of the optimisation problem to be solved is usually limited. Accordingly, the values of the associated genes are each restricted to an interval [ x min , x max ] {\displaystyle [x_{\min },x_{\max }]} . Mutations may or may not take these restrictions into account. In the latter case, suitable post-treatment is then required as described below. === Mutation without consideration of restrictions === A real number x {\displaystyle x} can be mutated using normal distribution N ( 0 , σ ) {\displaystyle {\mathcal {N}}(0,\sigma )} by adding the generated random value to the old value of the gene, resulting in the mutated value x ′ {\displaystyle x'} : x ′ = x + N ( 0 , σ ) {\displaystyle x'=x+{\mathcal {N}}(0,\sigma )} In the case of genes with a restricted range of values, it is a good idea to choose the step size of the mutation σ {\displaystyle \sigma } so that it reasonably fits the range [ x min , x max ] {\displaystyle [x_{\min },x_{\max }]} of the gene to be changed, e.g.: σ = x max − x min 6 {\displaystyle \sigma ={\frac {x_{\text{max}}-x_{\text{min}}}{6}}} The step size can also be adjusted to the smaller permissible change range depending on the current value. In any case, however, it is likely that the new value x ′ {\displaystyle x'} of the gene will be outside the permissible range of values. Such a case must be considered a lethal mutation, since the obvious repair by using the respective violated limit as the new value of the gene would lead to a drift. This is because the limit value would then be selected with the entire probability of the values beyond the limit of the value range. The evolution strategy works with real numbers and mutation based on normal distribution. The step sizes are part of the chromosome and are subject to evolution together with the actual decision variables. === Mutation with consideration of restrictions === One possible form of changing the value of a gene while taking its value range [ x min , x max ] {\displaystyle [x_{\min },x_{\max }]} into account is the mutation relative parameter change of the evolutionary algorithm GLEAM (General Learning Evolutionary Algorithm and Method), in which, as with the mutation presented earlier, small changes are more likely than large ones. First, an equally distributed decision is made as to whether the current value x {\displaystyle x} should be increased or decreased and then the corresponding total change interval is determined. Without loss of generality, an increase is assumed for the explanation and the total change interval is then [ x , x max ] {\displaystyle [x,x_{\max }]} . It is divided into k {\displaystyle k} sub-areas of equal size with the width δ {\displaystyle \delta } , from which k {\displaystyle k} sub-change intervals of different size are formed: i {\displaystyle i} -th sub-change interval: [ x , x + δ ⋅ i ] {\displaystyle [x,x+\delta \cdot i]} with δ = ( x max − x ) k {\displaystyle \delta ={\frac {(x_{\text{max}}-x)}{k}}} and i = 1 , … , k {\displaystyle i=1,\dots ,k} Subsequently, one of the k {\displaystyle k} sub-change intervals is selected in equal distribution and a random number, also equally distributed, is drawn from it as the new value x ′ {\displaystyle x'} of the gene. The resulting summed probabilities of the sub-change intervals result in the probability distribution of the k {\displaystyle k} sub-areas shown in the adjacent figure for the exemplary case of k = 10 {\displaystyle k=10} . This is not a normal distribution as before, but this distribution also clearly favours small changes over larger ones. This mutation for larger values of k {\displaystyle k} , such as 10, is less well suited for tasks where the optimum lies on one of the value range boundaries. This can be remedied by significantly reducing k {\displaystyle k} when a gene value approaches its limits very closely. === Common properties === For both mutation operators for real-valued numbers, the probability of an increase and decrease is independent of the current value and is 50% in each case. In addition, small changes are considerably more likely than large ones. For mixed-integer optimization problems, rounding is usually used. == Mutation of permutations == Mutations of permutations are specially designed for genomes that are themselves permutations of a set. These are often used to solve combinatorial tasks. In the two mutations presented, parts of the genome are rotated or inverted. === Rotation to the right === The presentation of the procedure is illustrated by an example on the right: === Inversion === The presentation of the procedure is illustrated by an example on the right: === Variants with preference for smaller changes === The requirement raised at the beginning for mutations, according to which small changes should be more probable than large ones, is only inadequately fulfilled by the two permutation mutations presented, since the lengths of the partial lists and the number of shift positions are determined in an equally distributed manner. However, the longer the partial list and the shift, the greater the change in gene order. This can be remedied by the following modifications. The end index j {\displaystyle j} of the partial lists is determined as the distance d {\displaystyle d} to the start index i {\displaystyle i} : j = ( i + d ) mod | P 0 | {\displaystyle j=(i+d){\bmod {\left|P_{0}\right|}}} where d {\displaystyle d} is determined randomly according to one of the two procedures for the mutation of real numbers from the interval [ 0 , | P 0 | − 1 ] {\displaystyle \left[0,\left|P_{0}\right|-1\right]} and rounded. For the rotation, k {\displaystyle k} is determined similarly to the distance d {\displaystyle d} , but the value 0 {\displaystyle 0} is forbidden. For the inversion, note that i ≠ j {\displaystyle i\neq j} must hold, so for d {\displaystyle d} the value 0 {\displaystyle 0} must be excluded.

    Read more →
  • Micro stuttering

    Micro stuttering

    Micro stuttering is a visual artifact in real-time computer graphics in which the time intervals between consecutively displayed frames are uneven, even though the average frame rate reported by benchmarking software appears adequate. Tools such as 3DMark typically compute frame rates over intervals of one second or more, which can conceal momentary drops in the instantaneous frame rate that the viewer perceives as hitching or jerking of on-screen motion. At low frame rates the effect is visible as a stutter in moving images, degrading the experience in interactive applications such as video games. In severe cases a lower but more consistent frame rate can appear smoother than a higher but more erratic one. The term gained prominence in the late 2000s in discussions of multi-GPU rendering (see History), but micro stuttering also affects single-GPU systems. Common causes on modern hardware include real-time shader compilation, asset streaming from storage, VRAM exhaustion, and driver bugs. == Causes == === Shader compilation === A common cause of micro stuttering on modern PCs is real-time shader compilation. Shaders are small programs that instruct the GPU on how to render visual effects such as lighting, shadows, and reflections. On consoles, developers can pre-compile all shaders for the known, fixed hardware. On PCs, the variety of GPU architectures means shaders must often be compiled at run time, either when the game launches or during gameplay itself. When the rendering engine encounters a shader that has not yet been compiled, the CPU must finish the compilation before the GPU can draw the affected object. This causes a spike in frame time that the player perceives as a hitch. The problem has been particularly associated with games built on Unreal Engine 4 running under DirectX 12, because DX12 shifts more shader management responsibility to the application. Several techniques exist to reduce shader compilation stutter. Pipeline State Object (PSO) pre-caching records the shader permutations used at runtime so that they can be compiled in advance on subsequent launches. Asynchronous shader compilation moves the work to background CPU threads to avoid blocking the main rendering thread. Platform-level services such as Steam's shader pre-caching distribute previously compiled shaders to users with matching GPU hardware. The Steam Deck, which contains a single fixed GPU, benefits from pre-compiled shader caches because all units share the same hardware configuration. === Other causes === Micro stuttering on single-GPU systems can have several additional causes. CPU bottlenecks or scheduling interruptions from background tasks can prevent the processor from preparing frames at regular intervals. Asset streaming during gameplay (loading textures, geometry, or audio from storage) can produce hitches sometimes called traversal stutter; the use of solid-state drives and technologies such as DirectStorage has reduced but not eliminated this. VRAM exhaustion forces data to be swapped between video memory and system memory over the PCI Express bus, which is slower. Graphics driver bugs can also introduce stutter; Nvidia released hotfix driver 551.46 in February 2024 to correct intermittent micro stuttering when V-Sync was enabled. == Measurement == Micro stuttering drew attention to the limitations of average frame rate as a performance metric. In 2013, Scott Wasson at The Tech Report published a series of articles advocating frame time analysis, in which the delivery time of every individual frame is recorded and plotted rather than collapsed into a single frames-per-second figure. This approach was adopted by other hardware review publications in the following years. GPU reviews now routinely report 1% low and 0.1% low frame rates alongside the average. The 1% low is the average frame rate of the slowest 1% of frames in a sample; it serves as an indicator of worst-case smoothness. A large gap between the average and the 1% low suggests poor frame pacing. Tools for capturing per-frame timing data include FRAPS, PresentMon, OCAT, CapFrameX, and MSI Afterburner with RivaTuner Statistics Server. == Mitigation == === Frame pacing === Frame pacing is a software technique that regulates the timing of frame delivery to produce even intervals between displayed frames. Game engines, GPU drivers, and platform libraries all implement frame pacing strategies to varying degrees. On mobile platforms, Google provides the Android Frame Pacing library (Swappy) as part of the Android Game Development Kit. In December 2025, the Khronos Group published the VK_EXT_present_timing Vulkan extension, giving developers explicit control over presentation timing in a cross-platform graphics API for the first time. === Variable refresh rate === Variable refresh rate (VRR) display technologies allow a monitor's refresh rate to change to match the GPU's frame output. Implementations include Nvidia G-Sync (2013), AMD FreeSync (2015), and the VESA Adaptive-Sync standard built into DisplayPort 1.2a and later. VRR eliminates the screen tearing that results from a mismatch between frame rate and refresh rate, and avoids the frame-holding behaviour of V-Sync that can itself cause stutter. It is effective at smoothing moderate frame rate fluctuations but cannot compensate for large sudden spikes in frame time such as those caused by shader compilation or heavy asset streaming. VRR support has become standard in gaming monitors, televisions (via HDMI 2.1), and the Xbox Series X/S and PlayStation 5 consoles. === Frame generation === Beginning with DLSS 3 on the GeForce RTX 40 series in 2022, Nvidia introduced AI-based frame generation, which uses dedicated optical flow hardware and a neural network to create new frames between traditionally rendered ones. AMD followed with FSR 3 in 2023, using an algorithmic approach, and the AI-based FSR 4 for the Radeon RX 9000 series in 2025. DLSS 4, released in January 2025 for the GeForce RTX 50 series, can generate up to three frames per rendered frame using a technique called Multi Frame Generation. Frame generation increases the displayed frame rate but introduces its own frame pacing concerns. If the underlying rendered frames are unevenly timed, the interpolated frames can make the unevenness more apparent rather than less. DLSS 4 addresses this with hardware-level flip metering on the GPU's display engine, which controls the timing of frame presentation more precisely than the CPU-based pacing used in DLSS 3. Both vendors pair frame generation with latency-reduction features (Nvidia Reflex and AMD Anti-Lag+) to offset the additional input latency that results from inserting synthetic frames into the pipeline. === Frame rate limiters === Capping the frame rate below the display's maximum refresh rate, using tools such as RivaTuner Statistics Server, in-game limiters, or driver-level settings, is a common way to improve frame pacing. Preventing the GPU from running ahead of the display reduces variability in frame delivery times and can produce a smoother result than an uncapped but more irregular frame rate. == History == === Multi-GPU configurations === Micro stuttering was first widely documented in the late 2000s as a side effect of multi-GPU configurations using Alternate Frame Rendering (AFR), in which consecutive frames are assigned to alternating GPUs. Because each GPU may take a different amount of time to complete its assigned frame — due to varying scene complexity, driver scheduling, or inter-GPU communication overhead — the resulting frame delivery is irregular even when the average frame rate is high. Both Nvidia SLI and AMD CrossFireX were affected, with dual-GPU setups exhibiting the worst frame pacing irregularities. In 2012 benchmarks using Battlefield 3, dual Radeon HD 7970 cards in CrossFire showed 85% variation in frame delivery times compared with 7% for a single card, while dual GeForce GTX 680 cards in SLI showed only 7% variation compared with 5% for a single card. Multi-GPU micro stuttering became a significant factor in the eventual decline and discontinuation of consumer multi-GPU gaming. Nvidia restricted SLI to a handful of enthusiast-class cards from the GeForce 10 series onward, then replaced it with NVLink on the GeForce RTX 20 series, which saw limited gaming adoption. AMD ceased active CrossFire development around 2017. By the mid-2020s, neither vendor's current consumer GPUs support multi-GPU rendering for games. Other factors that contributed to the decline include DirectX 12 placing multi-GPU support in the hands of game developers rather than driver authors, the incompatibility of temporal anti-aliasing and other temporal rendering techniques with AFR, and the increasing size, power draw, and cost of individual GPUs. The third-party utility RadeonPro could reduce CrossFire micro stuttering through dynamic V-Sync and frame pacing adjustments, and AMD later introduced a driver-level frame paci

    Read more →
  • LPBoost

    LPBoost

    Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a margin between training samples of different classes, and thus also belongs to the class of margin classifier algorithms. Consider a classification function f : X → { − 1 , 1 } , {\displaystyle f:{\mathcal {X}}\to \{-1,1\},} which classifies samples from a space X {\displaystyle {\mathcal {X}}} into one of two classes, labelled 1 and -1, respectively. LPBoost is an algorithm for learning such a classification function, given a set of training examples with known class labels. LPBoost is a machine learning technique especially suited for joint classification and feature selection in structured domains. == LPBoost overview == As in all boosting classifiers, the final classification function is of the form f ( x ) = ∑ j = 1 J α j h j ( x ) , {\displaystyle f({\boldsymbol {x}})=\sum _{j=1}^{J}\alpha _{j}h_{j}({\boldsymbol {x}}),} where α j {\displaystyle \alpha _{j}} are non-negative weightings for weak classifiers h j : X → { − 1 , 1 } {\displaystyle h_{j}:{\mathcal {X}}\to \{-1,1\}} . Each individual weak classifier h j {\displaystyle h_{j}} may be just a little bit better than random, but the resulting linear combination of many weak classifiers can perform very well. LPBoost constructs f {\displaystyle f} by starting with an empty set of weak classifiers. Iteratively, a single weak classifier to add to the set of considered weak classifiers is selected, added and all the weights α {\displaystyle {\boldsymbol {\alpha }}} for the current set of weak classifiers are adjusted. This is repeated until no weak classifiers to add remain. The property that all classifier weights are adjusted in each iteration is known as totally-corrective property. Early boosting methods, such as AdaBoost do not have this property and converge slower. == Linear program == More generally, let H = { h ( ⋅ ; ω ) | ω ∈ Ω } {\displaystyle {\mathcal {H}}=\{h(\cdot ;\omega )|\omega \in \Omega \}} be the possibly infinite set of weak classifiers, also termed hypotheses. One way to write down the problem LPBoost solves is as a linear program with infinitely many variables. The primal linear program of LPBoost, optimizing over the non-negative weight vector α {\displaystyle {\boldsymbol {\alpha }}} , the non-negative vector ξ {\displaystyle {\boldsymbol {\xi }}} of slack variables and the margin ρ {\displaystyle \rho } is the following. min α , ξ , ρ − ρ + D ∑ n = 1 ℓ ξ n sb.t. ∑ ω ∈ Ω y n α ω h ( x n ; ω ) + ξ n ≥ ρ , n = 1 , … , ℓ , ∑ ω ∈ Ω α ω = 1 , ξ n ≥ 0 , n = 1 , … , ℓ , α ω ≥ 0 , ω ∈ Ω , ρ ∈ R . {\displaystyle {\begin{array}{cl}{\underset {{\boldsymbol {\alpha }},{\boldsymbol {\xi }},\rho }{\min }}&-\rho +D\sum _{n=1}^{\ell }\xi _{n}\\{\textrm {sb.t.}}&\sum _{\omega \in \Omega }y_{n}\alpha _{\omega }h({\boldsymbol {x}}_{n};\omega )+\xi _{n}\geq \rho ,\qquad n=1,\dots ,\ell ,\\&\sum _{\omega \in \Omega }\alpha _{\omega }=1,\\&\xi _{n}\geq 0,\qquad n=1,\dots ,\ell ,\\&\alpha _{\omega }\geq 0,\qquad \omega \in \Omega ,\\&\rho \in {\mathbb {R} }.\end{array}}} Note the effects of slack variables ξ ≥ 0 {\displaystyle {\boldsymbol {\xi }}\geq 0} : their one-norm is penalized in the objective function by a constant factor D {\displaystyle D} , which—if small enough—always leads to a primal feasible linear program. Here we adopted the notation of a parameter space Ω {\displaystyle \Omega } , such that for a choice ω ∈ Ω {\displaystyle \omega \in \Omega } the weak classifier h ( ⋅ ; ω ) : X → { − 1 , 1 } {\displaystyle h(\cdot ;\omega ):{\mathcal {X}}\to \{-1,1\}} is uniquely defined. When the above linear program was first written down in early publications about boosting methods it was disregarded as intractable due to the large number of variables α {\displaystyle {\boldsymbol {\alpha }}} . Only later it was discovered that such linear programs can indeed be solved efficiently using the classic technique of column generation. === Column generation for LPBoost === In a linear program a column corresponds to a primal variable. Column generation is a technique to solve large linear programs. It typically works in a restricted problem, dealing only with a subset of variables. By generating primal variables iteratively and on-demand, eventually the original unrestricted problem with all variables is recovered. By cleverly choosing the columns to generate the problem can be solved such that while still guaranteeing the obtained solution to be optimal for the original full problem, only a small fraction of columns has to be created. ==== LPBoost dual problem ==== Columns in the primal linear program corresponds to rows in the dual linear program. The equivalent dual linear program of LPBoost is the following linear program. max λ , γ γ sb.t. ∑ n = 1 ℓ y n h ( x n ; ω ) λ n + γ ≤ 0 , ω ∈ Ω , 0 ≤ λ n ≤ D , n = 1 , … , ℓ , ∑ n = 1 ℓ λ n = 1 , γ ∈ R . {\displaystyle {\begin{array}{cl}{\underset {{\boldsymbol {\lambda }},\gamma }{\max }}&\gamma \\{\textrm {sb.t.}}&\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}+\gamma \leq 0,\qquad \omega \in \Omega ,\\&0\leq \lambda _{n}\leq D,\qquad n=1,\dots ,\ell ,\\&\sum _{n=1}^{\ell }\lambda _{n}=1,\\&\gamma \in \mathbb {R} .\end{array}}} For linear programs the optimal value of the primal and dual problem are equal. For the above primal and dual problems, the optimal value is equal to the negative 'soft margin'. The soft margin is the size of the margin separating positive from negative training instances minus positive slack variables that carry penalties for margin-violating samples. Thus, the soft margin may be positive although not all samples are linearly separated by the classification function. The latter is called the 'hard margin' or 'realized margin'. ==== Convergence criterion ==== Consider a subset of the satisfied constraints in the dual problem. For any finite subset we can solve the linear program and thus satisfy all constraints. If we could prove that of all the constraints which we did not add to the dual problem no single constraint is violated, we would have proven that solving our restricted problem is equivalent to solving the original problem. More formally, let γ ∗ {\displaystyle \gamma ^{}} be the optimal objective function value for any restricted instance. Then, we can formulate a search problem for the 'most violated constraint' in the original problem space, namely finding ω ∗ ∈ Ω {\displaystyle \omega ^{}\in \Omega } as ω ∗ = argmax ω ∈ Ω ∑ n = 1 ℓ y n h ( x n ; ω ) λ n . {\displaystyle \omega ^{}={\underset {\omega \in \Omega }{\textrm {argmax}}}\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}.} That is, we search the space H {\displaystyle {\mathcal {H}}} for a single decision stump h ( ⋅ ; ω ∗ ) {\displaystyle h(\cdot ;\omega ^{})} maximizing the left hand side of the dual constraint. If the constraint cannot be violated by any choice of decision stump, none of the corresponding constraint can be active in the original problem and the restricted problem is equivalent. ==== Penalization constant ==== D {\displaystyle D} The positive value of penalization constant D {\displaystyle D} has to be found using model selection techniques. However, if we choose D = 1 ℓ ν {\displaystyle D={\frac {1}{\ell \nu }}} , where ℓ {\displaystyle \ell } is the number of training samples and 0 < ν < 1 {\displaystyle 0<\nu <1} , then the new parameter ν {\displaystyle \nu } has the following properties. ν {\displaystyle \nu } is an upper bound on the fraction of training errors; that is, if k {\displaystyle k} denotes the number of misclassified training samples, then k ℓ ≤ ν {\displaystyle {\frac {k}{\ell }}\leq \nu } . ν {\displaystyle \nu } is a lower bound on the fraction of training samples outside or on the margin. == Algorithm == Input: Training set X = { x 1 , … , x ℓ } {\displaystyle X=\{{\boldsymbol {x}}_{1},\dots ,{\boldsymbol {x}}_{\ell }\}} , x i ∈ X {\displaystyle {\boldsymbol {x}}_{i}\in {\mathcal {X}}} Training labels Y = { y 1 , … , y ℓ } {\displaystyle Y=\{y_{1},\dots ,y_{\ell }\}} , y i ∈ { − 1 , 1 } {\displaystyle y_{i}\in \{-1,1\}} Convergence threshold θ ≥ 0 {\displaystyle \theta \geq 0} Output: Classification function f : X → { − 1 , 1 } {\displaystyle f:{\mathcal {X}}\to \{-1,1\}} Initialization Weights, uniform λ n ← 1 ℓ , n = 1 , … , ℓ {\displaystyle \lambda _{n}\leftarrow {\frac {1}{\ell }},\quad n=1,\dots ,\ell } Edge γ ← 0 {\displaystyle \gamma \leftarrow 0} Hypothesis count J ← 1 {\displaystyle J\leftarrow 1} Iterate h ^ ← argmax ω ∈ Ω ∑ n = 1 ℓ y n h ( x n ; ω ) λ n {\displaystyle {\hat {h}}\leftarrow {\underset {\omega \in \Omega }{\textrm {argmax}}}\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}} if ∑ n = 1 ℓ y n h ^ ( x n ) λ n + γ ≤ θ {\displaystyle \sum _{n=1}^{\ell }y_{n}{\hat {h}}({\boldsymbol {x}}_{n})\lambda _{n}+\gamma \leq \theta } then break h J ← h ^ {\displaystyle h_{J}\leftarrow {\hat {h}}} J

    Read more →
  • Softplus

    Softplus

    In mathematics and machine learning, the softplus function is f ( x ) = ln ⁡ ( 1 + e x ) . {\displaystyle f(x)=\ln(1+e^{x}).} It is a smooth approximation (in fact, an analytic function) to the ramp function, which is known as the rectifier or ReLU (rectified linear unit) in machine learning. For large negative x {\displaystyle x} it is ln ⁡ ( 1 + e x ) = ln ⁡ ( 1 + ϵ ) ⪆ ln ⁡ 1 = 0 {\displaystyle \ln(1+e^{x})=\ln(1+\epsilon )\gtrapprox \ln 1=0} , so just above 0, while for large positive x {\displaystyle x} it is ln ⁡ ( 1 + e x ) ⪆ ln ⁡ ( e x ) = x {\displaystyle \ln(1+e^{x})\gtrapprox \ln(e^{x})=x} , so just above x {\displaystyle x} . The names softplus and SmoothReLU are used in machine learning. The name "softplus" (2000), by analogy with the earlier softmax (1989) is presumably because it is a smooth (soft) approximation of the positive part of x, which is sometimes denoted with a superscript plus, x + := max ( 0 , x ) {\displaystyle x^{+}:=\max(0,x)} . == Alternative forms == This function can be approximated as: ln ⁡ ( 1 + e x ) ≈ { ln ⁡ 2 , x = 0 , x 1 − e − x / ln ⁡ 2 , x ≠ 0 {\displaystyle \ln \left(1+e^{x}\right)\approx {\begin{cases}\ln 2,&x=0,\\[6pt]{\frac {x}{1-e^{-x/\ln 2}}},&x\neq 0\end{cases}}} By making the change of variables x = y ln ⁡ ( 2 ) {\displaystyle x=y\ln(2)} , this is equivalent to log 2 ⁡ ( 1 + 2 y ) ≈ { 1 , y = 0 , y 1 − e − y , y ≠ 0. {\displaystyle \log _{2}(1+2^{y})\approx {\begin{cases}1,&y=0,\\[6pt]{\frac {y}{1-e^{-y}}},&y\neq 0.\end{cases}}} A sharpness parameter k {\displaystyle k} may be included: f ( x ) = ln ⁡ ( 1 + e k x ) k , f ′ ( x ) = e k x 1 + e k x = 1 1 + e − k x . {\displaystyle f(x)={\frac {\ln(1+e^{kx})}{k}},\qquad \qquad f'(x)={\frac {e^{kx}}{1+e^{kx}}}={\frac {1}{1+e^{-kx}}}.} Additionally, the softplus function is equivalent to the log of the sigmoid function in the following way: − ln ⁡ ( sigmoid ( − x ) ) = − ln ⁡ ( 1 1 + e x ) = ln ⁡ ( 1 + e x ) = softplus ( x ) {\displaystyle -\ln({\text{sigmoid}}(-x))=-\ln \left({\frac {1}{1+e^{x}}}\right)=\ln \left(1+e^{x}\right)={\text{softplus}}(x)} == Related functions == The derivative of softplus is the standard logistic function: f ′ ( x ) = e x 1 + e x = 1 1 + e − x {\displaystyle f'(x)={\frac {e^{x}}{1+e^{x}}}={\frac {1}{1+e^{-x}}}} The logistic function or the sigmoid function is a smooth approximation of the rectifier, the Heaviside step function. === LogSumExp === The multivariable generalization of single-variable softplus is the LogSumExp with the first argument set to zero: L S E 0 + ⁡ ( x 1 , … , x n ) := LSE ⁡ ( 0 , x 1 , … , x n ) = ln ⁡ ( 1 + e x 1 + ⋯ + e x n ) . {\displaystyle \operatorname {LSE_{0}} ^{+}(x_{1},\dots ,x_{n}):=\operatorname {LSE} (0,x_{1},\dots ,x_{n})=\ln(1+e^{x_{1}}+\cdots +e^{x_{n}}).} The LogSumExp function is LSE ⁡ ( x 1 , … , x n ) = ln ⁡ ( e x 1 + ⋯ + e x n ) , {\displaystyle \operatorname {LSE} (x_{1},\dots ,x_{n})=\ln(e^{x_{1}}+\cdots +e^{x_{n}}),} and its gradient is the softmax; the softmax with the first argument set to zero is the multivariable generalization of the logistic function. Both LogSumExp and softmax are used in machine learning. === Convex conjugate === The convex conjugate (specifically, the Legendre transformation) of the softplus function is the negative binary entropy function (with base e). This is because (following the definition of the Legendre transformation: the derivatives are inverse functions) the derivative of softplus is the logistic function, whose inverse function is the logit, which is the derivative of negative binary entropy. Softplus can be interpreted as logistic loss (as a positive number), so, by duality, minimizing logistic loss corresponds to maximizing entropy. This justifies the principle of maximum entropy as loss minimization.

    Read more →
  • Almeida–Pineda recurrent backpropagation

    Almeida–Pineda recurrent backpropagation

    Almeida–Pineda recurrent backpropagation is an extension to the backpropagation algorithm that is applicable to recurrent neural networks. It is a type of supervised learning. It was described somewhat cryptically in Richard Feynman's senior thesis, and rediscovered independently in the context of artificial neural networks by both Fernando Pineda and Luis B. Almeida. A recurrent neural network for this algorithm consists of some input units, some output units and eventually some hidden units. For a given set of (input, target) states, the network is trained to settle into a stable activation state with the output units in the target state, based on a given input state clamped on the input units.

    Read more →
  • Diagnostically acceptable irreversible compression

    Diagnostically acceptable irreversible compression

    Diagnostically acceptable irreversible compression (DAIC) is the amount of lossy compression which can be used on a medical image to produce a result that does not prevent the reader from using the image to make a medical diagnosis. The term was first introduced at a workshop on irreversible compression convened by the European Society of Radiology (ESR) in Palma de Mallorca October 13, 2010, the results of which were reported in a subsequent position paper. == Determination == The "amount of compression" in irreversible compression used to be determined by the compression ratio, where the acceptable minimum is determined by the algorithm (typically JPEG or J2K) and the data type (body part and imaging method). Such a definition is easy to follow, and has been used by medical bodies in 2010 around the world. However, its downside is obvious: the compression ratio tells nothing about the real quality of the image, as different compressors can produce vastly different qualities under the same file size. For example, the JPEG format of 1992 can perform as well as many modern formats given newer techniques exploited in mozjpeg and ISO libjpeg, yet they would be lumped together with the legacy encoders in such a scheme. The image compression community has long used objective quality metrics like SSIM to measure the effects of compression. In the absence of good data regarding SSIM, the ESR review of 2010 concluded that it is still difficult to establish a criterion for whether a particular irreversible compression scheme applied with particular parameters to a particular individual image, or category of images, avoids the introduction of some quantifiable risk of a diagnostic error for any particular diagnostic task. A 2017 study showed that a SSIM variant called 4-G-r (4-component, gradient, structural component of SSIM) best reflects changes in images that affect the decision of radiologists out of 16 SSIM variants. A 2020 study shows that visual information fidelity (VIF), feature similarity index (FSIM), and noise quality metric (NQM) best reflect radiologist preferences out of ten metrics. It also mentions that the original version of SSIM works as poorly as a basic root-mean-square distance (RMSD) for this purpose, a result echoed by the 2017 study. The 4-G-r modification is not tested in the study.

    Read more →
  • Error-driven learning

    Error-driven learning

    In reinforcement learning, error-driven learning is a method for adjusting a model's (intelligent agent's) parameters based on the difference between its output results and the ground truth. These models stand out as they depend on environmental feedback, rather than explicit labels or categories. They are based on the idea that language acquisition involves the minimization of the prediction error (MPSE). By leveraging these prediction errors, the models consistently refine expectations and decrease computational complexity. Typically, these algorithms are operated by the GeneRec algorithm. Error-driven learning has widespread applications in cognitive sciences and computer vision. These methods have also found successful application in natural language processing (NLP), including areas like part-of-speech tagging, parsing, named entity recognition (NER), machine translation (MT), speech recognition (SR), and dialogue systems. == Formal Definition == Error-driven learning models are ones that rely on the feedback of prediction errors to adjust the expectations or parameters of a model. The key components of error-driven learning include the following: A set S {\displaystyle S} of states representing the different situations that the learner can encounter. A set A {\displaystyle A} of actions that the learner can take in each state. A prediction function P ( s , a ) {\displaystyle P(s,a)} that gives the learner's current prediction of the outcome of taking action a {\displaystyle a} in state s {\displaystyle s} . An error function E ( o , p ) {\displaystyle E(o,p)} that compares the actual outcome o {\displaystyle o} with the prediction p {\displaystyle p} and produces an error value. An update rule U ( p , e ) {\displaystyle U(p,e)} that adjusts the prediction p {\displaystyle p} in light of the error e {\displaystyle e} . == Algorithms == Error-driven learning algorithms refer to a category of reinforcement learning algorithms that leverage the disparity between the real output and the expected output of a system to regulate the system's parameters. Typically applied in supervised learning, these algorithms are provided with a collection of input-output pairs to facilitate the process of generalization. The widely utilized error backpropagation learning algorithm is known as GeneRec, a generalized recirculation algorithm primarily employed for gene prediction in DNA sequences. Many other error-driven learning algorithms are derived from alternative versions of GeneRec. == Applications == === Cognitive science === Simpler error-driven learning models effectively capture complex human cognitive phenomena and anticipate elusive behaviors. They provide a flexible mechanism for modeling the brain's learning process, encompassing perception, attention, memory, and decision-making. By using errors as guiding signals, these algorithms adeptly adapt to changing environmental demands and objectives, capturing statistical regularities and structure. Furthermore, cognitive science has led to the creation of new error-driven learning algorithms that are both biologically acceptable and computationally efficient. These algorithms, including deep belief networks, spiking neural networks, and reservoir computing, follow the principles and constraints of the brain and nervous system. Their primary aim is to capture the emergent properties and dynamics of neural circuits and systems. === Computer vision === Computer vision is a complex task that involves understanding and interpreting visual data, such as images or videos. In the context of error-driven learning, the computer vision model learns from the mistakes it makes during the interpretation process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This repeated process of learning from errors helps improve the model's performance over time. For NLP to do well at computer vision, it employs deep learning techniques. This form of computer vision is sometimes called neural computer vision (NCV), since it makes use of neural networks. NCV therefore interprets visual data based on a statistical, trial and error approach and can deal with context and other subtleties of visual data. === Natural Language Processing === ==== Part-of-speech tagging ==== Part-of-speech (POS) tagging is a crucial component in Natural Language Processing (NLP). It helps resolve human language ambiguity at different analysis levels. In addition, its output (tagged data) can be used in various applications of NLP such as information extraction, information retrieval, question Answering, speech eecognition, text-to-speech conversion, partial parsing, and grammar correction. ==== Parsing ==== Parsing in NLP involves breaking down a text into smaller pieces (phrases) based on grammar rules. If a sentence cannot be parsed, it may contain grammatical errors. In the context of error-driven learning, the parser learns from the mistakes it makes during the parsing process. When an error is encountered, the parser updates its internal model to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the parser's performance over time. In conclusion, error-driven learning plays a crucial role in improving the accuracy and efficiency of NLP parsers by allowing them to learn from their mistakes and adapt their internal models accordingly. ==== Named entity recognition (NER) ==== NER is the task of identifying and classifying entities (such as persons, locations, organizations, etc.) in a text. Error-driven learning can help the model learn from its false positives and false negatives and improve its recall and precision on (NER). In the context of error-driven learning, the significance of NER is quite profound. Traditional sequence labeling methods identify nested entities layer by layer. If an error occurs in the recognition of an inner entity, it can lead to incorrect identification of the outer entity, leading to a problem known as error propagation of nested entities. This is where the role of NER becomes crucial in error-driven learning. By accurately recognizing and classifying entities, it can help minimize these errors and improve the overall accuracy of the learning process. Furthermore, deep learning-based NER methods have shown to be more accurate as they are capable of assembling words, enabling them to understand the semantic and syntactic relationship between various words better. ==== Machine translation ==== Machine translation is a complex task that involves converting text from one language to another. In the context of error-driven learning, the machine translation model learns from the mistakes it makes during the translation process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. ==== Speech recognition ==== Speech recognition is a complex task that involves converting spoken language into written text. In the context of error-driven learning, the speech recognition model learns from the mistakes it makes during the recognition process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. ==== Dialogue systems ==== Dialogue systems are a popular NLP task as they have promising real-life applications. They are also complicated tasks since many NLP tasks deserving study are involved. In the context of error-driven learning, the dialogue system learns from the mistakes it makes during the dialogue process. When an error is encountered, the model updates its internal parameters to avoid making the same mistake in the future. This iterative process of learning from errors helps improve the model's performance over time. == Advantages == Error-driven learning has several advantages over other types of machine learning algorithms: They can learn from feedback and correct their mistakes, which makes them adaptive and robust to noise and changes in the data. They can handle large and high-dimensional data sets, as they do not require explicit feature engineering or prior knowledge of the data distribution. They can achieve high accuracy and performance, as they can learn complex and nonlinear relationships between the input and the output. == Limitations == Although error driven learning has its advantages, their algorithms also have the following limitations: They can suffer from overfitting, which means that they memorize the training data and fail to generalize to new and unseen data. This can be mitigated by using regularization techniques, such as adding a penalty term to the loss function, or reducing the complexity of the model. They can be sensitive to the choice of

    Read more →
  • Handwriting recognition

    Handwriting recognition

    Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning (optical character recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most possible words. == Offline recognition == Offline handwriting recognition involves the automatic conversion of text in an image into letter codes that are usable within computer and text-processing applications. The data obtained by this form is regarded as a static representation of handwriting. Offline handwriting recognition is comparatively difficult, as different people have different handwriting styles. And, as of today, OCR engines are primarily focused on machine printed text and ICR for hand "printed" (written in capital letters) text. === Traditional techniques === ==== Character extraction ==== Offline character recognition often involves scanning a form or document. This means the individual characters contained in the scanned image will need to be extracted. Tools exist that are capable of performing this step. However, there are several common imperfections in this step. The most common is when characters that are connected are returned as a single sub-image containing both characters. This causes a major problem in the recognition stage. Yet many algorithms are available that reduce the risk of connected characters. ==== Character recognition ==== After individual characters have been extracted, a recognition engine is used to identify the corresponding computer character. Several different recognition techniques are currently available. ===== Feature extraction ===== Feature extraction works in a similar fashion to neural network recognizers. However, programmers must manually determine the properties they feel are important. This approach gives the recognizer more control over the properties used in identification. Yet any system using this approach requires substantially more development time than a neural network because the properties are not learned automatically. === Modern techniques === Where traditional techniques focus on segmenting individual characters for recognition, modern techniques focus on recognizing all the characters in a segmented line of text. Particularly they focus on machine learning techniques that are able to learn visual features, avoiding the limiting feature engineering previously used. State-of-the-art methods use convolutional networks to extract visual features over several overlapping windows of a text line image which a recurrent neural network uses to produce character probabilities. == Online recognition == Online handwriting recognition involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded as a digital representation of handwriting. The obtained signal is converted into letter codes that are usable within computer and text-processing applications. The elements of an online handwriting recognition interface typically include: a pen or stylus for the user to write with a touch sensitive surface, which may be integrated with, or adjacent to, an output display. a software application which interprets the movements of the stylus across the writing surface, translating the resulting strokes into digital text. The process of online handwriting recognition can be broken down into a few general steps: preprocessing, feature extraction and classification The purpose of preprocessing is to discard irrelevant information in the input data, that can negatively affect the recognition. This concerns speed and accuracy. Preprocessing usually consists of binarization, normalization, sampling, smoothing and denoising. The second step is feature extraction. Out of the two- or higher-dimensional vector field received from the preprocessing algorithms, higher-dimensional data is extracted. The purpose of this step is to highlight important information for the recognition model. This data may include information like pen pressure, velocity or the changes of writing direction. The last big step is classification. In this step, various models are used to map the extracted features to different classes and thus identifying the characters or words the features represent. === Hardware === Commercial products incorporating handwriting recognition as a replacement for keyboard input were introduced in the early 1980s. Examples include handwriting terminals such as the Pencept Penpad and the Inforite point-of-sale terminal. With the advent of the large consumer market for personal computers, several commercial products were introduced to replace the keyboard and mouse on a personal computer with a single pointing/handwriting system, such as those from Pencept, CIC and others. The first commercially available tablet-type portable computer was the Write-Top from Linus Technologies, released in July 1988. Its operating system was based on MS-DOS. In the early 1990s, hardware makers including NCR, IBM and EO released tablet computers running the PenPoint operating system developed by GO Corp. PenPoint used handwriting recognition and gestures throughout and provided the facilities to third-party software. IBM's tablet computer was the first to use the ThinkPad name and used IBM's handwriting recognition. This recognition system was later ported to Microsoft Windows for Pen Computing, and IBM's Pen for OS/2. None of these were commercially successful. Advancements in electronics allowed the computing power necessary for handwriting recognition to fit into a smaller form factor than tablet computers, and handwriting recognition is often used as an input method for hand-held PDAs. The first PDA to provide written input was the Apple Newton, which exposed the public to the advantage of a streamlined user interface. However, the device was not a commercial success, owing to the unreliability of the software, which tried to learn a user's writing patterns. By the time of the release of the Newton OS 2.0, wherein the handwriting recognition was greatly improved, including unique features still not found in current recognition systems such as modeless error correction, the largely negative first impression had been made. After discontinuation of Apple Newton, the feature was incorporated in Mac OS X 10.2 and later as Inkwell. Palm later launched a successful series of PDAs based on the Graffiti recognition system. Graffiti improved usability by defining a set of "unistrokes", or one-stroke forms, for each character. This narrowed the possibility for erroneous input, although memorization of the stroke patterns did increase the learning curve for the user. The Graffiti handwriting recognition was found to infringe on a patent held by Xerox, and Palm replaced Graffiti with a licensed version of the CIC handwriting recognition which, while also supporting unistroke forms, pre-dated the Xerox patent. The court finding of infringement was reversed on appeal, and then reversed again on a later appeal. The parties involved subsequently negotiated a settlement concerning this and other patents. A Tablet PC is a notebook computer with a digitizer tablet and a stylus, which allows a user to handwrite text on the unit's screen. The operating system recognizes the handwriting and converts it into text. Windows Vista and Windows 7 include personalization features that learn a user's writing patterns or vocabulary for English, Japanese, Chinese Traditional, Chinese Simplified and Korean. The features include a "personalization wizard" that prompts for samples of a user's handwriting and uses them to retrain the system for higher accuracy recognition. This system is distinct from the less advanced handwriting recognition system employed in its Windows Mobile OS for PDAs. Although handwriting recognition is an input form that the public has become accustomed to, it has not achieved widespread use in either desktop computers or laptops. It is still generally accepted that keyboard input is both faster and more reliable. As of 2006, many PDAs offer handwriting input, sometimes even accepting natural cursive handwriting, but accuracy is still a problem, and some people still find even a simple on-screen keyboard more efficient. === Software === Early software could understand print handwriting where the characters were separated; however, cursive handwriting

    Read more →
  • Gaussian process emulator

    Gaussian process emulator

    In statistics, Gaussian process emulator is one name for a general type of statistical model that has been used in contexts where the problem is to make maximum use of the outputs of a complicated (often non-random) computer-based simulation model. Each run of the simulation model is computationally expensive and each run is based on many different controlling inputs. The variation of the outputs of the simulation model is expected to vary reasonably smoothly with the inputs, but in an unknown way. The overall analysis involves two models: the simulation model, or "simulator", and the statistical model, or "emulator", which notionally emulates the unknown outputs from the simulator. The Gaussian process emulator model treats the problem from the viewpoint of Bayesian statistics. In this approach, even though the output of the simulation model is fixed for any given set of inputs, the actual outputs are unknown unless the computer model is run and hence can be made the subject of a Bayesian analysis. The main element of the Gaussian process emulator model is that it models the outputs as a Gaussian process on a space that is defined by the model inputs. The model includes a description of the correlation or covariance of the outputs, which enables the model to encompass the idea that differences in the output will be small if there are only small differences in the inputs.

    Read more →
  • Scripped

    Scripped

    Scripped was an online screenplay services company offering three services: script writing, script registration, and script coverage. Scripped did not facilitate collaboration among screenwriters. It combined with Zhura in 2010. According to Techcrunch, Scripped had more than 60,000 writers as of March 2010. Scripped was administered by Sunil Rajaraman, Ryan Buckley and Zak Freer. Actor, writer, and director Edward Burns and screenwriter Steven E. de Souza joined Scripped's Board of Advisers in May 2008. In 2008, the company formed a partnership with Write Brothers, makers of Movie Magic Screenwriter software. On March 29, 2010, Scripped announced that it closed $250,000 in private investment and merged with competitor Zhura. Scripped's CEO, Sunil Rajaraman, remains the merged company's Chief Executive Officer. On April 1, 2015, citing a serious technical failure, Scripped shuttered its service. As part of the announcement, it was disclosed that their backup servers had failed as well, losing all of its users' stored scripts. The website URL currently redirects to WriterDuet's website, another online scriptwriting service; Scripped had advertised WriterDuet in Scripped's shutdown open letter. == Features == The Scripped Writer provided a built-in screenplay template which formatted the document to a standard for scripts as recommended by the AMPAS. The screenplay document was composed of seven elements: scene, action, character, dialog, parenthetical, transition and general. Each element had a specific style to which the Scripped Writer conformed as text was entered. Like other client-side screenplay software, Scripped offered Tab-Enter toggling between screenplay elements, making the writing process much faster. Text files could be imported into the Scripped Writer and automatically conformed to the screenplay template. Completed scripts could be exported as PDF files. In May 2011 the administrators of Scripped launched Scripted.com - a sister site focused on freelance writing jobs. Subsequent to the service's launch, the company was renamed to Scripted, Inc.

    Read more →
  • Operational taxonomic unit

    Operational taxonomic unit

    An operational taxonomic unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was originally introduced in 1963 by Robert R. Sokal and Peter H. A. Sneath in the context of numerical taxonomy, where an "operational taxonomic unit" is simply the group of organisms currently being studied. In this sense, an OTU is a pragmatic definition to group individuals by similarity, equivalent to but not necessarily in line with classical Linnaean taxonomy or modern evolutionary taxonomy. Nowadays, however, the term is commonly used in a different context and refers to clusters of (uncultivated or unknown) organisms, grouped by DNA sequence similarity of a specific taxonomic marker gene (originally coined as mOTU; molecular OTU). In other words, OTUs are pragmatic proxies for "species" at different taxonomic levels, in the absence of traditional systems of biological classification as are available for macroscopic organisms. For several years, OTUs have been the most commonly used units of diversity, especially when analysing small subunit 16S (for prokaryotes) or 18S rRNA (for eukaryotes) marker gene sequence datasets. == Molecular OTU by clustering of marker gene sequences == In the approach represented by DNA barcoding, a particular locus is chosen to be used as the marker gene for classification. This locus should be universally present in the scope selected, variable enough to be different among close-related species, and be flanked by conservative sequences that allow for easy amplification and detection. There are databases containing sequences for such marker genes from many different species, allowing for comparison. (Sometimes only using one locus does not provide sufficient resolution, so multiple marker genes are used. This is the case for plants, where rbcL+matK is common.) Sequences obtained this way can be clustered according to their similarity to one another, and operational taxonomic units are defined based on the similarity threshold set by the researcher. The exact threshold depends on the taxa in question and the mutational rates of the selected locus in the taxon. 97–99% are commonly used, but "it is now recognized to be somewhat arbitrary as sequence variation within and among species varies across taxa". 100% similarity (fully identical) is also common, also known as single variants. It remains debatable how well this commonly used method recapitulates true microbial species phylogeny or ecology. Although OTUs can be calculated differently when using different algorithms or thresholds, research by Schmidt et al. (2014) demonstrated that 16S-derived microbial OTUs were generally ecologically consistent across habitats and several clustering approaches. The number of OTUs defined may be inflated due to errors in DNA sequencing. === OTU clustering approaches === There are three main approaches to clustering OTUs: De novo, for which the clustering is based on similarities between sequencing reads. Closed-reference, for which the clustering is performed against a reference database of sequences. Open-reference, where clustering is first performed against a reference database of sequences, then any remaining sequences that could not be mapped to the reference are clustered de novo. Using a reference provides taxonomic context for the OTUs found. Alternatively, taxonomic context can be found after the construction of clusters by comparing representative sequences from clusters against a reference database. There are also specialized classifiers for this purpose which are much faster than naive comparison using BLAST. === OTU clustering algorithms === Hierarchical clustering algorithms (HCA): uclust & cd-hit & ESPRIT Bayesian clustering: CROP == Molecular OTU by other methods == In addition to similarity-based grouping, marker gene sequences can be sorted into OTUs using molecular phylogeny, k-mer composition, or hybrid methods combining these methods with similarity. There are also Bayesian tree-less methods and machine learning approaches. Using phylogeny often involves manually assigning terminal clades or single nodes to an OTU, so this is usually only done for refinement. Genome skimming can be used to obtain high-copy DNA without the need to choose marker genes or to design PCR primers for the chosen genes. It can provide fairly good coverage of organelle DNA and repetitive elements such as ribosomal DNA, both of which can be used like marker genes in OTU analysis. Whole-genome sequencing is more expensive and involves the production and processing of more data. By considering the entire genome, many (sometimes over 100) marker genes can be used at the same time, producing highly resolved phylogenies that correctly identify problematic taxa. It is also possible to use entire genomes for OTU assignment. For example, genomes from different bacterial species almost always have an average nucleotide identity lower than 95%, a fact that can be used to define new OTUs (and likely new species).

    Read more →
  • Neural cryptography

    Neural cryptography

    Neural cryptography is a branch of cryptography dedicated to analyzing the application of stochastic algorithms, especially artificial neural network algorithms, for use in encryption and cryptanalysis. == Definition == Artificial neural networks are well known for their ability to selectively explore the solution space of a given problem. This feature finds a natural niche of application in the field of cryptanalysis. At the same time, neural networks offer a new approach to attack ciphering algorithms based on the principle that any function could be reproduced by a neural network, which is a powerful proven computational tool that can be used to find the inverse-function of any cryptographic algorithm. The ideas of mutual learning, self learning, and stochastic behavior of neural networks and similar algorithms can be used for different aspects of cryptography, like public-key cryptography, solving the key distribution problem using neural network mutual synchronization, hashing or generation of pseudo-random numbers. Another idea is the ability of a neural network to separate space in non-linear pieces using "bias". It gives different probabilities of activating the neural network or not. This is very useful in the case of Cryptanalysis. Two names are used to design the same domain of research: Neuro-Cryptography and Neural Cryptography. The first work that it is known on this topic can be traced back to 1995 in an IT Master Thesis. == Applications == In 1995, Sebastien Dourlens applied neural networks to cryptanalyze DES by allowing the networks to learn how to invert the S-tables of the DES. The bias in DES studied through Differential Cryptanalysis by Adi Shamir is highlighted. The experiment shows about 50% of the key bits can be found, allowing the complete key to be found in a short time. Hardware application with multi micro-controllers have been proposed due to the easy implementation of multilayer neural networks in hardware. One example of a public-key protocol is given by Khalil Shihab . He describes the decryption scheme and the public key creation that are based on a backpropagation neural network. The encryption scheme and the private key creation process are based on Boolean algebra. This technique has the advantage of small time and memory complexities. A disadvantage is the property of backpropagation algorithms: because of huge training sets, the learning phase of a neural network is very long. Therefore, the use of this protocol is only theoretical so far. == Neural key exchange protocol == The most used protocol for key exchange between two parties A and B in the practice is Diffie–Hellman key exchange protocol. Neural key exchange, which is based on the synchronization of two tree parity machines, should be a secure replacement for this method. Synchronizing these two machines is similar to synchronizing two chaotic oscillators in chaos communications. === Tree parity machine === The tree parity machine is a special type of multi-layer feedforward neural network. It consists of one output neuron, K hidden neurons and K×N input neurons. Inputs to the network take three values: x i j ∈ { − 1 , 0 , + 1 } {\displaystyle x_{ij}\in \left\{-1,0,+1\right\}} The weights between input and hidden neurons take the values: w i j ∈ { − L , . . . , 0 , . . . , + L } {\displaystyle w_{ij}\in \left\{-L,...,0,...,+L\right\}} Output value of each hidden neuron is calculated as a sum of all multiplications of input neurons and these weights: σ i = sgn ⁡ ( ∑ j = 1 N w i j x i j ) {\displaystyle \sigma _{i}=\operatorname {sgn}(\sum _{j=1}^{N}w_{ij}x_{ij})} Signum is a simple function, which returns −1,0 or 1: sgn ⁡ ( x ) = { − 1 if x < 0 , 0 if x = 0 , 1 if x > 0. {\displaystyle \operatorname {sgn}(x)={\begin{cases}-1&{\text{if }}x<0,\\0&{\text{if }}x=0,\\1&{\text{if }}x>0.\end{cases}}} If the scalar product is 0, the output of the hidden neuron is mapped to −1 in order to ensure a binary output value. The output of neural network is then computed as the multiplication of all values produced by hidden elements: τ = ∏ i = 1 K σ i {\displaystyle \tau =\prod _{i=1}^{K}\sigma _{i}} Output of the tree parity machine is binary. === Protocol === Each party (A and B) uses its own tree parity machine. Synchronization of the tree parity machines is achieved in these steps Initialize random weight values Execute these steps until the full synchronization is achieved Generate random input vector X Compute the values of the hidden neurons Compute the value of the output neuron Compare the values of both tree parity machines Outputs are the same: one of the suitable learning rules is applied to the weights Outputs are different: go to 2.1 After the full synchronization is achieved (the weights wij of both tree parity machines are same), A and B can use their weights as keys. This method is known as a bidirectional learning. One of the following learning rules can be used for the synchronization: Hebbian learning rule: w i + = g ( w i + σ i x i Θ ( σ i τ ) Θ ( τ A τ B ) ) {\displaystyle w_{i}^{+}=g(w_{i}+\sigma _{i}x_{i}\Theta (\sigma _{i}\tau )\Theta (\tau ^{A}\tau ^{B}))} Anti-Hebbian learning rule: w i + = g ( w i − σ i x i Θ ( σ i τ ) Θ ( τ A τ B ) ) {\displaystyle w_{i}^{+}=g(w_{i}-\sigma _{i}x_{i}\Theta (\sigma _{i}\tau )\Theta (\tau ^{A}\tau ^{B}))} Random walk: w i + = g ( w i + x i Θ ( σ i τ ) Θ ( τ A τ B ) ) {\displaystyle w_{i}^{+}=g(w_{i}+x_{i}\Theta (\sigma _{i}\tau )\Theta (\tau ^{A}\tau ^{B}))} Where: Θ ( a , b ) = 0 {\displaystyle \Theta (a,b)=0} if a ≠ b {\displaystyle a\neq b} otherwise Θ ( a , b ) = 1 {\displaystyle \Theta (a,b)=1} And: g ( x ) {\displaystyle g(x)} is a function that keeps the w i {\displaystyle w_{i}} in the range { − L , − L + 1 , . . . , 0 , . . . , L − 1 , L } {\displaystyle \{-L,-L+1,...,0,...,L-1,L\}} === Attacks and security of this protocol === In every attack it is considered, that the attacker E can eavesdrop messages between the parties A and B, but does not have an opportunity to change them. ==== Brute force ==== To provide a brute force attack, an attacker has to test all possible keys (all possible values of weights wij). By K hidden neurons, K×N input neurons and boundary of weights L, this gives (2L+1)KN possibilities. For example, the configuration K = 3, L = 3 and N = 100 gives us 310253 key possibilities, making the attack impossible with today's computer power. ==== Learning with own tree parity machine ==== One of the basic attacks can be provided by an attacker, who owns the same tree parity machine as the parties A and B. He wants to synchronize his tree parity machine with these two parties. In each step there are three situations possible: Output(A) ≠ Output(B): None of the parties updates its weights. Output(A) = Output(B) = Output(E): All the three parties update weights in their tree parity machines. Output(A) = Output(B) ≠ Output(E): Parties A and B update their tree parity machines, but the attacker can not do that. Because of this situation his learning is slower than the synchronization of parties A and B. It has been proven, that the synchronization of two parties is faster than learning of an attacker. It can be improved by increasing of the synaptic depth L of the neural network. That gives this protocol enough security and an attacker can find out the key only with small probability. ==== Other attacks ==== For conventional cryptographic systems, we can improve the security of the protocol by increasing of the key length. In the case of neural cryptography, we improve it by increasing of the synaptic depth L of the neural networks. Changing this parameter increases the cost of a successful attack exponentially, while the effort for the users grows polynomially. Therefore, breaking the security of neural key exchange belongs to the complexity class NP. Alexander Klimov, Anton Mityaguine, and Adi Shamir say that the original neural synchronization scheme can be broken by at least three different attacks—geometric, probabilistic analysis, and using genetic algorithms. Even though this particular implementation is insecure, the ideas behind chaotic synchronization could potentially lead to a secure implementation. === Permutation parity machine === The permutation parity machine is a binary variant of the tree parity machine. It consists of one input layer, one hidden layer and one output layer. The number of neurons in the output layer depends on the number of hidden units K. Each hidden neuron has N binary input neurons: x i j ∈ { 0 , 1 } {\displaystyle x_{ij}\in \left\{0,1\right\}} The weights between input and hidden neurons are also binary: w i j ∈ { 0 , 1 } {\displaystyle w_{ij}\in \left\{0,1\right\}} Output value of each hidden neuron is calculated as a sum of all exclusive disjunctions (exclusive or) of input neurons and these weights: σ i = θ N ( ∑ j = 1 N w i j ⊕ x i j ) {\displaystyle \sigma _{i}=\theta _{N}(\sum _{j=1}^{N}w_{ij}\oplus x_{ij})} (⊕ means XOR). Th

    Read more →