AI Driven Spreadsheet

AI Driven Spreadsheet — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Multimodal representation learning

    Multimodal representation learning

    Multimodal representation learning is a subfield of representation learning focused on integrating and interpreting information from different modalities, such as text, images, audio, or video, by projecting them into a shared latent space. This allows for semantically similar content across modalities to be mapped to nearby points within that space, facilitating a unified understanding of diverse data types. By automatically learning meaningful features from each modality and capturing their inter-modal relationships, multimodal representation learning enables a unified representation that enhances performance in cross-media analysis tasks such as video classification, event detection, and sentiment analysis. It also supports cross-modal retrieval and translation, including image captioning, video description, and text-to-image synthesis. == Motivation == The primary motivations for multimodal representation learning arise from the inherent nature of real-world data and the limitations of unimodal approaches. Since multimodal data offers complementary and supplementary information about an object or event from different perspectives, it is more informative than relying on a single modality. A key motivation is to narrow the heterogeneity gap that exists between different modalities by projecting their features into a shared semantic subspace. This allows semantically similar content across modalities to be represented by similar vectors, facilitating the understanding of relationships and correlations between them. Multimodal representation learning aims to leverage the unique information provided by each modality to achieve a more comprehensive and accurate understanding of concepts. These unified representations are crucial for improving performance in various cross-media analysis tasks such as video classification, event detection, and sentiment analysis. They also enable cross-modal retrieval, allowing users to search and retrieve content across different modalities. Additionally, it facilitates cross-modal translation, where information can be converted from one modality to another, as seen in applications like image captioning and text-to-image synthesis. The abundance of ubiquitous multimodal data in real-world applications, including understudied areas like healthcare, finance, and human-computer interaction (HCI), further motivates the development of effective multimodal representation learning techniques. == Approaches and methods == === Canonical-correlation analysis based methods === Canonical-correlation analysis (CCA) was first introduced in 1936 by Harold Hotelling and is a fundamental approach for multimodal learning. CCA aims to find linear relationships between two sets of variables. Given two data matrices X ∈ R n × p {\displaystyle X\in \mathbb {R} ^{n\times p}} and Y ∈ R n × q {\displaystyle Y\in \mathbb {R} ^{n\times q}} representing different modalities, CCA finds projection vectors w x ∈ R p {\displaystyle w_{x}\in \mathbb {R} ^{p}} and w y ∈ R q {\displaystyle w_{y}\in \mathbb {R} ^{q}} that maximizes the correlation between the projected variables: ρ = max w x , w y w x ⊤ Σ x y w y w x ⊤ Σ x x w x w y ⊤ Σ y y w y {\displaystyle \rho =\max _{w_{x},w_{y}}{\frac {w_{x}^{\top }\Sigma _{xy}w_{y}}{{\sqrt {w_{x}^{\top }\Sigma _{xx}w_{x}}}{\sqrt {w_{y}^{\top }\Sigma _{yy}w_{y}}}}}} such that Σ x x {\displaystyle \Sigma _{xx}} and Σ y y {\displaystyle \Sigma _{yy}} are the within-modality covariance matrices, and Σ x y {\displaystyle \Sigma _{xy}} is the between-modality covariance matrix. However, standard CCA is limited by its linearity, which led to the development of nonlinear extensions, such as kernel CCA and deep CCA. ==== Kernel CCA ==== Kernel canonical correlation analysis (KCCA) extends traditional CCA to capture nonlinear relationships between modalities by implicitly mapping the data into high dimensional feature spaces using kernel functions. Given kernel functions K x {\displaystyle K_{x}} and K y {\displaystyle K_{y}} with corresponding Gram matrices K x ∈ R n × n {\displaystyle K_{x}\in \mathbb {R} ^{n\times n}} and K y ∈ R n × n {\displaystyle K_{y}\in \mathbb {R} ^{n\times n}} , KCCA seeks coefficients α {\displaystyle \alpha } and β {\displaystyle \beta } that maximize: ρ = max α , β α ⊤ K x K y β α ⊤ K x 2 α β ⊤ K y 2 β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{\top }K_{x}Ky\beta }{{\sqrt {\alpha ^{\top }K_{x}^{2}\alpha }}{\sqrt {\beta ^{\top }K_{y}^{2}\beta }}}}} To prevent overfitting, regularization terms are typically added, resulting in: ρ = max α , β α T K x K y β α T ( K x 2 + λ x K x ) α β T ( K y 2 + λ y K y ) β {\displaystyle \rho =\max _{\alpha ,\beta }{\frac {\alpha ^{T}K_{x}K_{y}\beta }{{\sqrt {\alpha ^{T}\left(K_{x}^{2}+\lambda _{x}K_{x}\right)\alpha }}{\sqrt {\;\beta ^{T}\left(K_{y}^{2}+\lambda _{y}K_{y}\right)\beta }}}}} where λ x {\displaystyle \lambda _{x}} and λ y {\displaystyle \lambda _{y}} are regularization parameters. KCCA has proven effective for tasks such as cross-modal retrieval and semantic analysis, though it faces computational challenges with large datasets due to its O ( n 2 ) {\displaystyle O(n^{2})} memory requirement for sorting kernel matrices. KCCA was proposed independently by several researchers. ==== Deep CCA ==== Deep canonical correlation analysis (DCCA), introduced in 2013, employs neural networks to learn nonlinear transformations for maximizing the correlation between modalities. DCCA uses separate neural networks f x {\displaystyle f_{x}} and f y {\displaystyle f_{y}} for each modality to transform the original data before applying CCA: max W x , W y , θ x , θ y corr ⁡ ( f x ( X ; θ x ) , f y ( Y ; θ y ) ) {\displaystyle \max _{W_{x},W_{y},\theta _{x},\theta _{y}}\operatorname {corr} \left(f_{x}(X;\theta _{x}),f_{y}(Y;\theta _{y})\right)} where θ x {\displaystyle \theta _{x}} and θ y {\displaystyle \theta _{y}} represent the parameters of the neural networks, and W x {\displaystyle W_{x}} and W y {\displaystyle W_{y}} are the CCA projection matrices. The correlation objective is computed as: corr ⁡ ( H x , H y ) = tr ⁡ ( T − 1 / 2 H x T H y S − 1 / 2 ) {\displaystyle \operatorname {corr} (H_{x},H_{y})=\operatorname {tr} \left(T^{-1/2}H_{x}^{T}H_{y}S^{-1/2}\right)} where H x = f x ( X ) {\displaystyle H_{x}=f_{x}(X)} and H y = f y ( Y ) {\displaystyle H_{y}=f_{y}(Y)} are the network outputs, T = H x T H x + r x I {\displaystyle T=H_{x}^{T}H_{x}+r_{x}I} , S = H y T H y + r y I {\displaystyle S=H_{y}^{T}H_{y}+r_{y}I} and r x , r y {\displaystyle r_{x},r_{y}} are the regularization parameters. DCCA overcomes the limitations of linear CCA and kernel CCA by learning complex nonlinear relationships while maintaining computational efficiency for large datasets through mini-batch optimization. === Graph-based methods === Graph-based approaches for multimodal representation learning leverage graph structure to model relationships between entities across different modalities. These methods typically represent each modality as a graph and then learn embedding that preserve cross-modal similarities, enabling more effective joint representation of heterogeneous data. One such method is cross-modal graph neural networks (CMGNNs) that extend traditional graph neural networks (GNNs) to handle data from multiple modalities by constructing graphs that capture both intra-modal and inter-modal relationships. These networks model interactions across modalities by representing them as nodes and their relationships as edges. Other graph-based methods include Probabilistic Graphical Models (PGMs) such as deep belief networks (DBN) and deep Boltzmann machines (DBM). These models can learn a joint representation across modalities, for instance, a multimodal DBN achieves this by adding a shared restricted Boltzmann Machine (RBM) hidden layer on top of modality-specific DBNs. Additionally, the structure of data in some domains like Human-Computer Interaction (HCI), such as the view hierarchy of app screens, can potentially be modeled using graph-like structures. The field of graph representation learning is also relevant, with ongoing progress in developing evaluation benchmarks. === Diffusion maps === Another set of methods relevant to multimodal representation learning are based on diffusion maps and their extensions to handle multiple modalities. ==== Multi-view diffusion maps ==== Multi-view diffusion maps address the challenge of achieving multi-view dimensionality reduction by effectively utilizing the availability of multiple views to extract a coherent low-dimensional representation of the data. The core idea is to exploit both the intrinsic relations within each view and the mutual relations between the different views, defining a cross-view model where a random walk process implicitly hops between objects in different views. A multi-view kernel matrix is constructed by combining these relations, defining a cross-view diffusion process and associ

    Read more →
  • Latent class model

    Latent class model

    In statistics, a latent class model (LCM) is a model for clustering multivariate discrete data. It assumes that the data arise from a mixture of discrete distributions, within each of which the variables are independent. It is called a latent class model because the class to which each data point belongs is unobserved (or latent). Latent class analysis (LCA) is a subset of structural equation modeling used to find groups or subtypes of cases in multivariate categorical data. These groups or subtypes of cases are called "latent classes". When faced with the following situation, a researcher might opt to use LCA to better understand the data: Symptoms a, b, c, and d have been recorded in a variety of patients diagnosed with diseases X, Y, and Z. Disease X is associated with symptoms a, b, and c; disease Y is linked to symptoms b, c, and d; and disease Z is connected to symptoms a, c, and d. In this context, the LCA would attempt to detect the presence of latent classes (i.e., the disease entities), thus creating patterns of association in the symptoms. As in factor analysis, LCA can also be used to classify cases according to their maximum likelihood class membership probability. The key criterion for resolving the LCA is identifying latent classes in which the observed symptom associations are effectively rendered null. This is because within each class, the diseases responsible for the symptoms create a structure of dependencies. As a result, the symptoms become conditionally independent, meaning that, given the class a case belongs to, the symptoms are no longer related to one another. == Model == Within each latent class, the observed variables are statistically independent—an essential aspect of latent class modeling. Usually, the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes, variables are independent (local independence). Therefore, the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987). In one form, the LCM is written as p i 1 , i 2 , … , i N ≈ ∑ t T p t ∏ n N p i n , t n , {\displaystyle p_{i_{1},i_{2},\ldots ,i_{N}}\approx \sum _{t}^{T}p_{t}\,\prod _{n}^{N}p_{i_{n},t}^{n},} where T {\displaystyle T} is the number of latent classes and p t {\displaystyle p_{t}} are the so-called recruitment or unconditional probabilities that should sum to one. p i n , t n {\displaystyle p_{i_{n},t}^{n}} are the marginal or conditional probabilities. For a two-way latent class model, the form is p i j ≈ ∑ t T p t p i t p j t . {\displaystyle p_{ij}\approx \sum _{t}^{T}p_{t}\,p_{it}\,p_{jt}.} This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization. The probability model used in LCA is closely related to the Naive Bayes classifier. The main difference is that in LCA, the class membership of an individual is a latent variable, whereas in Naive Bayes classifiers, the class membership is an observed label. == Related methods == There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data and assumes that such data arise from a mixture of distributions, such as a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, allocating members to classes based on that dimension. An example would be assigning cases to social classes based on ability or merit. In a practical instance, the variables could be multiple choice items of a political questionnaire. In this case, the data consists of an N-way contingency table with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion, and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen. == Application == LCA may be used in many fields, such as: collaborative filtering, Behavior Genetics and Evaluation of diagnostic tests.

    Read more →
  • Dynamic time warping

    Dynamic time warping

    In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications. In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restriction and rules: Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match) The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match) The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i {\displaystyle j>i} are indices from the first sequence, then there must not be two indices l > k {\displaystyle l>k} in the other sequence, such that index i {\displaystyle i} is matched with index l {\displaystyle l} and index j {\displaystyle j} is matched with index k {\displaystyle k} , and vice versa We can plot each match between the sequences 1 : M {\displaystyle 1:M} and 1 : N {\displaystyle 1:N} as a path in a M × N {\displaystyle M\times N} matrix from ( 1 , 1 ) {\displaystyle (1,1)} to ( M , N ) {\displaystyle (M,N)} , such that each step is one of ( 0 , 1 ) , ( 1 , 0 ) , ( 1 , 1 ) {\displaystyle (0,1),(1,0),(1,1)} . In this formulation, we see that the number of possible matches is the Delannoy number. The optimal match is denoted by the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn't guarantee the triangle inequality to hold. In addition to a similarity measure between the two sequences (a so called "warping path" is produced), by warping according to this path the two signals may be aligned in time. The signal with an original set of points X(original), Y(original) is transformed to X(warped), Y(warped). This finds applications in genetic sequence and audio synchronisation. In a related technique sequences of varying speed may be averaged using this technique see the average sequence section. This is conceptually very similar to the Needleman–Wunsch algorithm. == Implementation == This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d ( x , y ) {\displaystyle d(x,y)} is a distance between the symbols, e.g., d ( x , y ) = | x − y | {\displaystyle d(x,y)=|x-y|} . int DTWDistance(s: array [1..n], t: array [1..m]) { DTW := array [0..n, 0..m] for i := 0 to n for j := 0 to m DTW[i, j] := infinity DTW[0, 0] := 0 for i := 1 to n for j := 1 to m cost := d(s[i], t[j]) DTW[i, j] := cost + minimum(DTW[i-1, j ], // insertion DTW[i , j-1], // deletion DTW[i-1, j-1]) // match return DTW[n, m] } where DTW[i, j] is the distance between s[1:i] and t[1:j] with the best alignment. We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then | i − j | {\displaystyle |i-j|} is no larger than w, a window parameter. We can easily modify the above algorithm to add a locality constraint (differences marked). However, the above given modification works only if | n − m | {\displaystyle |n-m|} is no larger than w, i.e. the end point is within the window length from diagonal. In order to make the algorithm work, the window parameter w must be adapted so that | n − m | ≤ w {\displaystyle |n-m|\leq w} (see the line marked with () in the code). int DTWDistance(s: array [1..n], t: array [1..m], w: int) { DTW := array [0..n, 0..m] w := max(w, abs(n-m)) // adapt window size () for i := 0 to n for j:= 0 to m DTW[i, j] := infinity DTW[0, 0] := 0 for i := 1 to n for j := max(1, i-w) to min(m, i+w) DTW[i, j] := 0 for i := 1 to n for j := max(1, i-w) to min(m, i+w) cost := d(s[i], t[j]) DTW[i, j] := cost + minimum(DTW[i-1, j ], // insertion DTW[i , j-1], // deletion DTW[i-1, j-1]) // match return DTW[n, m] } == Warping properties == The DTW algorithm produces a discrete matching between existing elements of one series to another. In other words, it does not allow time-scaling of segments within the sequence. Other methods allow continuous warping. For example, Correlation Optimized Warping (COW) divides the sequence into uniform segments that are scaled in time using linear interpolation, to produce the best matching warping. The segment scaling causes potential creation of new elements, by time-scaling segments either down or up, and thus produces a more sensitive warping than DTW's discrete matching of raw elements. == Complexity == The time complexity of the DTW algorithm is O ( N M ) {\displaystyle O(NM)} , where N {\displaystyle N} and M {\displaystyle M} are the lengths of the two input sequences. The 50 years old quadratic time bound was broken in 2016: an algorithm due to Gold and Sharir enables computing DTW in O ( N 2 / log ⁡ log ⁡ N ) {\displaystyle O({N^{2}}/\log \log N)} time and space for two input sequences of length N {\displaystyle N} . This algorithm can also be adapted to sequences of different lengths. Despite this improvement, it was shown that a strongly subquadratic running time of the form O ( N 2 − ϵ ) {\displaystyle O(N^{2-\epsilon })} for some ϵ > 0 {\displaystyle \epsilon >0} cannot exist unless the Strong exponential time hypothesis fails. While the dynamic programming algorithm for DTW requires O ( N M ) {\displaystyle O(NM)} space in a naive implementation, the space consumption can be reduced to O ( min ( N , M ) ) {\displaystyle O(\min(N,M))} using Hirschberg's algorithm. == Fast computation == Fast techniques for computing DTW include PrunedDTW, SparseDTW, FastDTW, and the MultiscaleDTW. A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh, LB_Improved, or LB_Petitjean. However, the Early Abandon and Pruned DTW algorithm reduces the degree of acceleration that lower bounding provides and sometimes renders it ineffective. In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than the LB_Keogh bound, and found that other techniques were inefficient. Subsequent to this survey, the LB_Enhanced bound was developed that is always tighter than LB_Keogh while also being more efficient to compute. LB_Petitjean is the tightest known lower bound that can be computed in linear time. == Average sequence == Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. NLAAF is an exact method to average two sequences using DTW. For more than two sequences, the problem is related to that of multiple alignment and requires heuristics. DBA is currently a reference method to average a set of sequences consistently with DTW. COMASA efficiently randomizes the search for the average sequence, using DBA as a local optimization process. == Supervised learning == A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure. == Amerced Dynamic Time Warping == Amerced Dynamic Time Warping (ADTW) is a variant of DTW designed to better control DTW's permissiveness in the alignments that it allows. The windows that classical DTW uses to constrain alignments introduce a step function. Any warping of the path is allowed within the window and none beyond it. In contrast, ADTW employs an additive penalty that is incurred each time that the path is warped. Any amount of warping is allowed, but each warping action incurs a direct penalty. ADTW significantly outperforms DTW with windowing when applied as a nearest neighbor classifier on a set of benchmark time series classification tasks. == Alternative approaches == In functional data analysis, time series are regarde

    Read more →
  • Spatial Analysis of Principal Components

    Spatial Analysis of Principal Components

    Spatial Principal Component Analysis (sPCA) is a multivariate statistical technique that complements the traditional Principal Component Analysis (PCA) by incorporating spatial information into the analysis of genetic variation. While traditional PCA can be used to find spatial patterns, it focuses on reducing data dimensionality by identifying uncorrelated principal components that capture maximum variance, thus often lacking power to identify non-trivial spatial genetic patterns. By accounting for spatial autocorrelation, sPCA is able to uncover spatial patterns in the data and find the spatial structure of datasets where observations are either geographically or topologically linked. This statistical power improvement allows the investigation of cryptic spatial patterns of genetic variability otherwise overlooked. sPCA has been applied in various fields, including geography, ecology and genetics. == History == sPCA was introduced in 2008 by Thibaut Jombart, Sébastien Devillard, Anne-Béatrice Dufour, and D. Pontier as a spatially explicit method to investigate the spatial pattern of genetic variation among individuals or populations. In 2017, Valeria Montano and Thibaut Jombart published an alternative non-parametric test to evaluate the significance of global and local spatial genetic patterns with improved statistical power. == Details == sPCA modifies the PCA framework by integrating spatial weights, typically in the form of connectivity matrices or spatial adjacency graphs. It identifies principal components (PCs) that maximize both genentic variance and spatial autocorreation, as measured by Moran's I. These weights represent relationships between observations based on geographic distance or other spatial criteria. The method decomposes variance into two components: Global structures, correspond to positive autocorrelation, that is, reflect broad-scale spatial patterns where similar values cluster over large regions. Local structures, correspond to negative autocorrelation, that is, capture fine-scale spatial variations or localized patterns. The core of sPCA relies on the eigenanalysis of a spatially weighted covariance or correlation matrix. The spatial weight matrix can be constructed using techniques such as Delaunay triangulation, nearest-neighbor graphs, or distance-based criteria. Applications of sPCA should be used only as an explorative tool. == Applications == sPCA has been widely used in many fields, including: Ecology: To find spatial patterns in species distributions and environmental gradients. Genetics: Population structure and gene flow analysis while allowing for spatial autocorrelation considerations. Biogeography: To identify historical dispersal routes, and barriers to gene flow, providing insights into species distribution patterns and evolutionary history. == Software/Source Code == sPCA implementations are available in R in adegenet and ntbox . These tools facilitate the application of sPCA by providing functions for constructing spatial weight matrices, performing eigenanalysis, and obtaining spatial principal components in an easy-to-read form.

    Read more →
  • Computer security compromised by hardware failure

    Computer security compromised by hardware failure

    Computer security compromised by hardware failure is a branch of computer security applied to hardware. The objective of computer security includes protection of information and property from theft, corruption, or natural disaster, while allowing the information and property to remain accessible and productive to its intended users. Such secret information could be retrieved by different ways. This article focus on the retrieval of data thanks to misused hardware or hardware failure. Hardware could be misused or exploited to get secret data. This article collects main types of attack that can lead to data theft. Computer security can be compromised by devices, such as keyboards, monitors or printers (thanks to electromagnetic or acoustic emanation for example) or by components of the computer, such as the memory, the network card or the processor (thanks to time or temperature analysis for example). == Devices == === Monitor === The monitor is the main device used to access data on a computer. It has been shown that monitors radiate or reflect data on their environment, potentially giving attackers access to information displayed on the monitor. ==== Electromagnetic emanations ==== Video display units radiate: narrowband harmonics of the digital clock signals; broadband harmonics of the various 'random' digital signals such as the video signal. Known as compromising emanations or TEMPEST radiation, a code word for a U.S. government programme aimed at attacking the problem, the electromagnetic broadcast of data has been a significant concern in sensitive computer applications. Eavesdroppers can reconstruct video screen content from radio frequency emanations. Each (radiated) harmonic of the video signal shows a remarkable resemblance to a broadcast TV signal. It is therefore possible to reconstruct the picture displayed on the video display unit from the radiated emission by means of a normal television receiver. If no preventive measures are taken, eavesdropping on a video display unit is possible at distances up to several hundreds of meters, using only a normal black-and-white TV receiver, a directional antenna and an antenna amplifier. It is even possible to pick up information from some types of video display units at a distance of over 1 kilometer. If more sophisticated receiving and decoding equipment is used, the maximum distance can be much greater. ==== Compromising reflections ==== What is displayed by the monitor is reflected on the environment. The time-varying diffuse reflections of the light emitted by a CRT monitor can be exploited to recover the original monitor image. This is an eavesdropping technique for spying at a distance on data that is displayed on an arbitrary computer screen, including the currently prevalent LCD monitors. The technique exploits reflections of the screen's optical emanations in various objects that one commonly finds close to the screen and uses those reflections to recover the original screen content. Such objects include eyeglasses, tea pots, spoons, plastic bottles, and even the eye of the user. This attack can be successfully mounted to spy on even small fonts using inexpensive, off-the-shelf equipment (less than 1500 dollars) from a distance of up to 10 meters. Relying on more expensive equipment allowed to conduct this attack from over 30 meters away, demonstrating that similar attacks are feasible from the other side of the street or from a close by building. Many objects that may be found at a usual workplace can be exploited to retrieve information on a computer's display by an outsider. Particularly good results were obtained from reflections in a user's eyeglasses or a tea pot located on the desk next to the screen. Reflections that stem from the eye of the user also provide good results. However, eyes are harder to spy on at a distance because they are fast-moving objects and require high exposure times. Using more expensive equipment with lower exposure times helps to remedy this problem. The reflections gathered from curved surfaces on close by objects indeed pose a substantial threat to the confidentiality of data displayed on the screen. Fully invalidating this threat without at the same time hiding the screen from the legitimate user seems difficult, without using curtains on the windows or similar forms of strong optical shielding. Most users, however, will not be aware of this risk and may not be willing to close the curtains on a nice day. The reflection of an object, a computer display, in a curved mirror creates a virtual image that is located behind the reflecting surface. For a flat mirror this virtual image has the same size and is located behind the mirror at the same distance as the original object. For curved mirrors, however, the situation is more complex. === Keyboard === ==== Electromagnetic emanations ==== Computer keyboards are often used to transmit confidential data such as passwords. Since they contain electronic components, keyboards emit electromagnetic waves. These emanations could reveal sensitive information such as keystrokes. Electromagnetic emanations have turned out to constitute a security threat to computer equipment. The figure below presents how a keystroke is retrieved and what material is necessary. The approach is to acquire the raw signal directly from the antenna and to process the entire captured electromagnetic spectrum. Thanks to this method, four different kinds of compromising electromagnetic emanations have been detected, generated by wired and wireless keyboards. These emissions lead to a full or a partial recovery of the keystrokes. The best practical attack fully recovered 95% of the keystrokes of a PS/2 keyboard at a distance up to 20 meters, even through walls. Because each keyboard has a specific fingerprint based on the clock frequency inconsistencies, it can determine the source keyboard of a compromising emanation, even if multiple keyboards from the same model are used at the same time. The four different kinds way of compromising electromagnetic emanations are described below. ===== The Falling Edge Transition Technique ===== When a key is pressed, released or held down, the keyboard sends a packet of information known as a scan code to the computer. The protocol used to transmit these scan codes is a bidirectional serial communication, based on four wires: Vcc (5 volts), ground, data and clock. Clock and data signals are identically generated. Hence, the compromising emanation detected is the combination of both signals. However, the edges of the data and the clock lines are not superposed. Thus, they can be easily separated to obtain independent signals. ===== The Generalized Transition Technique ===== The Falling Edge Transition attack is limited to a partial recovery of the keystrokes. This is a significant limitation. The GTT is a falling edge transition attack improved, which recover almost all keystrokes. Indeed, between two traces, there is exactly one data rising edge. If attackers are able to detect this transition, they can fully recover the keystrokes. ===== The Modulation Technique ===== Harmonics compromising electromagnetic emissions come from unintentional emanations such as radiations emitted by the clock, non-linear elements, crosstalk, ground pollution, etc. Determining theoretically the reasons of these compromising radiations is a very complex task. These harmonics correspond to a carrier of approximately 4 MHz which is very likely the internal clock of the micro-controller inside the keyboard. These harmonics are correlated with both clock and data signals, which describe modulated signals (in amplitude and frequency) and the full state of both clock and data signals. This means that the scan code can be completely recovered from these harmonics. ===== The Matrix Scan Technique ===== Keyboard manufacturers arrange the keys in a matrix. The keyboard controller, often an 8-bit processor, parses columns one-by-one and recovers the state of 8 keys at once. This matrix scan process can be described as 192 keys (some keys may not be used, for instance modern keyboards use 104/105 keys) arranged in 24 columns and 8 rows. These columns are continuously pulsed one-by-one for at least 3μs. Thus, these leads may act as an antenna and generate electromagnetic emanations. If an attacker is able to capture these emanations, he can easily recover the column of the pressed key. Even if this signal does not fully describe the pressed key, it still gives partial information on the transmitted scan code, i.e. the column number. Note that the matrix scan routine loops continuously. When no key is pressed, we still have a signal composed of multiple equidistant peaks. These emanations may be used to remotely detect the presence of powered computers. Concerning wireless keyboards, the wireless data burst transmission can be used as an electromagnetic trigger to detect exactly when a key is pressed, while the matrix s

    Read more →
  • Handwriting recognition

    Handwriting recognition

    Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning (optical character recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most possible words. == Offline recognition == Offline handwriting recognition involves the automatic conversion of text in an image into letter codes that are usable within computer and text-processing applications. The data obtained by this form is regarded as a static representation of handwriting. Offline handwriting recognition is comparatively difficult, as different people have different handwriting styles. And, as of today, OCR engines are primarily focused on machine printed text and ICR for hand "printed" (written in capital letters) text. === Traditional techniques === ==== Character extraction ==== Offline character recognition often involves scanning a form or document. This means the individual characters contained in the scanned image will need to be extracted. Tools exist that are capable of performing this step. However, there are several common imperfections in this step. The most common is when characters that are connected are returned as a single sub-image containing both characters. This causes a major problem in the recognition stage. Yet many algorithms are available that reduce the risk of connected characters. ==== Character recognition ==== After individual characters have been extracted, a recognition engine is used to identify the corresponding computer character. Several different recognition techniques are currently available. ===== Feature extraction ===== Feature extraction works in a similar fashion to neural network recognizers. However, programmers must manually determine the properties they feel are important. This approach gives the recognizer more control over the properties used in identification. Yet any system using this approach requires substantially more development time than a neural network because the properties are not learned automatically. === Modern techniques === Where traditional techniques focus on segmenting individual characters for recognition, modern techniques focus on recognizing all the characters in a segmented line of text. Particularly they focus on machine learning techniques that are able to learn visual features, avoiding the limiting feature engineering previously used. State-of-the-art methods use convolutional networks to extract visual features over several overlapping windows of a text line image which a recurrent neural network uses to produce character probabilities. == Online recognition == Online handwriting recognition involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded as a digital representation of handwriting. The obtained signal is converted into letter codes that are usable within computer and text-processing applications. The elements of an online handwriting recognition interface typically include: a pen or stylus for the user to write with a touch sensitive surface, which may be integrated with, or adjacent to, an output display. a software application which interprets the movements of the stylus across the writing surface, translating the resulting strokes into digital text. The process of online handwriting recognition can be broken down into a few general steps: preprocessing, feature extraction and classification The purpose of preprocessing is to discard irrelevant information in the input data, that can negatively affect the recognition. This concerns speed and accuracy. Preprocessing usually consists of binarization, normalization, sampling, smoothing and denoising. The second step is feature extraction. Out of the two- or higher-dimensional vector field received from the preprocessing algorithms, higher-dimensional data is extracted. The purpose of this step is to highlight important information for the recognition model. This data may include information like pen pressure, velocity or the changes of writing direction. The last big step is classification. In this step, various models are used to map the extracted features to different classes and thus identifying the characters or words the features represent. === Hardware === Commercial products incorporating handwriting recognition as a replacement for keyboard input were introduced in the early 1980s. Examples include handwriting terminals such as the Pencept Penpad and the Inforite point-of-sale terminal. With the advent of the large consumer market for personal computers, several commercial products were introduced to replace the keyboard and mouse on a personal computer with a single pointing/handwriting system, such as those from Pencept, CIC and others. The first commercially available tablet-type portable computer was the Write-Top from Linus Technologies, released in July 1988. Its operating system was based on MS-DOS. In the early 1990s, hardware makers including NCR, IBM and EO released tablet computers running the PenPoint operating system developed by GO Corp. PenPoint used handwriting recognition and gestures throughout and provided the facilities to third-party software. IBM's tablet computer was the first to use the ThinkPad name and used IBM's handwriting recognition. This recognition system was later ported to Microsoft Windows for Pen Computing, and IBM's Pen for OS/2. None of these were commercially successful. Advancements in electronics allowed the computing power necessary for handwriting recognition to fit into a smaller form factor than tablet computers, and handwriting recognition is often used as an input method for hand-held PDAs. The first PDA to provide written input was the Apple Newton, which exposed the public to the advantage of a streamlined user interface. However, the device was not a commercial success, owing to the unreliability of the software, which tried to learn a user's writing patterns. By the time of the release of the Newton OS 2.0, wherein the handwriting recognition was greatly improved, including unique features still not found in current recognition systems such as modeless error correction, the largely negative first impression had been made. After discontinuation of Apple Newton, the feature was incorporated in Mac OS X 10.2 and later as Inkwell. Palm later launched a successful series of PDAs based on the Graffiti recognition system. Graffiti improved usability by defining a set of "unistrokes", or one-stroke forms, for each character. This narrowed the possibility for erroneous input, although memorization of the stroke patterns did increase the learning curve for the user. The Graffiti handwriting recognition was found to infringe on a patent held by Xerox, and Palm replaced Graffiti with a licensed version of the CIC handwriting recognition which, while also supporting unistroke forms, pre-dated the Xerox patent. The court finding of infringement was reversed on appeal, and then reversed again on a later appeal. The parties involved subsequently negotiated a settlement concerning this and other patents. A Tablet PC is a notebook computer with a digitizer tablet and a stylus, which allows a user to handwrite text on the unit's screen. The operating system recognizes the handwriting and converts it into text. Windows Vista and Windows 7 include personalization features that learn a user's writing patterns or vocabulary for English, Japanese, Chinese Traditional, Chinese Simplified and Korean. The features include a "personalization wizard" that prompts for samples of a user's handwriting and uses them to retrain the system for higher accuracy recognition. This system is distinct from the less advanced handwriting recognition system employed in its Windows Mobile OS for PDAs. Although handwriting recognition is an input form that the public has become accustomed to, it has not achieved widespread use in either desktop computers or laptops. It is still generally accepted that keyboard input is both faster and more reliable. As of 2006, many PDAs offer handwriting input, sometimes even accepting natural cursive handwriting, but accuracy is still a problem, and some people still find even a simple on-screen keyboard more efficient. === Software === Early software could understand print handwriting where the characters were separated; however, cursive handwriting

    Read more →
  • Recursive neural network

    Recursive neural network

    A recursive neural network is a kind of deep neural network created by applying the same set of weights recursively over a structured input, to produce a structured prediction over variable-size input structures, or a scalar prediction on it, by traversing a given structure in topological order. These networks were first introduced to learn distributed representations of structure (such as logical terms), but have been successful in multiple applications, for instance in learning sequence and tree structures in natural language processing (mainly continuous representations of phrases and sentences based on word embeddings). == Architectures == === Basic === In the simplest architecture, nodes are combined into parents using a weight matrix (which is shared across the whole network) and a non-linearity such as the tanh {\displaystyle \tanh } hyperbolic function. If c 1 {\displaystyle c_{1}} and c 2 {\displaystyle c_{2}} are n {\displaystyle n} -dimensional vector representations of nodes, their parent will also be an n {\displaystyle n} -dimensional vector, defined as: p 1 , 2 = tanh ⁡ ( W [ c 1 ; c 2 ] ) {\displaystyle p_{1,2}=\tanh(W[c_{1};c_{2}])} where W {\displaystyle W} is a learned n × 2 n {\displaystyle n\times 2n} weight matrix. This architecture, with a few improvements, has been used for successfully parsing natural scenes, syntactic parsing of natural language sentences, and recursive autoencoding and generative modeling of 3D shape structures in the form of cuboid abstractions. === Recursive cascade correlation (RecCC) === RecCC is a constructive neural network approach to deal with tree domains with pioneering applications to chemistry and extension to directed acyclic graphs. === Unsupervised RNN === A framework for unsupervised RNN has been introduced in 2004. === Tensor === Recursive neural tensor networks use a single tensor-based composition function for all nodes in the tree. == Training == === Stochastic gradient descent === Typically, stochastic gradient descent (SGD) is used to train the network. The gradient is computed using backpropagation through structure (BPTS), a variant of backpropagation through time used for recurrent neural networks. == Properties == The universal approximation capability of RNNs over trees has been proved in literature. == Related models == === Recurrent neural networks === Recurrent neural networks are recursive artificial neural networks with a certain structure: that of a linear chain. Whereas recursive neural networks operate on any hierarchical structure, combining child representations into parent representations, recurrent neural networks operate on the linear progression of time, combining the previous time step and a hidden representation into the representation for the current time step. === Tree Echo State Networks === An efficient approach to implement recursive neural networks is given by the Tree Echo State Network within the reservoir computing paradigm. === Extension to graphs === Extensions to graphs include graph neural network (GNN), Neural Network for Graphs (NN4G), and more recently convolutional neural networks for graphs.

    Read more →
  • LamaH

    LamaH

    LamaH (Large-Sample Data for Hydrology and Environmental Sciences) is a cross-state initiative for unified data preparation and collection in the field of catchment hydrology. Hydrological datasets, for example, are an integral component for creating flood forecasting models. == Features == LamaH datasets always consist of a combination of meteorological time series (e.g., precipitation, temperature) and hydrologically relevant catchment attributes (e.g., elevation, slope, forest area, soil, bedrock) aggregated over the respective catchment as well as associated hydrological time series at the catchment outlet (discharge). By evaluating the large and heterogeneous sample (large-sample) of catchments, it is possible to gain insights into the hydrological cycle that would probably not be achievable with local and small-scale studies. The structure of the dataset allows an evaluation based on machine learning methods (deep learning). The accompanying paper explains not only the data preparation but also any limitations, uncertainties and possible applications. == Difference to CAMELS == The LamaH datasets are quite similar to the CAMELS datasets, but additionally feature: Further basin delineations (based on intermediate catchments) and attributes (e.g. flow distance and altitude difference between two topologically adjacent discharge gauges), enabling the setup of an interconnected hydrological network Attributes for classifying catchments and runoff gauges according to the degree and type of (anthropogenic) influence == Availability == LamaH datasets are available for the following regions: Central Europe (Austria and its hydrological upstream areas in Germany, Czech Republic, Switzerland, Slovakia, Italy, Liechtenstein, Slovenia and Hungary) / 859 catchments CAMELS datasets are available for (ranked by publication date): Contiguous USA (exclusive Alaska and Hawaii) / 671 catchments Chile / 516 catchments Brazil / 897 catchments Great Britain / 671 catchments Australia / 222 catchments Both the CAMELS and LamaH datasets are licensed with Creative Commons and are therefore available barrier-free for the public.

    Read more →
  • MovieRide FX

    MovieRide FX

    MovieRide FX is a patented automated special visual effects video compositing engine used in the MovieRide FX mobile application for Android (requires Android 2.3 or later) and iOS (compatible with iPhone 4 and up, iPad, and iPod Touch (new generation), requires iOS 7 or later). MovieRide FX allows the user to personalize a "Hollywood-style" movie clip by inserting themself into the clip as the "actor". == Features == The MovieRide FX app uses the relevant mobile device's camera to record a video of the user and insert it into a pre-packaged "Hollywood style" movie clip. The "actor" is extracted from their recorded video clip through various known effects such as masking, keying, and motion tracking. The "actor" is then inserted into one of the pre-packaged movie clips created by the MovieRide FX visual effects artists. This is done through an automated process requiring little or no artistic or technical skill from the user. The custom movie clips pre-packaged with MovieRide FX offer the user a variety of movie scenarios. Additional clips based on popular television and movie themes are continually being developed and are available on a freemium basis. == Sharing == Once the user's footage has automatically been composited into a movie clip and rendered as an .mp4 file, it can be shared via social media, such as Facebook, YouTube, and Twitter, and by e-mail. == History == === 2012 === MovieRide FX was created by Grant Waterston and Johann Mynhardt, who started development in 2012. === 2013 === The beta version was released on Google Play in July 2013. In August 2013 MovieRide FX was a New Media Award winner in the "New Media" category of the Accolade International Awards in Los Angeles. In October 2013 MovieRide FX was awarded exhibitor space in the ‘start-up village’ at the Apps-World Expo in London. === 2014 === MovieRide FX reached the 100 000 – 500 000 downloads category on the Google Play Store in June 2014. The official Android version was launched in July 2014. iOS version released in August 2014. MovieRide FX was selected as one of the "Top 150" startups at the Pioneer Festival in Vienna in September 2014. In November 2014 MovieRide FX was shortlisted for the Appster Awards in the "Best Entertainment App" and "Most Innovative App" categories and was awarded exhibitor space at the ‘start-up village’ at the Apps-World Expo in London. Patent applications were filed in South Africa, the EU and USA in April 2014. === 2015 === In September 2015 MovieRide FX was shortlisted for "Best Software innovation" at The Technology Expo Awards in London. === 2016 === In April 2016 MovieRide FX was nominated for a National Science and Technology Forum (NSTF) award for 'Research leading to Innovation by a corporate organization' In August 2016 Movie Ride FX won two Gold Awards at the 2016 Mobile Marketing Awards (MMA Smarties SA). These two Gold awards were for the 'Innovation' and 'Best in Show’ categories. In December 2016 FlicJam Inc. was formed in the US to access the larger global market. EU patent application was published in March 2016. === 2017 === South African patent was granted in February 2017. === 2018 === US patent was granted in March 2018.

    Read more →
  • Multidimensional scaling

    Multidimensional scaling

    Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a data set. MDS is used to translate distances between each pair of n {\textstyle n} objects in a set into a configuration of n {\textstyle n} points mapped into an abstract Cartesian space. More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction. Given a distance matrix with the distances between each pair of objects in a set, and a chosen number of dimensions, N, an MDS algorithm places each object into N-dimensional space (a lower-dimensional representation) such that the between-object distances are preserved as well as possible. For N = 1, 2, and 3, the resulting points can be visualized on a scatter plot. Core theoretical contributions to MDS were made by James O. Ramsay of McGill University, who is also regarded as the founder of functional data analysis. == Types == MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix: === Classical multidimensional scaling === It is also known as Principal Coordinates Analysis (PCoA), Torgerson Scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain, which is given by Strain D ( x 1 , x 2 , . . . , x n ) = ( ∑ i , j ( b i j − x i T x j ) 2 ∑ i , j b i j 2 ) 1 / 2 , {\displaystyle {\text{Strain}}_{D}(x_{1},x_{2},...,x_{n})={\Biggl (}{\frac {\sum _{i,j}{\bigl (}b_{ij}-x_{i}^{T}x_{j}{\bigr )}^{2}}{\sum _{i,j}b_{ij}^{2}}}{\Biggr )}^{1/2},} where x i {\displaystyle x_{i}} denote vectors in N-dimensional space, x i T x j {\displaystyle x_{i}^{T}x_{j}} denotes the scalar product between x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} , and b i j {\displaystyle b_{ij}} are the elements of the matrix B {\displaystyle B} defined on step 2 of the following algorithm, which are computed from the distances. Steps of a Classical MDS algorithm: Classical MDS uses the fact that the coordinate matrix X {\displaystyle X} can be derived by eigenvalue decomposition from B = X X ′ {\textstyle B=XX'} . And the matrix B {\textstyle B} can be computed from proximity matrix D {\textstyle D} by using double centering. Set up the squared proximity matrix D ( 2 ) = [ d i j 2 ] {\textstyle D^{(2)}=[d_{ij}^{2}]} Apply double centering: B = − 1 2 C D ( 2 ) C {\textstyle B=-{\frac {1}{2}}CD^{(2)}C} using the centering matrix C = I − 1 n J n {\textstyle C=I-{\frac {1}{n}}J_{n}} , where n {\textstyle n} is the number of objects, I {\textstyle I} is the n × n {\textstyle n\times n} identity matrix, and J n {\textstyle J_{n}} is an n × n {\textstyle n\times n} matrix of all ones. Determine the m {\textstyle m} largest eigenvalues λ 1 , λ 2 , . . . , λ m {\textstyle \lambda _{1},\lambda _{2},...,\lambda _{m}} and corresponding eigenvectors e 1 , e 2 , . . . , e m {\textstyle e_{1},e_{2},...,e_{m}} of B {\textstyle B} (where m {\textstyle m} is the number of dimensions desired for the output). Now, X = E m Λ m 1 / 2 {\textstyle X=E_{m}\Lambda _{m}^{1/2}} , where E m {\textstyle E_{m}} is the matrix of m {\textstyle m} eigenvectors and Λ m {\textstyle \Lambda _{m}} is the diagonal matrix of m {\textstyle m} eigenvalues of B {\textstyle B} . Classical MDS assumes metric distances. So this is not applicable for direct dissimilarity ratings. === Metric multidimensional scaling (mMDS) === It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization. Metric MDS minimizes the cost function called “stress” which is a residual sum of squares: Stress D ( x 1 , x 2 , . . . , x n ) = ∑ i ≠ j = 1 , . . . , n ( d i j − ‖ x i − x j ‖ ) 2 . {\displaystyle {\text{Stress}}_{D}(x_{1},x_{2},...,x_{n})={\sqrt {\sum _{i\neq j=1,...,n}{\bigl (}d_{ij}-\|x_{i}-x_{j}\|{\bigr )}^{2}}}.} Metric scaling uses a power transformation with a user-controlled exponent p {\textstyle p} : d i j p {\textstyle d_{ij}^{p}} and − d i j 2 p {\textstyle -d_{ij}^{2p}} for distance. In classical scaling p = 1. {\textstyle p=1.} Non-metric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities. === Non-metric multidimensional scaling (NMDS) === In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space. Let d i j {\displaystyle d_{ij}} be the dissimilarity between points i , j {\displaystyle i,j} . Let d ^ i j = ‖ x i − x j ‖ {\displaystyle {\hat {d}}_{ij}=\|x_{i}-x_{j}\|} be the Euclidean distance between embedded points x i , x j {\displaystyle x_{i},x_{j}} . Now, for each choice of the embedded points x i {\displaystyle x_{i}} and is a monotonically increasing function f {\displaystyle f} , define the "stress" function: S ( x 1 , . . . , x n ; f ) = ∑ i < j ( f ( d i j ) − d ^ i j ) 2 ∑ i < j d ^ i j 2 . {\displaystyle S(x_{1},...,x_{n};f)={\sqrt {\frac {\sum _{i Read more →

  • Absorbing Markov chain

    Absorbing Markov chain

    In the mathematical theory of probability, an absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An absorbing state is a state that, once entered, cannot be left. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space. However, this article concentrates on the discrete-time discrete-state-space case. == Formal definition == A Markov chain is an absorbing chain if there is at least one absorbing state and it is possible to go from any state to at least one absorbing state in a finite number of steps. In an absorbing Markov chain, a state that is not absorbing is called transient. === Canonical form === Let an absorbing Markov chain with transition matrix P have t transient states and r absorbing states. The rows of P represent sources, while columns represent destinations. By ordering the transient states before the absorbing states, it can be assumed that P has the form P = [ Q R 0 I r ] , {\displaystyle P={\begin{bmatrix}Q&R\\\mathbf {0} &I_{r}\end{bmatrix}},} where Q is a t-by-t matrix, R is a nonzero t-by-r matrix, 0 is an r-by-t zero matrix, and Ir is the r-by-r identity matrix. Thus, Q describes the probability of transitioning from some transient state to another while R describes the probability of transitioning from some transient state to some absorbing state. The probability of transitioning from i to j in exactly k steps is the (i,j)-entry of Pk, further computed below. When considering only transient states, the probability is found in the upper left of Pk, the (i,j)-entry of Qk. == Fundamental matrix == === Expected number of visits to a transient state === A basic property about an absorbing Markov chain is the expected number of visits to a transient state j starting from a transient state i (before being absorbed). This can be established to be given by the (i, j) entry of so-called fundamental matrix N, obtained by summing Qk for all k (from 0 to ∞). It can be proven that N := ∑ k = 0 ∞ Q k = ( I t − Q ) − 1 , {\displaystyle N:=\sum _{k=0}^{\infty }Q^{k}=(I_{t}-Q)^{-1},} where It is the t-by-t identity matrix. The computation of this formula is the matrix equivalent of the geometric series of scalars, ∑ k = 0 ∞ q k = 1 1 − q {\displaystyle {\textstyle \sum }_{k=0}^{\infty }q^{k}={\tfrac {1}{1-q}}} . With the matrix N in hand, also other properties of the Markov chain are easy to obtain. === Expected number of steps before being absorbed === The expected number of steps before being absorbed in any absorbing state, when starting in transient state i can be computed via a sum over transient states. The value is given by the ith entry of the vector t := N 1 , {\displaystyle \mathbf {t} :=N\mathbf {1} ,} where 1 is a length-t column vector whose entries are all 1. === Absorbing probabilities === By induction, P k = [ Q k ( I t − Q k ) N R 0 I r ] . {\displaystyle P^{k}={\begin{bmatrix}Q^{k}&(I_{t}-Q^{k})NR\\\mathbf {0} &I_{r}\end{bmatrix}}.} The probability of eventually being absorbed in the absorbing state j when starting from transient state i is given by the (i,j)-entry of the matrix B := N R {\displaystyle B:=NR} . The number of columns of this matrix equals the number of absorbing states r. An approximation of those probabilities can also be obtained directly from the (i,j)-entry of P k {\displaystyle P^{k}} for a large enough value of k, when i is the index of a transient, and j the index of an absorbing state. This is because ( lim k → ∞ P k ) i , t + j = B i , j {\displaystyle \left(\lim _{k\to \infty }P^{k}\right)_{i,t+j}=B_{i,j}} . === Transient visiting probabilities === The probability of visiting transient state j when starting at a transient state i is the (i,j)-entry of the matrix H := ( N − I t ) ( N dg ) − 1 , {\displaystyle H:=(N-I_{t})(N_{\operatorname {dg} })^{-1},} where Ndg is the diagonal matrix with the same diagonal as N. === Variance on number of transient visits === The variance on the number of visits to a transient state j with starting at a transient state i (before being absorbed) is the (i,j)-entry of the matrix N 2 := N ( 2 N dg − I t ) − N sq , {\displaystyle N_{2}:=N(2N_{\operatorname {dg} }-I_{t})-N_{\operatorname {sq} },} where Nsq is the Hadamard product of N with itself (i.e. each entry of N is squared). === Variance on number of steps === The variance on the number of steps before being absorbed when starting in transient state i is the ith entry of the vector ( 2 N − I t ) t − t sq , {\displaystyle (2N-I_{t})\mathbf {t} -\mathbf {t} _{\operatorname {sq} },} where tsq is the Hadamard product of t with itself (i.e., as with Nsq, each entry of t is squared). == Examples == === String generation === Consider the process of repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears. This process is modeled by an absorbing Markov chain with transition matrix P = [ 1 / 2 1 / 2 0 0 0 1 / 2 1 / 2 0 1 / 2 0 0 1 / 2 0 0 0 1 ] . {\displaystyle P={\begin{bmatrix}1/2&1/2&0&0\\0&1/2&1/2&0\\1/2&0&0&1/2\\0&0&0&1\end{bmatrix}}.} The first state represents the empty string, the second state the string "H", the third state the string "HT", and the fourth state the string "HTH". Although in reality, the coin flips cease after the string "HTH" is generated, the perspective of the absorbing Markov chain is that the process has transitioned into the absorbing state representing the string "HTH" and, therefore, cannot leave. For this absorbing Markov chain, the fundamental matrix is N = ( I − Q ) − 1 = ( [ 1 0 0 0 1 0 0 0 1 ] − [ 1 / 2 1 / 2 0 0 1 / 2 1 / 2 1 / 2 0 0 ] ) − 1 = [ 1 / 2 − 1 / 2 0 0 1 / 2 − 1 / 2 − 1 / 2 0 1 ] − 1 = [ 4 4 2 2 4 2 2 2 2 ] . {\displaystyle {\begin{aligned}N&=(I-Q)^{-1}=\left({\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}}-{\begin{bmatrix}1/2&1/2&0\\0&1/2&1/2\\1/2&0&0\end{bmatrix}}\right)^{-1}\\[4pt]&={\begin{bmatrix}1/2&-1/2&0\\0&1/2&-1/2\\-1/2&0&1\end{bmatrix}}^{-1}={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}.\end{aligned}}} The expected number of steps starting from each of the transient states is t = N 1 = [ 4 4 2 2 4 2 2 2 2 ] [ 1 1 1 ] = [ 10 8 6 ] . {\displaystyle \mathbf {t} =N\mathbf {1} ={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}={\begin{bmatrix}10\\8\\6\end{bmatrix}}.} Therefore, the expected number of coin flips before observing the sequence (heads, tails, heads) is 10, the entry for the state representing the empty string. === Games of chance === Games based entirely on chance can be modeled by an absorbing Markov chain. A classic example of this is the ancient Indian board game Snakes and Ladders. The graph on the left plots the probability mass in the lone absorbing state that represents the final square as the transition matrix is raised to larger and larger powers. To determine the expected number of turns to complete the game, compute the vector t as described above and examine tstart, which is approximately 39.2. === Infectious disease testing === Infectious disease testing, either of blood products or in medical clinics, is often taught as an example of an absorbing Markov chain. The public U.S. Centers for Disease Control and Prevention (CDC) model for HIV and for hepatitis B, for example, illustrates the property that absorbing Markov chains can lead to the detection of disease, versus the loss of detection through other means. In the standard CDC model, the Markov chain has five states, a state in which the individual is uninfected, then a state with infected but undetectable virus, a state with detectable virus, and absorbing states of having quit/been lost from the clinic, or of having been detected (the goal). The typical rates of transition between the Markov states are the probability p per unit time of being infected with the virus, w for the rate of window period removal (time until virus is detectable), q for quit/loss rate from the system, and d for detection, assuming a typical rate λ {\displaystyle \lambda } at which the health system administers tests of the blood product or patients in question. It follows that we can "walk along" the Markov model to identify the overall probability of detection for a person starting as undetected, by multiplying the probabilities of transition to each next state of the model as: p ( p + q ) w ( w + q ) d ( d + q ) {\displaystyle {\frac {p}{(p+q)}}{\frac {w}{(w+q)}}{\frac {d}{(d+q)}}} . The subsequent total absolute number of false negative tests—the primary CDC concern—would then be the rate of tests, multiplied by the probability of reaching the infected but undetectable state, times the duration of staying in the infected undetectable state: p ( p + q ) 1 ( w + q ) λ {\displaystyle {\frac {p}{(p+q)}}{\frac {1}{(w+q)}}\lambda } .

    Read more →
  • Multifactor dimensionality reduction

    Multifactor dimensionality reduction

    Multifactor dimensionality reduction (MDR) is a statistical approach, also used in machine learning automatic approaches, for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable. MDR was designed specifically to identify nonadditive interactions among discrete variables that influence a binary outcome and is considered a nonparametric and model-free alternative to traditional statistical methods such as logistic regression. The basis of the MDR method is a constructive induction or feature engineering algorithm that converts two or more variables or attributes to a single attribute. This process of constructing a new attribute changes the representation space of the data. The end goal is to create or discover a representation that facilitates the detection of nonlinear or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data. == Illustrative example == Consider the following simple example using the exclusive OR (XOR) function. XOR is a logical operator that is commonly used in data mining and machine learning as an example of a function that is not linearly separable. The table below represents a simple dataset where the relationship between the attributes (X1 and X2) and the class variable (Y) is defined by the XOR function such that Y = X1 XOR X2. Table 1 A machine learning algorithm would need to discover or approximate the XOR function in order to accurately predict Y using information about X1 and X2. An alternative strategy would be to first change the representation of the data using constructive induction to facilitate predictive modeling. The MDR algorithm would change the representation of the data (X1 and X2) in the following manner. MDR starts by selecting two attributes. In this simple example, X1 and X2 are selected. Each combination of values for X1 and X2 are examined and the number of times Y=1 and/or Y=0 is counted. In this simple example, Y=1 occurs zero times and Y=0 occurs once for the combination of X1=0 and X2=0. With MDR, the ratio of these counts is computed and compared to a fixed threshold. Here, the ratio of counts is 0/1 which is less than our fixed threshold of 1. Since 0/1 < 1 we encode a new attribute (Z) as a 0. When the ratio is greater than one we encode Z as a 1. This process is repeated for all unique combinations of values for X1 and X2. Table 2 illustrates our new transformation of the data. Table 2 The machine learning algorithm now has much less work to do to find a good predictive function. In fact, in this very simple example, the function Y = Z has a classification accuracy of 1. A nice feature of constructive induction methods such as MDR is the ability to use any data mining or machine learning method to analyze the new representation of the data. Decision trees, neural networks, or a naive Bayes classifier could be used in combination with measures of model quality such as balanced accuracy and mutual information. == Machine learning with MDR == As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any machine learning algorithm there is always concern about overfitting. That is, machine learning algorithms are good at finding patterns in completely random data. It is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as cross-validation. Models that describe random data typically don't generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. Permutation testing makes it possible to generate an empirical p-value for the result. Replication in independent data may also provide evidence for an MDR model but can be sensitive to difference in the data sets. These approaches have all been shown to be useful for choosing and evaluating MDR models. An important step in a machine learning exercise is interpretation. Several approaches have been used with MDR including entropy analysis and pathway analysis. Tips and approaches for using MDR to model gene-gene interactions have been reviewed. == Extensions to MDR == Numerous extensions to MDR have been introduced. These include family-based methods, fuzzy methods, covariate adjustment, odds ratios, risk scores, survival methods, robust methods, methods for quantitative traits, and many others. == Applications of MDR == MDR has mostly been applied to detecting gene-gene interactions or epistasis in genetic studies of common human diseases such as atrial fibrillation, autism, bladder cancer, breast cancer, cardiovascular disease, hypertension, obesity, pancreatic cancer, prostate cancer and tuberculosis. It has also been applied to other biomedical problems such as the genetic analysis of pharmacology outcomes. A central challenge is the scaling of MDR to big data such as that from genome-wide association studies (GWAS). Several approaches have been used. One approach is to filter the features prior to MDR analysis. This can be done using biological knowledge through tools such as BioFilter. It can also be done using computational tools such as ReliefF. Another approach is to use stochastic search algorithms such as genetic programming to explore the search space of feature combinations. Yet another approach is a brute-force search using high-performance computing. == Implementations == www.epistasis.org provides an open-source and freely-available MDR software package. An R package for MDR. An sklearn-compatible Python implementation. An R package for Model-Based MDR. MDR in Weka. Generalized MDR.

    Read more →
  • Trebel (music app)

    Trebel (music app)

    Trebel is an on-demand music download and discovery platform developed by M&M Media Inc. The company's business model aims to combat digital music piracy by giving users access to on-demand music at no cost while delivering fair compensation to artists and music rights holders. Trebel has a patent that allows it to market itself as the only international music service in which users can legally download music and listen to it offline for free. As of March 2023, Trebel has a catalog of 75 million songs from record labels such as Universal Music Group, Sony Music Entertainment, Warner Music Group and hundreds of independent labels. Trebel is based in Stamford, Connecticut. with additional locations in Mexico City, Jakarta, Bogota, Los Angeles and Miami. The app is available in the Apple App Store, Google Play Store, and Huawei AppGallery. == History == Trebel was founded in 2014 by Gary Mekikian, who was previously the co-founder of answerFriend, Inc., which commercialized web based question-answering technologies and merged with Electric Knowledge, forming InQuira. This company was eventually acquired by Oracle Corporation in 2011. His co-founders at Trebel include Stanford classmates Corey Jones and Luis Soto Durazo, as well as his daughters Grace and Juliette. Mekikian envisioned Trebel as an alternative to music piracy after a high school classmate of his daughters was targeted by cyberattackers while illegally downloading music online. Trebel was initially released in 2015 under the name Project Carmen to students at Ohio State, Santa Monica College, Cal State Fullerton, UCLA and Long Beach State. In its original incarnation, the service planned to target students at 3,000 universities and 30,000 high schools in the United States. A beta version of the app was introduced in 2016 with content from Universal Music Group and Warner Music Group. Trebel launched commercially in the United States and Mexico in 2018. In 2018, Mexican mass-media corporation Televisa also became a minority investor in Trebel. In May 2020, during the early months of the COVID-19 pandemic, Trebel was a digital broadcast partner for Se Agradece, a concert produced in Mexico by Televisa to honor frontline COVID workers that featured artists such as Rosalia, J Balvin, Maluma and Ricky Martin. In June 2021, Trebel reached 3 million monthly active users. In October 2021, Trebel signed a music licensing agreement with Merlin Network, the licensing agency for the independent music sector that controls an estimated 12% of the global digital recorded music market. In January 2022, Trebel announced a strategic alliance with MNC Corporation, an Indonesian media conglomerate, which also became a minority backer of the company. In March 2022, Trebel reported 5.2 million monthly active users as a result of growth in Latin America. In the same month,, Latin music star Maluma became a backer of Trebel and an advisor to Gary Mekikian, helping expand the service throughout Latin America. On April 18, 2022, Trebel launched in Indonesia during the finale of the music competition show X Factor Indonesia. Trebel also signed a deal that month with Soccer Media Solutions, a sports and entertainment marketing agency in Mexico, to sell Trebel’s premium advertising inventory through Soccer Media. In May 2022, Guillermo Ochoa, goalkeeper for the Mexican national soccer team, invested in Trebel and became an ambassador for the company. On October 2, 2022, Trebel collaborated with Musica Studios, one of the largest music companies in Indonesia, on the production of a music festival in Jakarta titled Trebel Music Fest. The event featured performances by top Indonesian music artists such as Noah, Nidji, and d'Masiv. In October 2022, Trebel launched in Colombia. The service reached 1.2 million monthly active users in Colombia six months after launching. In December 2022, Trebel collaborated with KFC in Indonesia on the release of a KFC digital music program using a product called Trebel Max. As part of the program, KFC customers who bought the Crazy Superstar Combo package at KFC received a subscription to Trebel Max for 30 days. Trebel announced the launch of Trebel AI in May 2023. Trebel AI uses ChatGPT-powered technology to generate playlists based on natural language queries from users. In Indonesia, the Trebel AI feature was announced during a broadcast of the show Indonesian Idol XII that took place on May 8, 2023. In July 2023, Trebel reached more than 13 million monthly active users. In November 2023, Trebel became a featured app on the Discord app directory. Discord users that add the Trebel bot to their servers have access to Trebel's on-demand music library and have the exclusive privilege of being DJ's during server sessions with up to 150 concurrent listeners. == Platform == === Features === Trebel has a patent that allows it to market itself as the only international music service in which users can legally download music and listen to it offline for free. As of March 2023, Trebel has a catalog of 75 million songs from record labels such as Universal Music Group, Sony Music Entertainment, Warner Music Group and hundreds of independent labels. Trebel offers unlimited music downloads that are playable in the app by registered users only. Offline listening is free to all users and not blocked by a paywall. Users can search for music based on song, artist, album, browsing friends' recent activity, and through other users' playlists. The app also offers free cloud storage for downloaded songs. Trebel also contains a feature called SongID, which identifies music being played nearby using a short sample, then offers it for download on the service. Podcasts are available for free listening on the service as well. === Business model === Trebel uses a business model that generates revenue from the sale of digital advertising as well as user interactions with branded experiences, and consumption of virtual goods within the app (akin to mobile games). The app also features a brand takeover feature called Trebel Max, which offers unlimited access in exchange for users engaging with experiences offered by specific brands. Trebel’s brand partners include Uber, KFC, Walmart, Coca-Cola, Amazon and P&G. === Content === In September 2022, Trebel secured an exclusive release of the song “Suara Hatiku” by Indonesian actress Amanda Monopo. As of March 2023, Trebel offers 75 million songs through licensing agreements with Universal Music Group, Sony Music Entertainment, Warner Music Group and global indie rights agency Merlin. == Awards == In 2023, Trebel won three Google Play awards including "Best App of 2023", "Best Everyday Essentials" and "Users' Choice".

    Read more →
  • Absorbing Markov chain

    Absorbing Markov chain

    In the mathematical theory of probability, an absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An absorbing state is a state that, once entered, cannot be left. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space. However, this article concentrates on the discrete-time discrete-state-space case. == Formal definition == A Markov chain is an absorbing chain if there is at least one absorbing state and it is possible to go from any state to at least one absorbing state in a finite number of steps. In an absorbing Markov chain, a state that is not absorbing is called transient. === Canonical form === Let an absorbing Markov chain with transition matrix P have t transient states and r absorbing states. The rows of P represent sources, while columns represent destinations. By ordering the transient states before the absorbing states, it can be assumed that P has the form P = [ Q R 0 I r ] , {\displaystyle P={\begin{bmatrix}Q&R\\\mathbf {0} &I_{r}\end{bmatrix}},} where Q is a t-by-t matrix, R is a nonzero t-by-r matrix, 0 is an r-by-t zero matrix, and Ir is the r-by-r identity matrix. Thus, Q describes the probability of transitioning from some transient state to another while R describes the probability of transitioning from some transient state to some absorbing state. The probability of transitioning from i to j in exactly k steps is the (i,j)-entry of Pk, further computed below. When considering only transient states, the probability is found in the upper left of Pk, the (i,j)-entry of Qk. == Fundamental matrix == === Expected number of visits to a transient state === A basic property about an absorbing Markov chain is the expected number of visits to a transient state j starting from a transient state i (before being absorbed). This can be established to be given by the (i, j) entry of so-called fundamental matrix N, obtained by summing Qk for all k (from 0 to ∞). It can be proven that N := ∑ k = 0 ∞ Q k = ( I t − Q ) − 1 , {\displaystyle N:=\sum _{k=0}^{\infty }Q^{k}=(I_{t}-Q)^{-1},} where It is the t-by-t identity matrix. The computation of this formula is the matrix equivalent of the geometric series of scalars, ∑ k = 0 ∞ q k = 1 1 − q {\displaystyle {\textstyle \sum }_{k=0}^{\infty }q^{k}={\tfrac {1}{1-q}}} . With the matrix N in hand, also other properties of the Markov chain are easy to obtain. === Expected number of steps before being absorbed === The expected number of steps before being absorbed in any absorbing state, when starting in transient state i can be computed via a sum over transient states. The value is given by the ith entry of the vector t := N 1 , {\displaystyle \mathbf {t} :=N\mathbf {1} ,} where 1 is a length-t column vector whose entries are all 1. === Absorbing probabilities === By induction, P k = [ Q k ( I t − Q k ) N R 0 I r ] . {\displaystyle P^{k}={\begin{bmatrix}Q^{k}&(I_{t}-Q^{k})NR\\\mathbf {0} &I_{r}\end{bmatrix}}.} The probability of eventually being absorbed in the absorbing state j when starting from transient state i is given by the (i,j)-entry of the matrix B := N R {\displaystyle B:=NR} . The number of columns of this matrix equals the number of absorbing states r. An approximation of those probabilities can also be obtained directly from the (i,j)-entry of P k {\displaystyle P^{k}} for a large enough value of k, when i is the index of a transient, and j the index of an absorbing state. This is because ( lim k → ∞ P k ) i , t + j = B i , j {\displaystyle \left(\lim _{k\to \infty }P^{k}\right)_{i,t+j}=B_{i,j}} . === Transient visiting probabilities === The probability of visiting transient state j when starting at a transient state i is the (i,j)-entry of the matrix H := ( N − I t ) ( N dg ) − 1 , {\displaystyle H:=(N-I_{t})(N_{\operatorname {dg} })^{-1},} where Ndg is the diagonal matrix with the same diagonal as N. === Variance on number of transient visits === The variance on the number of visits to a transient state j with starting at a transient state i (before being absorbed) is the (i,j)-entry of the matrix N 2 := N ( 2 N dg − I t ) − N sq , {\displaystyle N_{2}:=N(2N_{\operatorname {dg} }-I_{t})-N_{\operatorname {sq} },} where Nsq is the Hadamard product of N with itself (i.e. each entry of N is squared). === Variance on number of steps === The variance on the number of steps before being absorbed when starting in transient state i is the ith entry of the vector ( 2 N − I t ) t − t sq , {\displaystyle (2N-I_{t})\mathbf {t} -\mathbf {t} _{\operatorname {sq} },} where tsq is the Hadamard product of t with itself (i.e., as with Nsq, each entry of t is squared). == Examples == === String generation === Consider the process of repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears. This process is modeled by an absorbing Markov chain with transition matrix P = [ 1 / 2 1 / 2 0 0 0 1 / 2 1 / 2 0 1 / 2 0 0 1 / 2 0 0 0 1 ] . {\displaystyle P={\begin{bmatrix}1/2&1/2&0&0\\0&1/2&1/2&0\\1/2&0&0&1/2\\0&0&0&1\end{bmatrix}}.} The first state represents the empty string, the second state the string "H", the third state the string "HT", and the fourth state the string "HTH". Although in reality, the coin flips cease after the string "HTH" is generated, the perspective of the absorbing Markov chain is that the process has transitioned into the absorbing state representing the string "HTH" and, therefore, cannot leave. For this absorbing Markov chain, the fundamental matrix is N = ( I − Q ) − 1 = ( [ 1 0 0 0 1 0 0 0 1 ] − [ 1 / 2 1 / 2 0 0 1 / 2 1 / 2 1 / 2 0 0 ] ) − 1 = [ 1 / 2 − 1 / 2 0 0 1 / 2 − 1 / 2 − 1 / 2 0 1 ] − 1 = [ 4 4 2 2 4 2 2 2 2 ] . {\displaystyle {\begin{aligned}N&=(I-Q)^{-1}=\left({\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}}-{\begin{bmatrix}1/2&1/2&0\\0&1/2&1/2\\1/2&0&0\end{bmatrix}}\right)^{-1}\\[4pt]&={\begin{bmatrix}1/2&-1/2&0\\0&1/2&-1/2\\-1/2&0&1\end{bmatrix}}^{-1}={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}.\end{aligned}}} The expected number of steps starting from each of the transient states is t = N 1 = [ 4 4 2 2 4 2 2 2 2 ] [ 1 1 1 ] = [ 10 8 6 ] . {\displaystyle \mathbf {t} =N\mathbf {1} ={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}={\begin{bmatrix}10\\8\\6\end{bmatrix}}.} Therefore, the expected number of coin flips before observing the sequence (heads, tails, heads) is 10, the entry for the state representing the empty string. === Games of chance === Games based entirely on chance can be modeled by an absorbing Markov chain. A classic example of this is the ancient Indian board game Snakes and Ladders. The graph on the left plots the probability mass in the lone absorbing state that represents the final square as the transition matrix is raised to larger and larger powers. To determine the expected number of turns to complete the game, compute the vector t as described above and examine tstart, which is approximately 39.2. === Infectious disease testing === Infectious disease testing, either of blood products or in medical clinics, is often taught as an example of an absorbing Markov chain. The public U.S. Centers for Disease Control and Prevention (CDC) model for HIV and for hepatitis B, for example, illustrates the property that absorbing Markov chains can lead to the detection of disease, versus the loss of detection through other means. In the standard CDC model, the Markov chain has five states, a state in which the individual is uninfected, then a state with infected but undetectable virus, a state with detectable virus, and absorbing states of having quit/been lost from the clinic, or of having been detected (the goal). The typical rates of transition between the Markov states are the probability p per unit time of being infected with the virus, w for the rate of window period removal (time until virus is detectable), q for quit/loss rate from the system, and d for detection, assuming a typical rate λ {\displaystyle \lambda } at which the health system administers tests of the blood product or patients in question. It follows that we can "walk along" the Markov model to identify the overall probability of detection for a person starting as undetected, by multiplying the probabilities of transition to each next state of the model as: p ( p + q ) w ( w + q ) d ( d + q ) {\displaystyle {\frac {p}{(p+q)}}{\frac {w}{(w+q)}}{\frac {d}{(d+q)}}} . The subsequent total absolute number of false negative tests—the primary CDC concern—would then be the rate of tests, multiplied by the probability of reaching the infected but undetectable state, times the duration of staying in the infected undetectable state: p ( p + q ) 1 ( w + q ) λ {\displaystyle {\frac {p}{(p+q)}}{\frac {1}{(w+q)}}\lambda } .

    Read more →
  • SqueezeNet

    SqueezeNet

    SqueezeNet is a deep neural network for image classification released in 2016. SqueezeNet was developed by researchers at DeepScale, University of California, Berkeley, and Stanford University. In designing SqueezeNet, the authors' goal was to create a smaller neural network with fewer parameters while achieving competitive accuracy. Their best-performing model achieved the same accuracy as AlexNet on ImageNet classification, but has a size 510x less than it. == Version history == SqueezeNet was originally released on February 22, 2016. This original version of SqueezeNet was implemented on top of the Caffe deep learning software framework. Shortly thereafter, the open-source research community ported SqueezeNet to a number of other deep learning frameworks. On February 26, 2016, Eddie Bell released a port of SqueezeNet for the Chainer deep learning framework. On March 2, 2016, Guo Haria released a port of SqueezeNet for the Apache MXNet framework. On June 3, 2016, Tammy Yang released a port of SqueezeNet for the Keras framework. In 2017, companies including Baidu, Xilinx, Imagination Technologies, and Synopsys demonstrated SqueezeNet running on low-power processing platforms such as smartphones, FPGAs, and custom processors. As of 2018, SqueezeNet ships "natively" as part of the source code of a number of deep learning frameworks such as PyTorch, Apache MXNet, and Apple CoreML. In addition, third party developers have created implementations of SqueezeNet that are compatible with frameworks such as TensorFlow. Below is a summary of frameworks that support SqueezeNet. == Relationship to other networks == === AlexNet === SqueezeNet was originally described in SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. AlexNet is a deep neural network that has 240 MB of parameters, and SqueezeNet has just 5 MB of parameters. This small model size can more easily fit into computer memory and can more easily be transmitted over a computer network. However, it's important to note that SqueezeNet is not a "squeezed version of AlexNet." Rather, SqueezeNet is an entirely different DNN architecture than AlexNet. What SqueezeNet and AlexNet have in common is that both of them achieve approximately the same level of accuracy when evaluated on the ImageNet image classification validation dataset. === Model compression === Model compression (e.g. quantization and pruning of model parameters) can be applied to a deep neural network after it has been trained. In the SqueezeNet paper, the authors demonstrated that a model compression technique called Deep Compression can be applied to SqueezeNet to further reduce the size of the parameter file from 5 MB to 500 KB. Deep Compression has also been applied to other DNNs, such as AlexNet and VGG. == Variants == Some of the members of the original SqueezeNet team have continued to develop resource-efficient deep neural networks for a variety of applications. A few of these works are noted in the following table. As with the original SqueezeNet model, the open-source research community has ported and adapted these newer "squeeze"-family models for compatibility with multiple deep learning frameworks. In addition, the open-source research community has extended SqueezeNet to other applications, including semantic segmentation of images and style transfer.

    Read more →