AI Detector Like Turnitin Reddit

AI Detector Like Turnitin Reddit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • PressWise

    PressWise

    PressWise was digital imposition software to quickly and easily impose most any variety of flat and folding layouts. It was acquired by the Aldus Prepress Group affectionately known in the print and publishing industry as the Aldus WiseGuys in August 1991 from Emulation Technologies Inc. of Cleveland, Ohio. It was further developed by the Aldus Press Group and launched as the first of many Aldus prepress products in 1993. It was subsequently owned by Adobe Systems, then Luminous Corporation (Seattle), then Imation, and finally ScenicSoft. PressWise was discontinued by ScenicSoft in 1999 ultimately. == History == In February 2009, the PressWise copyright was acquired by Aethos Technologies and a new print automation product was launched by its creator, Eric Wold of Santa Rosa, California. This new product has no relationship to the old imposition software of the same name. It's notable that Larry Letteney, former President of Creo Americas was a board member and shareholder of Aethos Technologies during its early phase. Datatech SmartSoft acquired exclusive distribution rights to the software in September 2009. In September 2010 Datatech SmartSoft completed the acquisition of the PressWise brand and product.

    Read more →
  • Dynamic Bayesian network

    Dynamic Bayesian network

    A dynamic Bayesian network (DBN) is a Bayesian network (BN) which relates variables to each other over adjacent time steps. == History == A dynamic Bayesian network (DBN) is often called a "two-timeslice" BN (2TBN) because it says that at any point in time T, the value of a variable can be calculated from the internal regressors and the immediate prior value (time T-1). DBNs were developed by Paul Dagum in the early 1990s at Stanford University's Section on Medical Informatics. Dagum developed DBNs to unify and extend traditional linear state-space models such as Kalman filters, linear and normal forecasting models such as ARMA and simple dependency models such as hidden Markov models into a general probabilistic representation and inference mechanism for arbitrary nonlinear and non-normal time-dependent domains. Today, DBNs are common in robotics, and have shown potential for a wide range of data mining applications. For example, they have been used in speech recognition, digital forensics, protein sequencing, and bioinformatics. DBN is a generalization of hidden Markov models and Kalman filters. DBNs are conceptually related to probabilistic Boolean networks and can, similarly, be used to model dynamical systems at steady-state.

    Read more →
  • FastICA

    FastICA

    FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be alternatively derived as an approximative Newton iteration. == Algorithm == === Prewhitening the data === Let the X := ( x i j ) ∈ R N × M {\displaystyle \mathbf {X} :=(x_{ij})\in \mathbb {R} ^{N\times M}} denote the input data matrix, M {\displaystyle M} the number of columns corresponding with the number of samples of mixed signals and N {\displaystyle N} the number of rows corresponding with the number of independent source signals. The input data matrix X {\displaystyle \mathbf {X} } must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it. Centering the data entails demeaning each component of the input data X {\displaystyle \mathbf {X} } , that is, for each i = 1 , … , N {\displaystyle i=1,\ldots ,N} and j = 1 , … , M {\displaystyle j=1,\ldots ,M} . After centering, each row of X {\displaystyle \mathbf {X} } has an expected value of 0 {\displaystyle 0} . Whitening the data requires a linear transformation L : R N × M → R N × M {\displaystyle \mathbf {L} :\mathbb {R} ^{N\times M}\to \mathbb {R} ^{N\times M}} of the centered data so that the components of L ( X ) {\displaystyle \mathbf {L} (\mathbf {X} )} are uncorrelated and have variance one. More precisely, if X {\displaystyle \mathbf {X} } is a centered data matrix, the covariance of L x := L ( X ) {\displaystyle \mathbf {L} _{\mathbf {x} }:=\mathbf {L} (\mathbf {X} )} is the ( N × N ) {\displaystyle (N\times N)} -dimensional identity matrix, that is, A common method for whitening is by performing an eigenvalue decomposition on the covariance matrix of the centered data X {\displaystyle \mathbf {X} } , E { X X T } = E D E T {\displaystyle E\left\{\mathbf {X} \mathbf {X} ^{T}\right\}=\mathbf {E} \mathbf {D} \mathbf {E} ^{T}} , where E {\displaystyle \mathbf {E} } is the matrix of eigenvectors and D {\displaystyle \mathbf {D} } is the diagonal matrix of eigenvalues. The whitened data matrix is defined thus by === Single component extraction === The iterative algorithm finds the direction for the weight vector w ∈ R N {\displaystyle \mathbf {w} \in \mathbb {R} ^{N}} that maximizes a measure of non-Gaussianity of the projection w T X {\displaystyle \mathbf {w} ^{T}\mathbf {X} } , with X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} denoting a prewhitened data matrix as described above. Note that w {\displaystyle \mathbf {w} } is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function f ( u ) {\displaystyle f(u)} , its first derivative g ( u ) {\displaystyle g(u)} , and its second derivative g ′ ( u ) {\displaystyle g^{\prime }(u)} . Hyvärinen states that the functions are useful for general purposes, while may be highly robust. The steps for extracting the weight vector w {\displaystyle \mathbf {w} } for single component in FastICA are the following: Randomize the initial weight vector w {\displaystyle \mathbf {w} } Let w + ← E { X g ( w T X ) T } − E { g ′ ( w T X ) } w {\displaystyle \mathbf {w} ^{+}\leftarrow E\left\{\mathbf {X} g(\mathbf {w} ^{T}\mathbf {X} )^{T}\right\}-E\left\{g'(\mathbf {w} ^{T}\mathbf {X} )\right\}\mathbf {w} } , where E { . . . } {\displaystyle E\left\{...\right\}} means averaging over all column-vectors of matrix X {\displaystyle \mathbf {X} } Let w ← w + / ‖ w + ‖ {\displaystyle \mathbf {w} \leftarrow \mathbf {w} ^{+}/\|\mathbf {w} ^{+}\|} If not converged, go back to 2 === Multiple component extraction === The single unit iterative algorithm estimates only one weight vector which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors - note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components with the simplest being the following. Here, 1 M {\displaystyle \mathbf {1_{M}} } is a column vector of 1's of dimension M {\displaystyle M} . Algorithm FastICA Input: C {\displaystyle C} Number of desired components Input: X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} Prewhitened matrix, where each column represents an N {\displaystyle N} -dimensional sample, where C <= N {\displaystyle C<=N} Output: W ∈ R N × C {\displaystyle \mathbf {W} \in \mathbb {R} ^{N\times C}} Un-mixing matrix where each column projects X {\displaystyle \mathbf {X} } onto independent component. Output: S ∈ R C × M {\displaystyle \mathbf {S} \in \mathbb {R} ^{C\times M}} Independent components matrix, with M {\displaystyle M} columns representing a sample with C {\displaystyle C} dimensions. for p in 1 to C: w p ← {\displaystyle \mathbf {w_{p}} \leftarrow } Random vector of length N while w p {\displaystyle \mathbf {w_{p}} } changes w p ← 1 M X g ( w p T X ) T − 1 M g ′ ( w p T X ) 1 M w p {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {1}{M}}\mathbf {X} g(\mathbf {w_{p}} ^{T}\mathbf {X} )^{T}-{\frac {1}{M}}g'(\mathbf {w_{p}} ^{T}\mathbf {X} )\mathbf {1_{M}} \mathbf {w_{p}} } w p ← w p − ∑ j = 1 p − 1 ( w p T w j ) w j {\displaystyle \mathbf {w_{p}} \leftarrow \mathbf {w_{p}} -\sum _{j=1}^{p-1}(\mathbf {w_{p}} ^{T}\mathbf {w_{j}} )\mathbf {w_{j}} } w p ← w p ‖ w p ‖ {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {\mathbf {w_{p}} }{\|\mathbf {w_{p}} \|}}} output W ← [ w 1 , … , w C ] {\displaystyle \mathbf {W} \leftarrow {\begin{bmatrix}\mathbf {w_{1}} ,\dots ,\mathbf {w_{C}} \end{bmatrix}}} output S ← W T X {\displaystyle \mathbf {S} \leftarrow \mathbf {W^{T}} \mathbf {X} }

    Read more →
  • XGBoost

    XGBoost

    XGBoost (eXtreme Gradient Boosting) is an open-source software library which provides a regularizing gradient boosting framework for C++, Java, Python, R, Julia, Perl, and Scala. It works on Linux, Microsoft Windows, and macOS. From the project description, it aims to provide a "Scalable, Portable and Distributed Gradient Boosting (GBM, GBRT, GBDT) Library". It runs on a single machine, as well as the distributed processing frameworks Apache Hadoop, Apache Spark, Apache Flink, and Dask. XGBoost gained much popularity and attention in the mid-2010s as the algorithm of choice for many winning teams of machine learning competitions. == History == XGBoost initially started as a research project by Tianqi Chen as part of the Distributed (Deep) Machine Learning Community (DMLC) group at the University of Washington. Initially, it began as a terminal application which could be configured using a libsvm configuration file. It became well known in the ML competition circles after its use in the winning solution of the Higgs Machine Learning Challenge. Soon after, the Python and R packages were built, and XGBoost now has package implementations for Java, Scala, Julia, Perl, and other languages. This brought the library to more developers and contributed to its popularity among the Kaggle community, where it has been used for a large number of competitions. It was soon integrated with a number of other packages making it easier to use in their respective communities. It has now been integrated with scikit-learn for Python users and with the caret package for R users. It can also be integrated into Data Flow frameworks like Apache Spark, Apache Hadoop, and Apache Flink using the abstracted Rabit and XGBoost4J. XGBoost is also available on OpenCL for FPGAs. An efficient, scalable implementation of XGBoost has been published by Tianqi Chen and Carlos Guestrin. While the XGBoost model often achieves higher accuracy than a single decision tree, it sacrifices the intrinsic interpretability of decision trees. For example, following the path that a decision tree takes to make its decision is trivial and self-explained, but following the paths of hundreds or thousands of trees is much harder. == Features == Salient features of XGBoost which make it different from other gradient boosting algorithms include: Clever penalization of trees A proportional shrinking of leaf nodes Newton Boosting Extra randomization parameter Implementation on single, distributed systems and out-of-core computation Automatic feature selection Theoretically justified weighted quantile sketching for efficient computation Parallel tree structure boosting with sparsity Efficient cacheable block structure for decision tree training == The algorithm == XGBoost works as Newton–Raphson in function space unlike gradient boosting that works as gradient descent in function space, a second order Taylor approximation is used in the loss function to make the connection to Newton–Raphson method. A generic unregularized XGBoost algorithm is: == Parameters == XGBoost has parameters which can be specified to affect how it functions and performs. Some parameters include: Learning rate (also known as the "step size" or “"shrinkage"), it is a number between 0 and 1 (default is 0.3), which determines the rate the algorithm learns from each iteration. n_estimators, sets the number of trees to be built in the ensemble, where more trees generally increases the complexity of the model, but can lead to overfitting with too many trees. Gamma (also known as Lagrange multiplier or the minimum loss reduction parameter), controls the minimum amount of loss reduction required to make a further split on a leaf node of the tree. The default in XGBoost is 0 . max_depth, represents how deeply each tree in the boosting process can grow during training, where the default is 6. == Awards == John Chambers Award (2016) High Energy Physics meets Machine Learning award (HEP meets ML) (2016)

    Read more →
  • Kernel embedding of distributions

    Kernel embedding of distributions

    In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ⁡ ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega

    Read more →
  • Semantic mapping (statistics)

    Semantic mapping (statistics)

    Semantic mapping (SM) is a statistical method for dimensionality reduction (the transformation of data from a high-dimensional space into a low-dimensional space). SM can be used in a set of multidimensional vectors of features to extract a few new features that preserves the main data characteristics. SM performs dimensionality reduction by clustering the original features in semantic clusters and combining features mapped in the same cluster to generate an extracted feature. Given a data set, this method constructs a projection matrix that can be used to map a data element from a high-dimensional space into a reduced dimensional space. SM can be applied in construction of text mining and information retrieval systems, as well as systems managing vectors of high dimensionality. SM is an alternative to random mapping, principal components analysis and latent semantic indexing methods.

    Read more →
  • Amazon Rekognition

    Amazon Rekognition

    Amazon Rekognition is a cloud-based software as a service (SaaS) computer vision platform that was launched in 2016. It has been sold to, and used by, a number of United States government agencies, including U.S. Immigration and Customs Enforcement (ICE) and Orlando, Florida police, as well as private entities. == Capabilities == Rekognition provides a number of computer vision capabilities, which can be divided into two categories: Algorithms that are pre-trained on data collected by Amazon or its partners, and algorithms that a user can train on a custom dataset. As of July 2019, Rekognition provides the following computer vision capabilities. === Pre-trained algorithms === Celebrity recognition in images Facial attribute detection in images, including gender, age range, emotions (e.g. happy, calm, disgusted), whether the face has a beard or mustache, whether the face has eyeglasses or sunglasses, whether the eyes are open, whether the mouth is open, whether the person is smiling, and the location of several markers such as the pupils and jaw line. People Pathing enables tracking of people through a video. An advertised use-case of this capability is to track sports players for post-game analysis. Text detection and classification in images Unsafe visual content detection === Algorithms that a user can train on a custom dataset === SearchFaces enables users to import a database of images with pre-labeled faces, to train a machine learning model on this database, and to expose the model as a cloud service with an API. Then, the user can post new images to the API and receive information about the faces in the image. The API can be used to expose a number of capabilities, including identifying faces of known people, comparing faces, and finding similar faces in a database. Face-based user verification == History and use == === 2017 === In late 2017, the Washington County, Oregon Sheriff's Office began using Rekognition to identify suspects' faces. Rekognition was marketed as a general-purpose computer vision tool, and an engineer working for Washington County decided to use the tool for facial analysis of suspects. Rekognition was offered to the department for free, and Washington County became the first US law enforcement agency known to use Rekognition. In 2018, the agency logged over 1,000 facial searches. The county, according to the Washington Post, by 2019 was paying about $7 a month for all of its searches. The relationship was unknown to the public until May 2018. In 2018, Rekognition was also used to help identify celebrities during a royal wedding telecast. === 2018 === In April 2018, it was reported that FamilySearch was using Rekognition to enable their users to "see which of their ancestors they most resemble based on family photographs". In early 2018, the FBI also began using it as a pilot program for analyzing video surveillance. In May 2018, it was reported by the ACLU that Orlando, Florida was running a pilot using Rekognition for facial analysis in law enforcement, with that pilot ending in July 2019. After the report, on June 22, 2018, Gizmodo reported that Amazon workers had written a letter to CEO Jeff Bezos requesting he cease selling Rekognition to US law enforcement, particularly ICE and Homeland Security. A letter was also sent to Bezos by the ACLU. On June 26, 2018, it was reported that the Orlando police force had ceased using Rekognition after their trial contract expired, reserving the right to use it in the future. The Orlando Police Department said that they had "never gotten to the point to test images" due to old infrastructure and low bandwidth. In July 2018, the ACLU released a test showing that Rekognition had falsely matched 28 members of Congress with mugshot photos, particularly Congresspeople of color. 25 House members afterwards sent a letter to Bezos, expressing concern about Rekognition. Amazon responded saying the Rekognition test had generated 80 percent confidence, while it recommended law enforcement only use matches rated at 99 percent confidence. The Washington Post states that Oregon instead has officers pick a "best of five" result, instead of adhering to the recommendation. In September 2018, it was reported that Mapillary was using Rekognition to read the text on parking signs (e.g. no stopping, no parking, or specific parking hours) in cities. In October 2018, it was reported that Amazon had earlier that year pitched Rekognition to U.S. Immigration and Customs Enforcement agency. Amazon defended government use of Rekognition. On December 1, 2018, it was reported that 8 Democratic lawmakers had said in a letter that Amazon had "failed to provide sufficient answers" about Rekognition, writing that they had "serious concerns that this type of product has significant accuracy issues, places disproportionate burdens on communities of color, and could stifle Americans' willingness to exercise their First Amendment rights in public." === 2019 === In January 2019, MIT researchers published a peer-reviewed study asserting that Rekognition had more difficulty in identifying dark-skinned females than competitors such as IBM and Microsoft. In the study, Rekognition misidentified darker-skinned women as men 31% of the time, but made no mistakes for light-skinned men. Amazon called the report "misinterpreted results" of the research with an improper "default confidence threshold." In January 2019, Amazon's shareholders "urged Amazon to stop selling Rekognition software to law enforcement agencies." Amazon in response defended its use of Rekognition, but supported new federal oversight and guidelines to "make sure facial recognition technology cannot be used to discriminate." In February 2019, it was reported that Amazon was collaborating with the National Institute of Standards and Technology (NIST) on developing standardized tests to improve accuracy and remove bias with facial recognition. In March 2019, an open letter regarding Rekognition was sent by a group of prominent AI researchers to Amazon, criticizing its sale to law enforcement with around 50 signatures. In April 2019, Amazon was told by the Securities and Exchange Commission that they had to vote on two shareholder proposals seeking to limit Rekognition. Amazon argued that the proposals were an "insignificant public policy issue for the Company" not related to Amazon's ordinary business, but their appeal was denied. The vote was set for May. The first proposal was tabled by shareholders. On May 24, 2019, 2.4% of shareholders voted to stop selling Rekognition to government agencies, while a second proposal calling for a study into Rekognition and civil rights had 27.5% support. In August 2019, the ACLU again used Rekognition on members of government, with 26 of 120 lawmakers in California flagged as matches to mugshots. Amazon stated the ACLU was "misusing" the software in the tests, by not dismissing results that did not meet Amazon's recommended accuracy threshold of 99%. By August 2019, there had been protests against ICE's use of Rekognition to surveil immigrants. In March 2019, Amazon announced a Rekognition update that would improve emotional detection, and in August 2019, "fear" was added to emotions that Rekognition could detect. === 2020 === In June 2020, Amazon announced it was implementing a one-year moratorium on police use of Rekognition, in response to the George Floyd protests. === 2024 === The Department of Justice disclosed that the FBI is initiating the use of Amazon Rekognition. The DOJ's AI inventory revealed the FBI's "Project Tyr" aims to customize Rekognition to identify nudity, weapons, explosives, and other information from lawfully acquired media. === 2025 === In late 2025, the New York Times reported that scientist, Dr. Jürgen Matthäus, retired from as the head of research at the U.S. Holocaust Memorial Museum in Washington, D.C., used Amazon Rekognition to identify the shooter in the Holocaust photograph known as The Last Jew in Vinnitsa "with more than 99 percent certainty" — as Jakobus Onnen (1906–1943), a teacher from Tichelwarf near Weener in East Frisia who had been a member of the SS since 1934 and was later killed in action near Zhitomir in 1943. The photographer and victim remain unidentified. == Controversy regarding facial analysis == === Racial and gender bias === In 2018, MIT researchers Joy Buolamwini and Timnit Gebru published a study called Gender Shades. In this study, a set of images was collected, and faces in the images were labeled with face position, gender, and skin tone information. The images were run through SaaS facial recognition platforms from Face++, IBM, and Microsoft. In all three of these platforms, the classifiers performed best on male faces (with error rates on female faces being 8.1% to 20.6% higher than error rates on male faces), and they performed worst on dark female faces (with error rates ranging from 20.8% to 30.4%). The authors hypothesized that this discr

    Read more →
  • Markov blanket

    Markov blanket

    In statistics and machine learning, a Markov blanket of a random variable is a set of variables that renders the variable conditionally independent of all other variables in the system. This concept is central in probabilistic graphical models and feature selection. If a Markov blanket is minimal—meaning that no variable in it can be removed without losing this conditional independence—it is called a Markov boundary. Identifying a Markov blanket or boundary allows for efficient inference and helps isolate relevant variables for prediction or causal reasoning. The terms Markov blanket and Markov boundary were coined by Judea Pearl in 1988. A Markov blanket may be derived from the structure of a probabilistic graphical model such as a Bayesian network or Markov random field. == Definition == A Markov blanket of a random variable Y {\displaystyle Y} in a random variable set S = { X 1 , … , X n } {\displaystyle {\mathcal {S}}=\{X_{1},\ldots ,X_{n}\}} is any subset S 1 {\displaystyle {\mathcal {S}}_{1}} of S {\displaystyle {\mathcal {S}}} , conditioned on which other variables are independent with Y {\displaystyle Y} : Y ⊥ ⊥ S ∖ S 1 ∣ S 1 {\displaystyle Y\perp \!\!\!\perp {\mathcal {S}}\smallsetminus {\mathcal {S}}_{1}\mid {\mathcal {S}}_{1}} It means that S 1 {\displaystyle {\mathcal {S}}_{1}} contains at least all the information one needs to infer Y {\displaystyle Y} , where the variables in S ∖ S 1 {\displaystyle {\mathcal {S}}\smallsetminus {\mathcal {S}}_{1}} are redundant. In general, a given Markov blanket is not unique. Any set in S {\displaystyle {\mathcal {S}}} that contains a Markov blanket is also a Markov blanket itself. Specifically, S {\displaystyle {\mathcal {S}}} is a Markov blanket of Y {\displaystyle Y} in S {\displaystyle {\mathcal {S}}} . === Example === In a Bayesian network, the Markov blanket of a node consists of its parents, its children, and its children's other parents (i.e., co-parents). Knowing the values of these nodes makes the target node conditionally independent of the rest of the network. In a Markov random field, the Markov blanket of a node is simply its immediate neighbors. == Markov condition == The concept of a Markov blanket is rooted in the Markov condition, which states that in a probabilistic graphical model, each variable is conditionally independent of its non-descendants given its parents. This condition implies the existence of a minimal separating set — the Markov blanket — that shields a variable from the rest of the network. For instance, when a person holds an object stationary against gravity, the object’s acceleration is fully determined by its direct causes—namely, the upward force from the hand and the downward gravitational pull. Other variables such as air pressure or temperature are causally irrelevant. == Markov boundary == A Markov boundary of Y {\displaystyle Y} in S {\displaystyle {\mathcal {S}}} is a subset S 2 {\displaystyle {\mathcal {S}}_{2}} of S {\displaystyle {\mathcal {S}}} , such that S 2 {\displaystyle {\mathcal {S}}_{2}} itself is a Markov blanket of Y {\displaystyle Y} , but any proper subset of S 2 {\displaystyle {\mathcal {S}}_{2}} is not a Markov blanket of Y {\displaystyle Y} . In other words, a Markov boundary is a minimal Markov blanket. The Markov boundary of a node A {\displaystyle A} in a Bayesian network is the set of nodes composed of A {\displaystyle A} 's parents, A {\displaystyle A} 's children, and A {\displaystyle A} 's children's other parents. In a Markov random field, the Markov boundary for a node is the set of its neighboring nodes. In a dependency network, the Markov boundary for a node is the set of its parents. === Uniqueness of Markov boundary === The Markov boundary always exists. Under some mild conditions, the Markov boundary is unique. However, for most practical and theoretical scenarios multiple Markov boundaries may provide alternative solutions. When there are multiple Markov boundaries, quantities measuring causal effect could fail. == In cognitive science == In the study of consciousness, brain function, and complex adaptive systems, Markov blankets are proposed as a mathematical mechanism which delimits the extent of cognitive entities, whether it be physical or causal.

    Read more →
  • Graphics suite

    Graphics suite

    A graphics suite is a software suite for graphics work that are distributed together. The programs are usually able to interact with each other on a higher level than the operating system would normally allow. There is no hard, fast rule regarding the programs to be included in a graphics application suite, but most will include at least a bitmap graphics editor and a vector graphics editor. In addition to these, the suite may contain VRML editors, animation editors, and morphing tools.

    Read more →
  • Influence diagram

    Influence diagram

    An influence diagram (ID) (also called a relevance diagram, decision diagram or a decision network) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following the maximum expected utility criterion) can be modeled and solved. ID was first developed in the mid-1970s by decision analysts with an intuitive semantic that is easy to understand. It is now adopted widely and becoming an alternative to the decision tree which typically suffers from exponential growth in number of branches with each variable modeled. ID is directly applicable in team decision analysis, since it allows incomplete sharing of information among team members to be modeled and solved explicitly. Extensions of ID also find their use in game theory as an alternative representation of the game tree. == Semantics == An ID is a directed acyclic graph with three types (plus one subtype) of node and three types of arc (or arrow) between nodes. Nodes: Decision node (corresponding to each decision to be made) is drawn as a rectangle. Uncertainty node (corresponding to each uncertainty to be modeled) is drawn as an oval. Deterministic node (corresponding to special kind of uncertainty that its outcome is deterministically known whenever the outcome of some other uncertainties are also known) is drawn as a double oval. Value node (corresponding to each component of additively separable Von Neumann-Morgenstern utility function) is drawn as an octagon (or diamond). Arcs: Functional arcs (ending in value node) indicate that one of the components of additively separable utility function is a function of all the nodes at their tails. Conditional arcs (ending in uncertainty node) indicate that the uncertainty at their heads is probabilistically conditioned on all the nodes at their tails. Conditional arcs (ending in deterministic node) indicate that the uncertainty at their heads is deterministically conditioned on all the nodes at their tails. Informational arcs (ending in decision node) indicate that the decision at their heads is made with the outcome of all the nodes at their tails known beforehand. Given a properly structured ID: Decision nodes and incoming information arcs collectively state the alternatives (what can be done when the outcome of certain decisions and/or uncertainties are known beforehand) Uncertainty/deterministic nodes and incoming conditional arcs collectively model the information (what are known and their probabilistic/deterministic relationships) Value nodes and incoming functional arcs collectively quantify the preference (how things are preferred over one another). Alternative, information, and preference are termed decision basis in decision analysis, they represent three required components of any valid decision situation. Formally, the semantic of influence diagram is based on sequential construction of nodes and arcs, which implies a specification of all conditional independencies in the diagram. The specification is defined by the d {\displaystyle d} -separation criterion of Bayesian network. According to this semantic, every node is probabilistically independent on its non-successor nodes given the outcome of its immediate predecessor nodes. Likewise, a missing arc between non-value node X {\displaystyle X} and non-value node Y {\displaystyle Y} implies that there exists a set of non-value nodes Z {\displaystyle Z} , e.g., the parents of Y {\displaystyle Y} , that renders Y {\displaystyle Y} independent of X {\displaystyle X} given the outcome of the nodes in Z {\displaystyle Z} . == Example == Consider the simple influence diagram representing a situation where a decision-maker is planning their vacation. There is 1 decision node (Vacation Activity), 2 uncertainty nodes (Weather Condition, Weather Forecast), and 1 value node (Satisfaction). There are 2 functional arcs (ending in Satisfaction), 1 conditional arc (ending in Weather Forecast), and 1 informational arc (ending in Vacation Activity). Functional arcs ending in Satisfaction indicate that Satisfaction is a utility function of Weather Condition and Vacation Activity. In other words, their satisfaction can be quantified if they know what the weather is like and what their choice of activity is. (Note that they do not value Weather Forecast directly) Conditional arc ending in Weather Forecast indicates their belief that Weather Forecast and Weather Condition can be dependent. Informational arc ending in Vacation Activity indicates that they will only know Weather Forecast, not Weather Condition, when making their choice. In other words, actual weather will be known after they make their choice, and only forecast is what they can count on at this stage. It also follows semantically, for example, that Vacation Activity is independent on (irrelevant to) Weather Condition given Weather Forecast is known. == Applicability to value of information == The above example highlights the power of the influence diagram in representing an extremely important concept in decision analysis known as the value of information. Consider the following three scenarios; Scenario 1: The decision-maker could make their Vacation Activity decision while knowing what Weather Condition will be like. This corresponds to adding extra informational arc from Weather Condition to Vacation Activity in the above influence diagram. Scenario 2: The original influence diagram as shown above. Scenario 3: The decision-maker makes their decision without even knowing the Weather Forecast. This corresponds to removing informational arc from Weather Forecast to Vacation Activity in the above influence diagram. Scenario 1 is the best possible scenario for this decision situation since there is no longer any uncertainty on what they care about (Weather Condition) when making their decision. Scenario 3, however, is the worst possible scenario for this decision situation since they need to make their decision without any hint (Weather Forecast) on what they care about (Weather Condition) will turn out to be. The decision-maker is usually better off (definitely no worse off, on average) to move from scenario 3 to scenario 2 through the acquisition of new information. The most they should be willing to pay for such move is called the value of information on Weather Forecast, which is essentially the value of imperfect information on Weather Condition. The applicability of this simple ID and the value of information concept is tremendous, especially in medical decision making when most decisions have to be made with imperfect information about their patients, diseases, etc. == Related concepts == Influence diagrams are hierarchical and can be defined either in terms of their structure or in greater detail in terms of the functional and numerical relation between diagram elements. An ID that is consistently defined at all levels—structure, function, and number—is a well-defined mathematical representation and is referred to as a well-formed influence diagram (WFID). WFIDs can be evaluated using reversal and removal operations to yield answers to a large class of probabilistic, inferential, and decision questions. More recent techniques have been developed by artificial intelligence researchers concerning Bayesian network inference (belief propagation). An influence diagram having only uncertainty nodes (i.e., a Bayesian network) is also called a relevance diagram. An arc connecting node A to B implies not only that "A is relevant to B", but also that "B is relevant to A" (i.e., relevance is a symmetric relationship).

    Read more →
  • Tucker decomposition

    Tucker decomposition

    In mathematics, Tucker decomposition decomposes a tensor into a set of matrices and one small core tensor. It is named after Ledyard R. Tucker although it goes back to Hitchcock in 1927. Initially described as a three-mode extension of factor analysis and principal component analysis it may actually be generalized to higher mode analysis, which is also called higher-order singular value decomposition (HOSVD) or the M-mode SVD. The algorithm to which the literature typically refers when discussing the Tucker decomposition or the HOSVD is the M-mode SVD algorithm introduced by Vasilescu and Terzopoulos, but misattributed to Tucker or De Lathauwer etal. It may be regarded as a more flexible PARAFAC (parallel factor analysis) model. In PARAFAC the core tensor is restricted to be "diagonal". In practice, Tucker decomposition is used as a modelling tool. For instance, it is used to model three-way (or higher way) data by means of relatively small numbers of components for each of the three or more modes, and the components are linked to each other by a three- (or higher-) way core array. The model parameters are estimated in such a way that, given fixed numbers of components, the modelled data optimally resemble the actual data in the least squares sense. The model gives a summary of the information in the data, in the same way as principal components analysis does for two-way data. For a 3rd-order tensor T ∈ F n 1 × n 2 × n 3 {\displaystyle T\in F^{n_{1}\times n_{2}\times n_{3}}} , where F {\displaystyle F} is either R {\displaystyle \mathbb {R} } or C {\displaystyle \mathbb {C} } , Tucker Decomposition can be denoted as follows, T = T × 1 U ( 1 ) × 2 U ( 2 ) × 3 U ( 3 ) {\displaystyle T={\mathcal {T}}\times _{1}U^{(1)}\times _{2}U^{(2)}\times _{3}U^{(3)}} where T ∈ F d 1 × d 2 × d 3 {\displaystyle {\mathcal {T}}\in F^{d_{1}\times d_{2}\times d_{3}}} is the core tensor, a 3rd-order tensor that contains the 1-mode, 2-mode and 3-mode singular values of T {\displaystyle T} , which are defined as the Frobenius norm of the 1-mode, 2-mode and 3-mode slices of tensor T {\displaystyle {\mathcal {T}}} respectively. U ( 1 ) , U ( 2 ) , U ( 3 ) {\displaystyle U^{(1)},U^{(2)},U^{(3)}} are unitary matrices in F d 1 × n 1 , F d 2 × n 2 , F d 3 × n 3 {\displaystyle F^{d_{1}\times n_{1}},F^{d_{2}\times n_{2}},F^{d_{3}\times n_{3}}} respectively. The k-mode product (k = 1, 2, 3) of T {\displaystyle {\mathcal {T}}} by U ( k ) {\displaystyle U^{(k)}} is denoted as T × U ( k ) {\displaystyle {\mathcal {T}}\times U^{(k)}} with entries as ( T × 1 U ( 1 ) ) ( i 1 , j 2 , j 3 ) = ∑ j 1 = 1 d 1 T ( j 1 , j 2 , j 3 ) U ( 1 ) ( j 1 , i 1 ) ( T × 2 U ( 2 ) ) ( j 1 , i 2 , j 3 ) = ∑ j 2 = 1 d 2 T ( j 1 , j 2 , j 3 ) U ( 2 ) ( j 2 , i 2 ) ( T × 3 U ( 3 ) ) ( j 1 , j 2 , i 3 ) = ∑ j 3 = 1 d 3 T ( j 1 , j 2 , j 3 ) U ( 3 ) ( j 3 , i 3 ) {\displaystyle {\begin{aligned}({\mathcal {T}}\times _{1}U^{(1)})(i_{1},j_{2},j_{3})&=\sum _{j_{1}=1}^{d_{1}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(1)}(j_{1},i_{1})\\({\mathcal {T}}\times _{2}U^{(2)})(j_{1},i_{2},j_{3})&=\sum _{j_{2}=1}^{d_{2}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(2)}(j_{2},i_{2})\\({\mathcal {T}}\times _{3}U^{(3)})(j_{1},j_{2},i_{3})&=\sum _{j_{3}=1}^{d_{3}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(3)}(j_{3},i_{3})\end{aligned}}} Altogether, the decomposition may also be written more directly as T ( i 1 , i 2 , i 3 ) = ∑ j 1 = 1 d 1 ∑ j 2 = 1 d 2 ∑ j 3 = 1 d 3 T ( j 1 , j 2 , j 3 ) U ( 1 ) ( j 1 , i 1 ) U ( 2 ) ( j 2 , i 2 ) U ( 3 ) ( j 3 , i 3 ) {\displaystyle T(i_{1},i_{2},i_{3})=\sum _{j_{1}=1}^{d_{1}}\sum _{j_{2}=1}^{d_{2}}\sum _{j_{3}=1}^{d_{3}}{\mathcal {T}}(j_{1},j_{2},j_{3})U^{(1)}(j_{1},i_{1})U^{(2)}(j_{2},i_{2})U^{(3)}(j_{3},i_{3})} Taking d i = n i {\displaystyle d_{i}=n_{i}} for all i {\displaystyle i} is always sufficient to represent T {\displaystyle T} exactly, but often T {\displaystyle T} can be compressed or efficiently approximately by choosing d i < n i {\displaystyle d_{i} Read more →

  • Lattice Miner

    Lattice Miner

    Lattice Miner is a formal concept analysis software tool for the construction, visualization and manipulation of concept lattices. It allows the generation of formal concepts and association rules as well as the transformation of formal contexts via apposition, subposition, reduction and object/attribute generalization, and the manipulation of concept lattices via approximation, projection and selection. Lattice Miner allows also the drawing of nested line diagrams. == Introduction == Formal concept analysis (FCA) is a branch of applied mathematics based on the formalization of concept and concept hierarchy and mainly used as a framework for conceptual clustering and rule mining. Over the last two decades, a collection of tools have emerged to help FCA users visualize and analyze concept lattices. They range from the earliest DOS-based implementations (e.g., ConImp and GLAD) to more recent implementations in Java like ToscanaJ, Galicia, ConExp and Coron. A main issue in the development of FCA tools is to visualize large concept lattices and provide efficient mechanisms to highlight patterns (e.g., concepts, associations) that could be relevant to the user. The initial objective of the FCA tool called Lattice Miner was to focus on visualization mechanisms for the representation of concept lattices, including nested line diagrams. Later on, many other interesting features were integrated into the tool. == Functional architecture of Lattice Miner == Lattice Miner is a Java-based platform whose functions are articulated around a core. The Lattice Miner core provides all low-level operations and structures for the representation and manipulation of contexts, lattices and association rules. Mainly, the core of Lattice Miner consists of three modules: context, concept and association rule modules. The user interface offers a context editor and concept lattice manipulator to assist the user in a set of tasks. The architecture of Lattice Miner is open and modular enough to allow the integration of new features and facilities in each one of its components. === Context module === The context module offers all the basic operations and structures to manipulate binary and valued contexts as well as context decomposition to produce nested line diagrams. Basic context operations include apposition, subposition, generalization, clarification, reduction as well as the complementary context computation. The module provides also the arrow relations (for context reduction and decomposition) [2]. The tool has an input LMB format and recognizes the binary format SLF found in Galicia and the format CEX produced by ConExp. === Concept module === The main function of the concept module is to generate the concepts of the current binary context and construct the corresponding lattice and nested structure (see Figures 2 and 3). It provides the user with basic operators such as projection, selection, and exact search as well as advanced features like pair approximation. Some known algorithms are included in this module such as Bordat’s procedure, Godin’s algorithm and NextClosure algorithm. The approximation feature implemented in Lattice Miner is based on the following idea: given a pair (X,Y) where X ⊆ G, and Y ⊆ M, is there a set of formal concepts (Ai,Bi) which are “close to” (X,Y)? To answer this question, The tool starts to identify the type of couple that the pair (X,Y) represents. It can be a formal concept, a protoconcept, a semiconcept or a preconcept. In the last case, the approximation is given by the interval [(X",X′),(Y′,Y")] and highlighted in the line diagram. === Association rule module === This module includes procedures for computing the (stem) Guigues–Duquenne base using NextClosure algorithm [3], as well as the generic and informative bases. Implications with negation can be obtained using the apposition of a context and its complementary. This module embeds also procedures for the computation of a non-redundant family C of implications and the closure of a set Y of attributes for the given implication set C. === User interface === The initial objective of Lattice Miner was to focus on lattice drawing and visualization either as a flat or nested structure by taking into account the cognitive process of human beings and known principles for lattice drawing (e.g., reducing the number of edge intersections, ensuring diagram symmetry). Some well-known visualization techniques were implemented such as focus & context and fisheye view. The basic idea behind focus & context visualization paradigm is to allow a viewer to see key (important) objects in full detail in the foreground (focus) while at the same time an overview of all the surrounding information (context) remains available in the background. Lattice Miner translates the focus & context paradigm into clear and blurred elements while the size of nodes and the intensity of their color were used to indicate their importance. Various forms of highlighting, labelling and animation are also provided. In order to better handle the display of large lattices, nested line diagrams are offered in the tool. Figure 3 shows the third level of the nested line diagram corresponding to the binary context of Figure 1 where three levels of nesting are defined. Each one of the inner nodes of this diagram represents a combination of attributes from the previous two (outer) levels. Real inner concepts (see the node on the left hand-side of the diagram) are identified by colored nodes while void elements are in grey color. Each node of levels 1 and 2 can be expanded to exhibit its internal line diagram. Both flat and nested diagrams can be saved as an image. Simple (flat) lattices can also be saved as an XML format file.

    Read more →
  • Aikuma

    Aikuma

    Aikuma is an Android app for collecting speech recordings with time-aligned translations. The app includes a text-free interface for consecutive interpretation, designed for users who are not literate. The Aikuma won Grand Prize in the Open Source Software World Challenge (2013). == Name == Aikuma means "meeting place" in Usarufa, a Papuan language where this software was first used in 2012. == History == Aikuma was developed with sponsorship from the National Science Foundation, including a $101,501 (US) project, "to use mobile telephones to collect larger amounts of data on undocumented endangered languages than would never be possible through usual fieldwork." Aikuma and its modified version (Lig-Aikuma) have been used for collecting substantial quantities of audio in remote indigenous villages. A modified version of the app, called Lig-Aikuma, has been developed at the Université Grenoble Alpes (LIG laboratory) and implements new features such as elicitation of speech from text, images and videos. == Similar Software == Lingua Libre is an online collaborative project and tool by the Wikimedia France association, which can be used as a tool for Language Preservation. Lingua Libre enables to record words, phrases, or sentences of any language, oral (audio recording) or signed (video recording). It is a highly efficient method to record endangered languages since up to 1000 words can be recorded per hour. All the content is under Free License, and speakers of minority languages are encouraged to record their own dialects.

    Read more →
  • Concept class

    Concept class

    In computational learning theory in mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts. Concept classes are a subject of computational learning theory. Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning. In this setting, if one takes a set Y as a set of (classifier output) labels, and X is a set of examples, the map c : X → Y {\displaystyle c:X\to Y} , i.e. from examples to classifier labels (where Y = { 0 , 1 } {\displaystyle Y=\{0,1\}} and where c is a subset of X), c is then said to be a concept. A concept class C {\displaystyle C} is then a collection of such concepts. Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions to s. Not every subclass is reachable. == Background == A sample s {\displaystyle s} is a partial function from X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} . Identifying a concept with its characteristic function mapping X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} , it is a special case of a sample. Two samples are consistent if they agree on the intersection of their domains. A sample s ′ {\displaystyle s'} extends another sample s {\displaystyle s} if the two are consistent and the domain of s {\displaystyle s} is contained in the domain of s ′ {\displaystyle s'} . == Examples == Suppose that C = S + ( X ) {\displaystyle C=S^{+}(X)} . Then: the subclass { { x } } {\displaystyle \{\{x\}\}} is reachable with the sample s = { ( x , 1 ) } {\displaystyle s=\{(x,1)\}} ; the subclass S + ( Y ) {\displaystyle S^{+}(Y)} for Y ⊆ X {\displaystyle Y\subseteq X} are reachable with a sample that maps the elements of X − Y {\displaystyle X-Y} to zero; the subclass S ( X ) {\displaystyle S(X)} , which consists of the singleton sets, is not reachable. == Applications == Let C {\displaystyle C} be some concept class. For any concept c ∈ C {\displaystyle c\in C} , we call this concept 1 / d {\displaystyle 1/d} -good for a positive integer d {\displaystyle d} if, for all x ∈ X {\displaystyle x\in X} , at least 1 / d {\displaystyle 1/d} of the concepts in C {\displaystyle C} agree with c {\displaystyle c} on the classification of x {\displaystyle x} . The fingerprint dimension F D ( C ) {\displaystyle FD(C)} of the entire concept class C {\displaystyle C} is the least positive integer d {\displaystyle d} such that every reachable subclass C ′ ⊆ C {\displaystyle C'\subseteq C} contains a concept that is 1 / d {\displaystyle 1/d} -good for it. This quantity can be used to bound the minimum number of equivalence queries needed to learn a class of concepts according to the following inequality: F D ( C ) − 1 ≤ # E Q ( C ) ≤ ⌈ F D ( C ) ln ⁡ ( | C | ) ⌉ {\textstyle FD(C)-1\leq \#EQ(C)\leq \lceil FD(C)\ln(|C|)\rceil } .

    Read more →
  • Relief (feature selection)

    Relief (feature selection)

    Relief is an algorithm developed by Kenji Kira and Larry Rendell in 1992 that takes a filter-method approach to feature selection that is notably sensitive to feature interactions. It was originally designed for application to binary classification problems with discrete or numerical features. Relief calculates a feature score for each feature which can then be applied to rank and select top scoring features for feature selection. Alternatively, these scores may be applied as feature weights to guide downstream modeling. Relief feature scoring is based on the identification of feature value differences between nearest neighbor instance pairs. If a feature value difference is observed in a neighboring instance pair with the same class (a 'hit'), the feature score decreases. Alternatively, if a feature value difference is observed in a neighboring instance pair with different class values (a 'miss'), the feature score increases. The original Relief algorithm has since inspired a family of Relief-based feature selection algorithms (RBAs), including the ReliefF algorithm. Beyond the original Relief algorithm, RBAs have been adapted to (1) perform more reliably in noisy problems, (2) generalize to multi-class problems (3) generalize to numerical outcome (i.e. regression) problems, and (4) to make them robust to incomplete (i.e. missing) data. To date, the development of RBA variants and extensions has focused on four areas; (1) improving performance of the 'core' Relief algorithm, i.e. examining strategies for neighbor selection and instance weighting, (2) improving scalability of the 'core' Relief algorithm to larger feature spaces through iterative approaches, (3) methods for flexibly adapting Relief to different data types, and (4) improving Relief run efficiency. Their strengths are that they are not dependent on heuristics, they run in low-order polynomial time, and they are noise-tolerant and robust to feature interactions, as well as being applicable for binary or continuous data; however, it does not discriminate between redundant features, and low numbers of training instances fool the algorithm. == Relief Algorithm == Take a data set with n instances of p features, belonging to two known classes. Within the data set, each feature should be scaled to the interval [0 1] (binary data should remain as 0 and 1). The algorithm will be repeated m times. Start with a p-long weight vector (W) of zeros. At each iteration, take the feature vector (X) belonging to one random instance, and the feature vectors of the instance closest to X (by Euclidean distance) from each class. The closest same-class instance is called 'near-hit', and the closest different-class instance is called 'near-miss'. Update the weight vector such that W i = W i − ( x i − n e a r H i t i ) 2 + ( x i − n e a r M i s s i ) 2 , {\displaystyle W_{i}=W_{i}-(x_{i}-\mathrm {nearHit} _{i})^{2}+(x_{i}-\mathrm {nearMiss} _{i})^{2},} where i {\displaystyle i} indexes the components and runs from 1 to p. Thus the weight of any given feature decreases if it differs from that feature in nearby instances of the same class more than nearby instances of the other class, and increases in the reverse case. After m iterations, divide each element of the weight vector by m. This becomes the relevance vector. Features are selected if their relevance is greater than a threshold τ. Kira and Rendell's experiments showed a clear contrast between relevant and irrelevant features, allowing τ to be determined by inspection. However, it can also be determined by Chebyshev's inequality for a given confidence level (α) that a τ of 1/sqrt(αm) is good enough to make the probability of a Type I error less than α, although it is stated that τ can be much smaller than that. Relief was also described as generalizable to multinomial classification by decomposition into a number of binary problems. == ReliefF Algorithm == Kononenko et al. propose a number of updates to Relief. Firstly, they find the near-hit and near-miss instances using the Manhattan (L1) norm rather than the Euclidean (L2) norm, although the rationale is not specified. Furthermore, they found taking the absolute differences between xi and near-hiti, and xi and near-missi to be sufficient when updating the weight vector (rather than the square of those differences). === Reliable probability estimation === Rather than repeating the algorithm m times, implement it exhaustively (i.e. n times, once for each instance) for relatively small n (up to one thousand). Furthermore, rather than finding the single nearest hit and single nearest miss, which may cause redundant and noisy attributes to affect the selection of the nearest neighbors, ReliefF searches for k nearest hits and misses and averages their contribution to the weights of each feature. k can be tuned for any individual problem. === Incomplete data === In ReliefF, the contribution of missing values to the feature weight is determined using the conditional probability that two values should be the same or different, approximated with relative frequencies from the data set. This can be calculated if one or both features are missing. === Multi-class problems === Rather than use Kira and Rendell's proposed decomposition of a multinomial classification into a number of binomial problems, ReliefF searches for k near misses from each different class and averages their contributions for updating W, weighted with the prior probability of each class. == Other Relief-based Algorithm Extensions/Derivatives == The following RBAs are arranged chronologically from oldest to most recent. They include methods for improving (1) the core Relief algorithm concept, (2) iterative approaches for scalability, (3) adaptations to different data types, (4) strategies for computational efficiency, or (5) some combination of these goals. For more on RBAs see these book chapters or this most recent review paper. === RRELIEFF === Robnik-Šikonja and Kononenko propose further updates to ReliefF, making it appropriate for regression. === Relieved-F === Introduced deterministic neighbor selection approach and a new approach for incomplete data handling. === Iterative Relief === Implemented method to address bias against non-monotonic features. Introduced the first iterative Relief approach. For the first time, neighbors were uniquely determined by a radius threshold and instances were weighted by their distance from the target instance. === I-RELIEF === Introduced sigmoidal weighting based on distance from target instance. All instance pairs (not just a defined subset of neighbors) contributed to score updates. Proposed an on-line learning variant of Relief. Extended the iterative Relief concept. Introduced local-learning updates between iterations for improved convergence. === TuRF (a.k.a. Tuned ReliefF) === Specifically sought to address noise in large feature spaces through the recursive elimination of features and the iterative application of ReliefF. === Evaporative Cooling ReliefF === Similarly seeking to address noise in large feature spaces. Utilized an iterative `evaporative' removal of lowest quality features using ReliefF scores in association with mutual information. === EReliefF (a.k.a. Extended ReliefF) === Addressing issues related to incomplete and multi-class data. === VLSReliefF (a.k.a. Very Large Scale ReliefF) === Dramatically improves the efficiency of detecting 2-way feature interactions in very large feature spaces by scoring random feature subsets rather than the entire feature space. === ReliefMSS === Introduced calculation of feature weights relative to average feature 'diff' between instance pairs. === SURF === SURF identifies nearest neighbors (both hits and misses) based on a distance threshold from the target instance defined by the average distance between all pairs of instances in the training data. Results suggest improved power to detect 2-way epistatic interactions over ReliefF. === SURF (a.k.a. SURFStar) === SURF extends the SURF algorithm to not only utilized 'near' neighbors in scoring updates, but 'far' instances as well, but employing inverted scoring updates for 'far instance pairs. Results suggest improved power to detect 2-way epistatic interactions over SURF, but an inability to detect simple main effects (i.e. univariate associations). === SWRF === SWRF extends the SURF algorithm adopting sigmoid weighting to take distance from the threshold into account. Also introduced a modular framework for further developing RBAs called MoRF. === MultiSURF (a.k.a. MultiSURFStar) === MultiSURF extends the SURF algorithm adapting the near/far neighborhood boundaries based on the average and standard deviation of distances from the target instance to all others. MultiSURF uses the standard deviation to define a dead-band zone where 'middle-distance' instances do not contribute to scoring. Evidence suggests MultiSURF performs best in detecting pure 2-way feature interactions. === Reli

    Read more →