AI Generator Song Maker

AI Generator Song Maker — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Linux Trace Toolkit

    Linux Trace Toolkit

    The Linux Trace Toolkit (LTT) is a set of tools that is designed to log program execution details from a patched Linux kernel and then perform various analyses on them, using console-based and graphical tools. LTT has been mostly superseded by its successor LTTng (Linux Trace Toolkit Next Generation). LTT allows the user to see in-depth information about the processes that were running during the trace period, including when context switches occurred, how long the processes were blocked for, and how much time the processes spent executing vs. how much time the processes were blocked. The data is logged to a text file and various console-based and graphical (GTK+) tools are provided for interpreting that data. In order to do data collection, LTT requires a patched Linux kernel. The authors of LTT claim that the performance hit for a patched kernel compared to a regular kernel is minimal; Their testing has reportedly shown that this is less than 2.5% on a "normal use" system (measured using batches of kernel makes) and less than 5% on a file I/O intensive system (measured using batches of tar). == Usage == === Collecting trace data === Data collection is Started by: trace 15 foo This command will cause the LTT tracedaemon to do a trace that lasts for 15 seconds, writing trace data to foo.trace and process information from the /proc filesystem to foo.proc. The trace command is actually a script which runs the program tracedaemon with some common options. It is possible to run tracedaemon directly and in that case, the user can use a number of command-line options to control the data which is collected. For the complete list of options supported by tracedaemon, see the online manual page for tracedaemon. === Viewing the results === Viewing the results of a trace can be accomplished with: traceview foo This command will launch a graphical (GTK+) traceview tool that will read from foo.trace and foo.proc. This tool can show information in various interesting ways, including Event Graph, Process Analysis, and Raw Trace. The Event Graph is perhaps the most interesting view, showing the exact timing of events like page faults, interrupts, and context switches, in a simple graphical way. The traceview command is a wrapper for a program called tracevisualizer. For the complete list of options supported by tracevisualizer, see the online manual page for tracevisualizer.

    Read more →
  • Hinge loss

    Hinge loss

    In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs). For an intended output t = ±1 and a classifier score y, the hinge loss of the prediction y is defined as ℓ ( y ) = max ( 0 , 1 − t ⋅ y ) {\displaystyle \ell (y)=\max(0,1-t\cdot y)} Note that y {\displaystyle y} should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, y = w ⋅ x + b {\displaystyle y=\mathbf {w} \cdot \mathbf {x} +b} , where ( w , b ) {\displaystyle (\mathbf {w} ,b)} are the parameters of the hyperplane and x {\displaystyle \mathbf {x} } is the input variable(s). When t and y have the same sign (meaning y predicts the right class) and | y | ≥ 1 {\displaystyle |y|\geq 1} , the hinge loss ℓ ( y ) = 0 {\displaystyle \ell (y)=0} . When they have opposite signs, ℓ ( y ) {\displaystyle \ell (y)} increases linearly with y, and similarly if | y | < 1 {\displaystyle |y|<1} , even if it has the same sign (correct prediction, but not by enough margin). The Hinge loss is not a proper scoring rule. == Extensions == While binary SVMs are commonly extended to multiclass classification in a one-vs.-all or one-vs.-one fashion, it is also possible to extend the hinge loss itself for such an end. Several different variations of multiclass hinge loss have been proposed. For example, Crammer and Singer defined it for a linear classifier as ℓ ( y ) = max ( 0 , 1 + max y ≠ t w y x − w t x ) {\displaystyle \ell (y)=\max(0,1+\max _{y\neq t}\mathbf {w} _{y}\mathbf {x} -\mathbf {w} _{t}\mathbf {x} )} , where t {\displaystyle t} is the target label, w t {\displaystyle \mathbf {w} _{t}} and w y {\displaystyle \mathbf {w} _{y}} are the model parameters. Weston and Watkins provided a similar definition, but with a sum rather than a max: ℓ ( y ) = ∑ y ≠ t max ( 0 , 1 + w y x − w t x ) {\displaystyle \ell (y)=\sum _{y\neq t}\max(0,1+\mathbf {w} _{y}\mathbf {x} -\mathbf {w} _{t}\mathbf {x} )} . In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where w denotes the SVM's parameters, y the SVM's predictions, φ the joint feature function, and Δ the Hamming loss: ℓ ( y ) = max ( 0 , Δ ( y , t ) + ⟨ w , ϕ ( x , y ) ⟩ − ⟨ w , ϕ ( x , t ) ⟩ ) = max ( 0 , max y ∈ Y ( Δ ( y , t ) + ⟨ w , ϕ ( x , y ) ⟩ ) − ⟨ w , ϕ ( x , t ) ⟩ ) {\displaystyle {\begin{aligned}\ell (\mathbf {y} )&=\max(0,\Delta (\mathbf {y} ,\mathbf {t} )+\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {y} )\rangle -\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {t} )\rangle )\\&=\max(0,\max _{y\in {\mathcal {Y}}}\left(\Delta (\mathbf {y} ,\mathbf {t} )+\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {y} )\rangle \right)-\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {t} )\rangle )\end{aligned}}} . == Optimization == The hinge loss is a convex function, so many of the usual convex optimizers used in machine learning can work with it. It is not differentiable, but has a subgradient with respect to model parameters w of a linear SVM with score function y = w ⋅ x {\displaystyle y=\mathbf {w} \cdot \mathbf {x} } that is given by ∂ ℓ ∂ w i = { − t ⋅ x i if t ⋅ y < 1 , 0 otherwise . {\displaystyle {\frac {\partial \ell }{\partial w_{i}}}={\begin{cases}-t\cdot x_{i}&{\text{if }}t\cdot y<1,\\0&{\text{otherwise}}.\end{cases}}} However, since the derivative of the hinge loss at t y = 1 {\displaystyle ty=1} is undefined, smoothed versions may be preferred for optimization, such as Rennie and Srebro's ℓ ( y ) = { 1 2 − t y if t y ≤ 0 , 1 2 ( 1 − t y ) 2 if 0 < t y < 1 , 0 if 1 ≤ t y {\displaystyle \ell (y)={\begin{cases}{\frac {1}{2}}-ty&{\text{if}}~~ty\leq 0,\\{\frac {1}{2}}(1-ty)^{2}&{\text{if}}~~0 Read more →

  • Multifactor dimensionality reduction

    Multifactor dimensionality reduction

    Multifactor dimensionality reduction (MDR) is a statistical approach, also used in machine learning automatic approaches, for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable. MDR was designed specifically to identify nonadditive interactions among discrete variables that influence a binary outcome and is considered a nonparametric and model-free alternative to traditional statistical methods such as logistic regression. The basis of the MDR method is a constructive induction or feature engineering algorithm that converts two or more variables or attributes to a single attribute. This process of constructing a new attribute changes the representation space of the data. The end goal is to create or discover a representation that facilitates the detection of nonlinear or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data. == Illustrative example == Consider the following simple example using the exclusive OR (XOR) function. XOR is a logical operator that is commonly used in data mining and machine learning as an example of a function that is not linearly separable. The table below represents a simple dataset where the relationship between the attributes (X1 and X2) and the class variable (Y) is defined by the XOR function such that Y = X1 XOR X2. Table 1 A machine learning algorithm would need to discover or approximate the XOR function in order to accurately predict Y using information about X1 and X2. An alternative strategy would be to first change the representation of the data using constructive induction to facilitate predictive modeling. The MDR algorithm would change the representation of the data (X1 and X2) in the following manner. MDR starts by selecting two attributes. In this simple example, X1 and X2 are selected. Each combination of values for X1 and X2 are examined and the number of times Y=1 and/or Y=0 is counted. In this simple example, Y=1 occurs zero times and Y=0 occurs once for the combination of X1=0 and X2=0. With MDR, the ratio of these counts is computed and compared to a fixed threshold. Here, the ratio of counts is 0/1 which is less than our fixed threshold of 1. Since 0/1 < 1 we encode a new attribute (Z) as a 0. When the ratio is greater than one we encode Z as a 1. This process is repeated for all unique combinations of values for X1 and X2. Table 2 illustrates our new transformation of the data. Table 2 The machine learning algorithm now has much less work to do to find a good predictive function. In fact, in this very simple example, the function Y = Z has a classification accuracy of 1. A nice feature of constructive induction methods such as MDR is the ability to use any data mining or machine learning method to analyze the new representation of the data. Decision trees, neural networks, or a naive Bayes classifier could be used in combination with measures of model quality such as balanced accuracy and mutual information. == Machine learning with MDR == As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any machine learning algorithm there is always concern about overfitting. That is, machine learning algorithms are good at finding patterns in completely random data. It is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as cross-validation. Models that describe random data typically don't generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. Permutation testing makes it possible to generate an empirical p-value for the result. Replication in independent data may also provide evidence for an MDR model but can be sensitive to difference in the data sets. These approaches have all been shown to be useful for choosing and evaluating MDR models. An important step in a machine learning exercise is interpretation. Several approaches have been used with MDR including entropy analysis and pathway analysis. Tips and approaches for using MDR to model gene-gene interactions have been reviewed. == Extensions to MDR == Numerous extensions to MDR have been introduced. These include family-based methods, fuzzy methods, covariate adjustment, odds ratios, risk scores, survival methods, robust methods, methods for quantitative traits, and many others. == Applications of MDR == MDR has mostly been applied to detecting gene-gene interactions or epistasis in genetic studies of common human diseases such as atrial fibrillation, autism, bladder cancer, breast cancer, cardiovascular disease, hypertension, obesity, pancreatic cancer, prostate cancer and tuberculosis. It has also been applied to other biomedical problems such as the genetic analysis of pharmacology outcomes. A central challenge is the scaling of MDR to big data such as that from genome-wide association studies (GWAS). Several approaches have been used. One approach is to filter the features prior to MDR analysis. This can be done using biological knowledge through tools such as BioFilter. It can also be done using computational tools such as ReliefF. Another approach is to use stochastic search algorithms such as genetic programming to explore the search space of feature combinations. Yet another approach is a brute-force search using high-performance computing. == Implementations == www.epistasis.org provides an open-source and freely-available MDR software package. An R package for MDR. An sklearn-compatible Python implementation. An R package for Model-Based MDR. MDR in Weka. Generalized MDR.

    Read more →
  • Probably approximately correct learning

    Probably approximately correct learning

    In computational learning theory, probably approximately correct (PAC) learning is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant. In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the "probably" part), the selected function will have low generalization error (the "approximately correct" part). The learner must be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples. The model was later extended to treat noise (misclassified samples). An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning. In particular, the learner is expected to find efficient functions (time and space requirements bounded to a polynomial of the example size), and the learner itself must implement an efficient procedure (requiring an example count bounded to a polynomial of the concept size, modified by the approximation and likelihood bounds). == Definitions and terminology == In order to give the definition for something that is PAC-learnable, we first have to introduce some terminology. For the following definitions, two examples will be used. The first is the problem of character recognition given an array of n {\displaystyle n} bits encoding a binary-valued image. The other example is the problem of finding an interval that will correctly classify points within the interval as positive and the points outside of the range as negative. Let X {\displaystyle X} be a set called the instance space or the encoding of all the samples. In the character recognition problem, the instance space is X = { 0 , 1 } n {\displaystyle X=\{0,1\}^{n}} . In the interval problem the instance space, X {\displaystyle X} , is the set of all bounded intervals in R {\displaystyle \mathbb {R} } , where R {\displaystyle \mathbb {R} } denotes the set of all real numbers. A concept is a subset c ⊂ X {\displaystyle c\subset X} . One concept is the set of all patterns of bits in X = { 0 , 1 } n {\displaystyle X=\{0,1\}^{n}} that encode a picture of the letter "P". An example concept from the second example is the set of open intervals, { ( a , b ) ∣ 0 ≤ a ≤ π / 2 , π ≤ b ≤ 13 } {\displaystyle \{(a,b)\mid 0\leq a\leq \pi /2,\pi \leq b\leq {\sqrt {13}}\}} , each of which contains only the positive points. A concept class C {\displaystyle C} is a collection of concepts over X {\displaystyle X} . This could be the set of all subsets of the array of bits that are skeletonized 4-connected (width of the font is 1). Let EX ⁡ ( c , D ) {\displaystyle \operatorname {EX} (c,D)} be a procedure that draws an example, x {\displaystyle x} , using a probability distribution D {\displaystyle D} and gives the correct label c ( x ) {\displaystyle c(x)} , that is 1 if x ∈ c {\displaystyle x\in c} and 0 otherwise. Now, given 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} , assume there is an algorithm A {\displaystyle A} and a polynomial p {\displaystyle p} in 1 / ϵ , 1 / δ {\displaystyle 1/\epsilon ,1/\delta } (and other relevant parameters of the class C {\displaystyle C} ) such that, given a sample of size p {\displaystyle p} drawn according to EX ⁡ ( c , D ) {\displaystyle \operatorname {EX} (c,D)} , then, with probability of at least 1 − δ {\displaystyle 1-\delta } , A {\displaystyle A} outputs a hypothesis h ∈ C {\displaystyle h\in C} that has an average error less than or equal to ϵ {\displaystyle \epsilon } on X {\displaystyle X} with the same distribution D {\displaystyle D} . Further if the above statement for algorithm A {\displaystyle A} is true for every concept c ∈ C {\displaystyle c\in C} and for every distribution D {\displaystyle D} over X {\displaystyle X} , and for all 0 < ϵ , δ < 1 {\displaystyle 0<\epsilon ,\delta <1} then C {\displaystyle C} is (efficiently) PAC learnable (or distribution-free PAC learnable). We can also say that A {\displaystyle A} is a PAC learning algorithm for C {\displaystyle C} . == Equivalence == Under some regularity conditions these conditions are equivalent: The concept class C is PAC learnable. The VC dimension of C is finite. C is a uniformly Glivenko-Cantelli class. C is compressible in the sense of Littlestone and Warmuth

    Read more →
  • EfficientNet

    EfficientNet

    EfficientNet is a family of convolutional neural networks (CNNs) for computer vision published by researchers at Google AI in 2019. Its key innovation is compound scaling, which uniformly scales all dimensions of depth, width, and resolution using a single parameter. EfficientNet models have been adopted in various computer vision tasks, including image classification, object detection, and segmentation. == Compound scaling == EfficientNet introduces compound scaling, which, instead of scaling one dimension of the network at a time, such as depth (number of layers), width (number of channels), or resolution (input image size), uses a compound coefficient ϕ {\displaystyle \phi } to scale all three dimensions simultaneously. Specifically, given a baseline network, the depth, width, and resolution are scaled according to the following equations: depth multiplier: d = α ϕ width multiplier: w = β ϕ resolution multiplier: r = γ ϕ {\displaystyle {\begin{aligned}{\text{depth multiplier: }}d&=\alpha ^{\phi }\\{\text{width multiplier: }}w&=\beta ^{\phi }\\{\text{resolution multiplier: }}r&=\gamma ^{\phi }\end{aligned}}} subject to α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} and α ≥ 1 , β ≥ 1 , γ ≥ 1 {\displaystyle \alpha \geq 1,\beta \geq 1,\gamma \geq 1} . The α ⋅ β 2 ⋅ γ 2 ≈ 2 {\displaystyle \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2} condition is such that increasing ϕ {\displaystyle \phi } by a factor of ϕ 0 {\displaystyle \phi _{0}} would increase the total FLOPs of running the network on an image approximately 2 ϕ 0 {\displaystyle 2^{\phi _{0}}} times. The hyperparameters α {\displaystyle \alpha } , β {\displaystyle \beta } , and γ {\displaystyle \gamma } are determined by a small grid search. The original paper suggested 1.2, 1.1, and 1.15, respectively. Architecturally, they optimized the choice of modules by neural architecture search (NAS), and found that the inverted bottleneck convolution (which they called MBConv) used in MobileNet worked well. The EfficientNet family is a stack of MBConv layers, with shapes determined by the compound scaling. The original publication consisted of 8 models, from EfficientNet-B0 to EfficientNet-B7, with increasing model size and accuracy. EfficientNet-B0 is the baseline network, and subsequent models are obtained by scaling the baseline network by increasing ϕ {\displaystyle \phi } . == Variants == EfficientNet has been adapted for fast inference on edge TPUs and centralized TPU or GPU clusters by NAS. EfficientNet V2 was published in June 2021. The architecture was improved by further NAS search with more types of convolutional layers. It also introduced a training method, which progressively increases image size during training, and uses regularization techniques like dropout, RandAugment, and Mixup. The authors claim this approach mitigates accuracy drops often associated with progressive resizing.

    Read more →
  • Curriculum learning

    Curriculum learning

    Curriculum learning is a technique in machine learning in which a model is trained on examples of increasing difficulty, where the definition of "difficulty" may be provided externally or discovered as part of the training process. This is intended to attain good performance more quickly, or to converge to a better local optimum if the global optimum is not found. == Approach == Most generally, curriculum learning is the technique of successively increasing the difficulty of examples in the training set that is presented to a model over multiple training iterations. This can produce better results than exposing the model to the full training set immediately under some circumstances; most typically, when the model is able to learn general principles from easier examples, and then gradually incorporate more complex and nuanced information as harder examples are introduced, such as edge cases. This has been shown to work in many domains, most likely as a form of regularization. There are several major variations in how the technique is applied: A concept of "difficulty" must be defined. This may come from human annotation or an external heuristic; for example in language modeling, shorter sentences might be classified as easier than longer ones. Another approach is to use the performance of another model, with examples accurately predicted by that model being classified as easier (providing a connection to boosting). Difficulty can be increased steadily or in distinct epochs, and in a deterministic schedule or according to a probability distribution. This may also be moderated by a requirement for diversity at each stage, in cases where easier examples are likely to be disproportionately similar to each other. Applications must also decide the schedule for increasing the difficulty. Simple approaches may use a fixed schedule, such as training on easy examples for half of the available iterations and then all examples for the second half. Other approaches use self-paced learning to increase the difficulty in proportion to the performance of the model on the current set. Since curriculum learning only concerns the selection and ordering of training data, it can be combined with many other techniques in machine learning. The success of the method assumes that a model trained for an easier version of the problem can generalize to harder versions, so it can be seen as a form of transfer learning. Some authors also consider curriculum learning to include other forms of progressively increasing complexity, such as increasing the number of model parameters. It is frequently combined with reinforcement learning, such as learning a simplified version of a game first. Some domains have shown success with anti-curriculum learning: training on the most difficult examples first. One example is the ACCAN method for speech recognition, which trains on the examples with the lowest signal-to-noise ratio first. == History == The term "curriculum learning" was introduced by Yoshua Bengio et al in 2009, with reference to the psychological technique of shaping in animals and structured education for humans: beginning with the simplest concepts and then building on them. The authors also note that the application of this technique in machine learning has its roots in the early study of neural networks such as Jeffrey Elman's 1993 paper Learning and development in neural networks: the importance of starting small. Bengio et al showed good results for problems in image classification, such as identifying geometric shapes with progressively more complex forms, and language modeling, such as training with a gradually expanding vocabulary. They conclude that, for curriculum strategies, "their beneficial effect is most pronounced on the test set", suggesting good generalization. The technique has since been applied to many other domains: Natural language processing: Part-of-speech tagging Intent detection Sentiment analysis Machine translation Speech recognition Language model pre-training Image recognition: Facial recognition Object detection Reinforcement learning: Game-playing Graph learning Matrix factorization

    Read more →
  • FastICA

    FastICA

    FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology. Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be alternatively derived as an approximative Newton iteration. == Algorithm == === Prewhitening the data === Let the X := ( x i j ) ∈ R N × M {\displaystyle \mathbf {X} :=(x_{ij})\in \mathbb {R} ^{N\times M}} denote the input data matrix, M {\displaystyle M} the number of columns corresponding with the number of samples of mixed signals and N {\displaystyle N} the number of rows corresponding with the number of independent source signals. The input data matrix X {\displaystyle \mathbf {X} } must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it. Centering the data entails demeaning each component of the input data X {\displaystyle \mathbf {X} } , that is, for each i = 1 , … , N {\displaystyle i=1,\ldots ,N} and j = 1 , … , M {\displaystyle j=1,\ldots ,M} . After centering, each row of X {\displaystyle \mathbf {X} } has an expected value of 0 {\displaystyle 0} . Whitening the data requires a linear transformation L : R N × M → R N × M {\displaystyle \mathbf {L} :\mathbb {R} ^{N\times M}\to \mathbb {R} ^{N\times M}} of the centered data so that the components of L ( X ) {\displaystyle \mathbf {L} (\mathbf {X} )} are uncorrelated and have variance one. More precisely, if X {\displaystyle \mathbf {X} } is a centered data matrix, the covariance of L x := L ( X ) {\displaystyle \mathbf {L} _{\mathbf {x} }:=\mathbf {L} (\mathbf {X} )} is the ( N × N ) {\displaystyle (N\times N)} -dimensional identity matrix, that is, A common method for whitening is by performing an eigenvalue decomposition on the covariance matrix of the centered data X {\displaystyle \mathbf {X} } , E { X X T } = E D E T {\displaystyle E\left\{\mathbf {X} \mathbf {X} ^{T}\right\}=\mathbf {E} \mathbf {D} \mathbf {E} ^{T}} , where E {\displaystyle \mathbf {E} } is the matrix of eigenvectors and D {\displaystyle \mathbf {D} } is the diagonal matrix of eigenvalues. The whitened data matrix is defined thus by === Single component extraction === The iterative algorithm finds the direction for the weight vector w ∈ R N {\displaystyle \mathbf {w} \in \mathbb {R} ^{N}} that maximizes a measure of non-Gaussianity of the projection w T X {\displaystyle \mathbf {w} ^{T}\mathbf {X} } , with X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} denoting a prewhitened data matrix as described above. Note that w {\displaystyle \mathbf {w} } is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function f ( u ) {\displaystyle f(u)} , its first derivative g ( u ) {\displaystyle g(u)} , and its second derivative g ′ ( u ) {\displaystyle g^{\prime }(u)} . Hyvärinen states that the functions are useful for general purposes, while may be highly robust. The steps for extracting the weight vector w {\displaystyle \mathbf {w} } for single component in FastICA are the following: Randomize the initial weight vector w {\displaystyle \mathbf {w} } Let w + ← E { X g ( w T X ) T } − E { g ′ ( w T X ) } w {\displaystyle \mathbf {w} ^{+}\leftarrow E\left\{\mathbf {X} g(\mathbf {w} ^{T}\mathbf {X} )^{T}\right\}-E\left\{g'(\mathbf {w} ^{T}\mathbf {X} )\right\}\mathbf {w} } , where E { . . . } {\displaystyle E\left\{...\right\}} means averaging over all column-vectors of matrix X {\displaystyle \mathbf {X} } Let w ← w + / ‖ w + ‖ {\displaystyle \mathbf {w} \leftarrow \mathbf {w} ^{+}/\|\mathbf {w} ^{+}\|} If not converged, go back to 2 === Multiple component extraction === The single unit iterative algorithm estimates only one weight vector which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors - note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components with the simplest being the following. Here, 1 M {\displaystyle \mathbf {1_{M}} } is a column vector of 1's of dimension M {\displaystyle M} . Algorithm FastICA Input: C {\displaystyle C} Number of desired components Input: X ∈ R N × M {\displaystyle \mathbf {X} \in \mathbb {R} ^{N\times M}} Prewhitened matrix, where each column represents an N {\displaystyle N} -dimensional sample, where C <= N {\displaystyle C<=N} Output: W ∈ R N × C {\displaystyle \mathbf {W} \in \mathbb {R} ^{N\times C}} Un-mixing matrix where each column projects X {\displaystyle \mathbf {X} } onto independent component. Output: S ∈ R C × M {\displaystyle \mathbf {S} \in \mathbb {R} ^{C\times M}} Independent components matrix, with M {\displaystyle M} columns representing a sample with C {\displaystyle C} dimensions. for p in 1 to C: w p ← {\displaystyle \mathbf {w_{p}} \leftarrow } Random vector of length N while w p {\displaystyle \mathbf {w_{p}} } changes w p ← 1 M X g ( w p T X ) T − 1 M g ′ ( w p T X ) 1 M w p {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {1}{M}}\mathbf {X} g(\mathbf {w_{p}} ^{T}\mathbf {X} )^{T}-{\frac {1}{M}}g'(\mathbf {w_{p}} ^{T}\mathbf {X} )\mathbf {1_{M}} \mathbf {w_{p}} } w p ← w p − ∑ j = 1 p − 1 ( w p T w j ) w j {\displaystyle \mathbf {w_{p}} \leftarrow \mathbf {w_{p}} -\sum _{j=1}^{p-1}(\mathbf {w_{p}} ^{T}\mathbf {w_{j}} )\mathbf {w_{j}} } w p ← w p ‖ w p ‖ {\displaystyle \mathbf {w_{p}} \leftarrow {\frac {\mathbf {w_{p}} }{\|\mathbf {w_{p}} \|}}} output W ← [ w 1 , … , w C ] {\displaystyle \mathbf {W} \leftarrow {\begin{bmatrix}\mathbf {w_{1}} ,\dots ,\mathbf {w_{C}} \end{bmatrix}}} output S ← W T X {\displaystyle \mathbf {S} \leftarrow \mathbf {W^{T}} \mathbf {X} }

    Read more →
  • Ilastik

    Ilastik

    ilastik is free open source software for image classification and segmentation. No previous experience in image processing is required to run the software. Since 2018 ilastik is further developed and maintained by Anna Kreshuk's group at European Molecular Biology Laboratory. == Features == ilastik allows user to annotate an arbitrary number of classes in images with a mouse interface. Using these user annotations and the generic (nonlinear) image features, the user can train a random forest classifier. Trained ilastik classifiers can be applied new data not included in the training set in ilastik via its batch processing functionality, or without using the graphical user interface, in headless mode. ilastik can be integrated into various related tools: Pre-trained workflows can be executed directly from ImageJ/Fiji using the ilastik-ImageJ plugin. Pre-trained ilastik Pixel Classification workflows can be run directly in Python with the ilastik Python package, which is available via conda. ilastik has a CellProfiler module to use ilastik classifiers to process images within a CellProfiler framework. == History == ilastik was first released in 2011 by scientists at the Heidelberg Collaboratory for Image Processing (HCI), University of Heidelberg. == Application == The Interactive Learning and Segmentation Toolkit Carving Cell classification and neuron classification Synapse detection Cell tracking Neural Network Classification == Resources == ilastik project is hosted on GitHub. It is a collaborative project, any contributions such as comments, bug reports, bug fixes or code contributions are welcome. The ilastik team can be contacted for user support on the image.sc forum.

    Read more →
  • Mozilla VPN

    Mozilla VPN

    Mozilla VPN is an open-source virtual private network developed by Mozilla. It launched in beta as Firefox Private Network on September 10, 2019, and officially launched on July 15, 2020, as Mozilla VPN. Mozilla VPN should not be confused with the built-in VPN in Firefox since version 149 released in March 2026, which is free with a monthly data limit of 50 GB but only masks traffic that originates in Firefox unlike Mozilla VPN that protects the entire device. == History == The Firefox Private Network web browser extension beta version was released on September 10, 2019, as part of the relaunch of Mozilla's Test Pilot Program, a program that allowed Firefox users to test experimental new features which had been shuttered in January 2019. The beta of the subscription-based standalone virtual private network for Android, Microsoft Windows, and Chromebook launched on February 19, 2020, with the iOS version following soon after. Firefox Private Network was rebranded as "Mozilla VPN" on June 18, 2020, and officially launched as Mozilla VPN on July 15, 2020. At launch, Mozilla VPN was available in six countries (the United States, Canada, the United Kingdom, Singapore, Malaysia, and New Zealand) for Windows 10, Android, and iOS (beta). Over time, the service also launched in Germany, France, Italy, Spain, Switzerland, Austria, Belgium, Netherlands, Ireland, Finland, Sweden, Poland, Czechia, Hungary, Romania, Bulgaria, Slovakia, Portugal, Denmark, Croatia, Lithuania, Slovenia, Latvia, Luxembourg, Estonia, Cyprus, and Malta. == Audits history == Cybersecurity firm Cure53 conducted a security audit for Mozilla VPN in August 2020 and identified multiple vulnerabilities, including one critical-severity vulnerability. In March 2021, Cure53 conducted a second security audit, which noted significant improvements since the 2020 audit. The second audit identified multiple issues, including two medium-severity and one high-severity vulnerability, but concluded that by the time of publication, only one vulnerability remained unresolved, and that it would require "a strong state-funded attacker-model" to be exploitable. Mozilla disclosed most of the vulnerabilities in July 2021 and released the full report by Cure53 in August 2021. In April 2023, Cure53 conducted a third security audit, the results of which Mozilla disclosed in December that year, along with the full report by Cure53. == Features == Mozilla VPN masks the user's IP address, hiding the user's location data from the websites accessed by the user, and encrypts all network activity. The service allows for up to 5 simultaneous connections, to any of more than 500 servers in 30+ countries, and is available on the mobile operating systems iOS and Android and the desktop operating systems Microsoft Windows, macOS and Linux. Mozilla VPN's infrastructure is provided by the Swedish Mullvad VPN service, which uses the WireGuard VPN protocol. The VPN software comes with additional features, like recommended server locations, the ability to block ads, block ad trackers and malware, the ability to exclude certain applications from protection, the ability to set multi-hop connections, and to set custom DNS servers. When used with Firefox and the official extension, Mozilla VPN allows the use of different settings per container as well as bypassing the VPN for specific websites.

    Read more →
  • Tensor sketch

    Tensor sketch

    In statistics, machine learning and algorithms, a tensor sketch is a type of dimensionality reduction that is particularly efficient when applied to vectors that have tensor structure. Such a sketch can be used to speed up explicit kernel methods, bilinear pooling in neural networks and is a cornerstone in many numerical linear algebra algorithms. == Mathematical definition == Mathematically, a dimensionality reduction or sketching matrix is a matrix M ∈ R k × d {\displaystyle M\in \mathbb {R} ^{k\times d}} , where k < d {\displaystyle k Read more →

  • Generalized blockmodeling

    Generalized blockmodeling

    In generalized blockmodeling, the blockmodeling is done by "the translation of an equivalence type into a set of permitted block types", which differs from the conventional blockmodeling, which is using the indirect approach. It's a special instance of the direct blockmodeling approach. Generalized blockmodeling was introduced in 1994 by Patrick Doreian, Vladimir Batagelj and Anuška Ferligoj. == Definition == Generalized blockmodeling approach is a direct one, "where the optimal partition(s) is (are) identified based on minimal values of a compatible criterion function defined by the difference between empirical blocks and corresponding ideal blocks". At the same time, the much broader set of block types is introduced (while in conventional blockmodeling only certain types are used). The conventional blockmodeling is inductive due to nonspecification of neither the clusters or the location of block types, while in generalized blockmodeling the blockmodel is specified with more detail than just the permition of certain block types (e.g., prespecification). Further, it's possible to define departures from the permitted (ideal) blocktype, using criterion function. Using local optimization procedure, firstly the initial clustering (with specified number of clusters is done, based on random creation. How the clusters are neighboring to each other, is based on two transformations: 1) a vertex is moved from one to another cluster or 2) a pair of vertices is interchanged between two different clusters. This process of transformation steps is repeated many times, until only the best fitting partitions (with the minimized value of the criterion function) are kept as blockmodels for the future exploration of the network. Different types of generalized blockmodeling are: generalized binary blockmodeling, generalized valued blockmodeling and generalized homogeneity blockmodeling. == Benefits == According to Patrick Doreian, the benefits of generalized blockmodeling, are as follows: usage of explicit criterion function, compatible with a given type of equivalence, results to in-built measure of fit, which is integral to the establishment of the blockmodels (in conventional blockmodeling, there is no compelling and coherent measures of fit); partitions, based on generalized blockmodeling, regularly outperform and never perform less well than the partitions, based on conventional approach; with generalized blockmodeling it's possible to specify new types of blockmodels; this potentially unlimited set of new block types also results in permittion of inclusion of substantively driven blockmodels; in generalized blockmodeling, the specification of the block types and the location of some of them in the blockmodel is possible; researcher can speficy which (pair of) vertices must be (not) clustered together; this approach also allows the imposition of penalties, resulting into identification of empirical null blocks without inconsistencies with a corresponding ideal null block. == Problems == According to Doreian, the problems of generalized blockmodeling, are as follows: unknown sensitivity to particular data features, examination of boundary problems, computationally burdensome, which results in a constraint regarding practical network size (generalized blockmodeling is thus primarily used to analyse smaller networks (below 100 units)), identifying structure from incomplete network information, most of generalized blockmodeling is based on binary networks, but there is also development in the field of valued networks, criterion function is minimized for a specified blockmodel, with results in issues of evaluating statistically, based on the structural data alone, problems regarding three dimensional network data, problems regarding the evolution of fundamental network structure. == Book == The book with the same title, Generalized blockmodeling, written by Patrick Doreian, Vladimir Batagelj and Anuška Ferligoj, was in 2007 awarded the Harrison White Outstanding Book Award by the Mathematical Sociology Section of American Sociological Association.

    Read more →
  • Sammon mapping

    Sammon mapping

    Sammon mapping or Sammon projection is an algorithm that maps a high-dimensional space to a space of lower dimensionality (see multidimensional scaling) by trying to preserve the structure of inter-point distances in high-dimensional space in the lower-dimension projection. It is particularly suited for use in exploratory data analysis. The method was proposed by John W. Sammon in 1969. It is considered a non-linear approach as the mapping cannot be represented as a linear combination of the original variables as possible in techniques such as principal component analysis, which also makes it more difficult to use for classification applications. Denote the distance between ith and jth objects in the original space by d i j ∗ {\displaystyle \scriptstyle d_{ij}^{}} , and the distance between their projections by d i j {\displaystyle \scriptstyle d_{ij}^{}} . Sammon's mapping aims to minimize the following error function, which is often referred to as Sammon's stress or Sammon's error: E = 1 ∑ i < j d i j ∗ ∑ i < j ( d i j ∗ − d i j ) 2 d i j ∗ . {\displaystyle E={\frac {1}{\sum \limits _{i Read more →

  • Visual analytics

    Visual analytics

    Visual analytics is a multidisciplinary science and technology field that emerged from information visualization and scientific visualization. It focuses on how analytical reasoning can be facilitated by interactive visual interfaces. == Overview == Visual analytics is "the science of analytical reasoning facilitated by interactive visual interfaces." It can address problems whose size, complexity, and need for closely coupled human and machine analysis may make them otherwise intractable. Visual analytics advances scientific and technological development across multiple domains, including analytical reasoning, human–computer interaction, data transformations, visual representation for computation and analysis, analytic reporting, and the transition of new technologies into practice. As a research agenda, visual analytics brings together several scientific and technical communities from computer science, information visualization, cognitive and perceptual sciences, interactive design, graphic design, and social sciences. Visual analytics integrates new computational and theory-based tools with innovative interactive techniques and visual representations to enable human-information discourse. The design of the tools and techniques is based on cognitive, design, and perceptual principles. This science of analytical reasoning provides the reasoning framework upon which one can build both strategic and tactical visual analytics technologies for threat analysis, prevention, and response. Analytical reasoning is central to the analyst's task of applying human judgments to reach conclusions from a combination of evidence and assumptions. Visual analytics has some overlapping goals and techniques with information visualization and scientific visualization. There is currently no clear consensus on the boundaries between these fields, but broadly speaking the three areas can be distinguished as follows: Scientific visualization deals with data that has a natural geometric structure (e.g., MRI data, wind flows). Information visualization handles abstract data structures such as trees or graphs. Visual analytics is especially concerned with coupling interactive visual representations with underlying analytical processes (e.g., statistical procedures, data mining techniques) such that high-level, complex activities can be effectively performed (e.g., sense making, reasoning, decision making). Visual analytics seeks to marry techniques from information visualization with techniques from computational transformation and analysis of data. Information visualization forms part of the direct interface between user and machine, amplifying human cognitive capabilities in six basic ways: by increasing cognitive resources, such as by using a visual resource to expand human working memory, by reducing search, such as by representing a large amount of data in a small space, by enhancing the recognition of patterns, such as when information is organized in space by its time relationships, by supporting the easy perceptual inference of relationships that are otherwise more difficult to induce, by perceptual monitoring of a large number of potential events, and by providing a manipulable medium that, unlike static diagrams, enables the exploration of a space of parameter values These capabilities of information visualization, combined with computational data analysis, can be applied to analytic reasoning to support the sense-making process. == History == As an interdisciplinary approach, visual analytics has its roots in information visualization, cognitive sciences, and computer science. The term and scope of the field was defined in the early 2000s through researchers such as Jim Thomas, Kristin A. Cook, John Stasko, Pak Chung Wong, Daniel A. Keim and David S. Ebert. As a reaction to the September 11, 2001 attacks the United States Department of Homeland Security was established in late 2002, combining dozens of previously separated government agencies. Building upon earlier work on visual data mining by Daniel A. Keim starting in the late 1990s, this simultaneously lead to the development of a research agenda for visual analytics. As part of these efforts the National Visualization and Analytics Center (NVAC) at Pacific Northwest National Laboratory was established in 2004, whose charter was to develop system to mitigate information overload after the September 11, 2001 attacks in the intelligence community. Their research work determined core challenges, posed open research questions, and positioned visual analytics as a new research domain, in particular through the 2005 research agenda Illuminating the Path. In 2006, the IEEE VIS community led by Pak Chung Wong and Daniel A. Keim launched the annual IEEE Conference on Visual Analytics Science and Technology (VAST), providing a dedicated venue for research into visual analytics, which in 2020 merged to form the IEEE Visualization conference. In 2008, scope and challenges of visual analytics were conceptually defined by Daniel A. Keim and Jim Thomas in their influential book about visual data mining. The domain was further refined as part of the European Commissions FP7 VisMaster program in the late 2000s. == Topics == === Scope === Visual analytics is a multidisciplinary field that includes the following focus areas: Analytical reasoning techniques that enable users to obtain deep insights that directly support assessment, planning, and decision making Data representations and transformations that convert all types of conflicting and dynamic data in ways that support visualization and analysis Techniques to support production, presentation, and dissemination of the results of an analysis to communicate information in the appropriate context to a variety of audiences. Visual representations and interaction techniques that take advantage of the human eye's broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. === Analytical reasoning techniques === Analytical reasoning techniques are the method by which users obtain deep insights that directly support situation assessment, planning, and decision making. Visual analytics must facilitate high-quality human judgment with a limited investment of the analysts’ time. Visual analytics tools must enable diverse analytical tasks such as: Understanding past and present situations quickly, as well as the trends and events that have produced current conditions Identifying possible alternative futures and their warning signs Monitoring current events for emergence of warning signs as well as unexpected events Determining indicators of the intent of an action or an individual Supporting the decision maker in times of crisis. These tasks will be conducted through a combination of individual and collaborative analysis, often under extreme time pressure. Visual analytics must enable hypothesis-based and scenario-based analytical techniques, providing support for the analyst to reason based on the available evidence. === Data representations === Data representations are structured forms suitable for computer-based transformations. These structures must exist in the original data or be derivable from the data themselves. They must retain the information and knowledge content and the related context within the original data to the greatest degree possible. The structures of underlying data representations are generally neither accessible nor intuitive to the user of the visual analytics tool. They are frequently more complex in nature than the original data and are not necessarily smaller in size than the original data. The structures of the data representations may contain hundreds or thousands of dimensions and be unintelligible to a person, but they must be transformable into lower-dimensional representations for visualization and analysis. === Theories of visualization === Theories of visualization include: Jacques Bertin's Semiology of Graphics (1967) Nelson Goodman's Languages of Art (1977) Jock D. Mackinlay's Automated design of optimal visualization (APT) (1986) Leland Wilkinson's Grammar of Graphics (1998) Hadley Wickham's Layered Grammar of Graphics (2010) === Visual representations === Visual representations translate data into a visible form that highlights important features, including commonalities and anomalies. These visual representations make it easy for users to perceive salient aspects of their data quickly. Augmenting the cognitive reasoning process with perceptual reasoning through visual representations permits the analytical reasoning process to become faster and more focused. == Process == The input for the data sets used in the visual analytics process are heterogeneous data sources (i.e., the internet, newspapers, books, scientific experiments, expert systems). From these rich sources, the data sets S = S1, ..., Sm are chosen, whereas each Si , i ∈ (1, ..., m) consists of attrib

    Read more →
  • Mean squared prediction error

    Mean squared prediction error

    In statistics the mean squared prediction error (MSPE), also known as mean squared error of the predictions, of a smoothing, curve fitting, or regression procedure is the expected value of the squared prediction errors (PE), the square difference between the fitted values implied by the predictive function g ^ {\displaystyle {\widehat {g}}} and the values of the (unobservable) true value g. It is an inverse measure of the explanatory power of g ^ , {\displaystyle {\widehat {g}},} and can be used in the process of cross-validation of an estimated model. Knowledge of g would be required in order to calculate the MSPE exactly; in practice, MSPE is estimated. == Formulation == If the smoothing or fitting procedure has projection matrix (i.e., hat matrix) L, which maps the observed values vector y {\displaystyle y} to predicted values vector y ^ = L y , {\displaystyle {\hat {y}}=Ly,} then PE and MSPE are formulated as: P E i = g ( x i ) − g ^ ( x i ) , {\displaystyle \operatorname {PE_{i}} =g(x_{i})-{\widehat {g}}(x_{i}),} MSPE = E ⁡ [ PE i 2 ] = ∑ i = 1 n PE i 2 ⁡ / n . {\displaystyle \operatorname {MSPE} =\operatorname {E} \left[\operatorname {PE} _{i}^{2}\right]=\sum _{i=1}^{n}\operatorname {PE} _{i}^{2}/n.} The MSPE can be decomposed into two terms: the squared bias (mean error) of the fitted values and the variance of the fitted values: MSPE = ME 2 + VAR , {\displaystyle \operatorname {MSPE} =\operatorname {ME} ^{2}+\operatorname {VAR} ,} ME = E ⁡ [ g ^ ( x i ) − g ( x i ) ] {\displaystyle \operatorname {ME} =\operatorname {E} \left[{\widehat {g}}(x_{i})-g(x_{i})\right]} VAR = E ⁡ [ ( g ^ ( x i ) − E ⁡ [ g ( x i ) ] ) 2 ] . {\displaystyle \operatorname {VAR} =\operatorname {E} \left[\left({\widehat {g}}(x_{i})-\operatorname {E} \left[{g}(x_{i})\right]\right)^{2}\right].} The quantity SSPE=nMSPE is called sum squared prediction error. The root mean squared prediction error is the square root of MSPE: RMSPE=√MSPE. == Computation of MSPE over out-of-sample data == The mean squared prediction error can be computed exactly in two contexts. First, with a data sample of length n, the data analyst may run the regression over only q of the data points (with q < n), holding back the other n – q data points with the specific purpose of using them to compute the estimated model’s MSPE out of sample (i.e., not using data that were used in the model estimation process). Since the regression process is tailored to the q in-sample points, normally the in-sample MSPE will be smaller than the out-of-sample one computed over the n – q held-back points. If the increase in the MSPE out of sample compared to in sample is relatively slight, that results in the model being viewed favorably. And if two models are to be compared, the one with the lower MSPE over the n – q out-of-sample data points is viewed more favorably, regardless of the models’ relative in-sample performances. The out-of-sample MSPE in this context is exact for the out-of-sample data points that it was computed over, but is merely an estimate of the model’s MSPE for the mostly unobserved population from which the data were drawn. Second, as time goes on more data may become available to the data analyst, and then the MSPE can be computed over these new data. == Estimation of MSPE over the population == When the model has been estimated over all available data with none held back, the MSPE of the model over the entire population of mostly unobserved data can be estimated as follows. For the model y i = g ( x i ) + σ ε i {\displaystyle y_{i}=g(x_{i})+\sigma \varepsilon _{i}} where ε i ∼ N ( 0 , 1 ) {\displaystyle \varepsilon _{i}\sim {\mathcal {N}}(0,1)} , one may write n ⋅ MSPE ⁡ ( L ) = g T ( I − L ) T ( I − L ) g + σ 2 tr ⁡ [ L T L ] . {\displaystyle n\cdot \operatorname {MSPE} (L)=g^{\text{T}}(I-L)^{\text{T}}(I-L)g+\sigma ^{2}\operatorname {tr} \left[L^{\text{T}}L\right].} Using in-sample data values, the first term on the right side is equivalent to ∑ i = 1 n ( E ⁡ [ g ( x i ) − g ^ ( x i ) ] ) 2 = E ⁡ [ ∑ i = 1 n ( y i − g ^ ( x i ) ) 2 ] − σ 2 tr ⁡ [ ( I − L ) T ( I − L ) ] . {\displaystyle \sum _{i=1}^{n}\left(\operatorname {E} \left[g(x_{i})-{\widehat {g}}(x_{i})\right]\right)^{2}=\operatorname {E} \left[\sum _{i=1}^{n}\left(y_{i}-{\widehat {g}}(x_{i})\right)^{2}\right]-\sigma ^{2}\operatorname {tr} \left[\left(I-L\right)^{T}\left(I-L\right)\right].} Thus, n ⋅ MSPE ⁡ ( L ) = E ⁡ [ ∑ i = 1 n ( y i − g ^ ( x i ) ) 2 ] − σ 2 ( n − tr ⁡ [ L ] ) . {\displaystyle n\cdot \operatorname {MSPE} (L)=\operatorname {E} \left[\sum _{i=1}^{n}\left(y_{i}-{\widehat {g}}(x_{i})\right)^{2}\right]-\sigma ^{2}\left(n-\operatorname {tr} \left[L\right]\right).} If σ 2 {\displaystyle \sigma ^{2}} is known or well-estimated by σ ^ 2 {\displaystyle {\widehat {\sigma }}^{2}} , it becomes possible to estimate MSPE by n ⋅ M S P E ^ ⁡ ( L ) = ∑ i = 1 n ( y i − g ^ ( x i ) ) 2 − σ ^ 2 ( n − tr ⁡ [ L ] ) . {\displaystyle n\cdot \operatorname {\widehat {MSPE}} (L)=\sum _{i=1}^{n}\left(y_{i}-{\widehat {g}}(x_{i})\right)^{2}-{\widehat {\sigma }}^{2}\left(n-\operatorname {tr} \left[L\right]\right).} Colin Mallows advocated this method in the construction of his model selection statistic Cp, which is a normalized version of the estimated MSPE: C p = ∑ i = 1 n ( y i − g ^ ( x i ) ) 2 σ ^ 2 − n + 2 p . {\displaystyle C_{p}={\frac {\sum _{i=1}^{n}\left(y_{i}-{\widehat {g}}(x_{i})\right)^{2}}{{\widehat {\sigma }}^{2}}}-n+2p.} where p the number of estimated parameters p and σ ^ 2 {\displaystyle {\widehat {\sigma }}^{2}} is computed from the version of the model that includes all possible regressors. That concludes this proof.

    Read more →
  • Aleph (ILP)

    Aleph (ILP)

    Aleph (A Learning Engine for Proposing Hypotheses) is an inductive logic programming system introduced by Ashwin Srinivasan in 2001. As of 2022 it is still one of the most widely used inductive logic programming systems. It is based on the earlier system Progol. == Learning task == The input to Aleph is background knowledge, specified as a logic program, a language bias in the form of mode declarations, as well as positive and negative examples specified as ground facts. As output it returns a logic program which, together with the background knowledge, entails all of the positive examples and none of the negative examples. == Basic algorithm == Starting with an empty hypothesis, Aleph proceeds as follows: It chooses a positive example to generalise; if none are left, it aborts and outputs the current hypothesis. Then it constructs the bottom clause, that is, the most specific clause that is allowed by the mode declarations and covers the example. It then searches for a generalisation of the bottom clause that scores better on the chosen metric. It then adds the new clause to the hypothesis program and removes all examples that are covered by the new clause. == Search algorithm == Aleph searches for clauses in a top-down manner, using the bottom clause constructed in the preceding step to bound the search from below. It searches the refinement graph in a breadth-first manner, with tunable parameters to bound the maximal clause size and proof depth. It scores each clause using one of 13 different evaluation metrics, as chosen in advance by the user.

    Read more →