AI Generator App Free

AI Generator App Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • List of .NET libraries and frameworks

    List of .NET libraries and frameworks

    This article contains a list of libraries that can be used in .NET languages. These languages require .NET Framework, Mono, or .NET, which provide a basis for software development, platform independence, language interoperability and extensive framework libraries. Standard Libraries (including the Base Class Library) are not included in this article. == Introduction == Apps created with .NET Framework or .NET run in a software environment known as the Common Language Runtime (CLR), an application virtual machine that provides services such as security, memory management, and exception handling. The framework includes a large class library called Framework Class Library (FCL). Thanks to the hosting virtual machine, different languages that are compliant with the .NET Common Language Infrastructure (CLI) can operate on the same kind of data structures. These languages can therefore use the FCL and other .NET libraries that are also written in one of the CLI compliant languages. When the source code of such languages are compiled, the compiler generates platform-independent code in the Common Intermediate Language (CIL, also referred to as bytecode), which is stored in CLI assemblies. When a .NET app runs, the just-in-time compiler (JIT) turns the CIL code into platform-specific machine code. To improve performance, .NET Framework also comes with the Native Image Generator (NGEN), which performs ahead-of-time compilation to machine code. This architecture provides language interoperability. Each language can use code written in other languages. Calls from one language to another are exactly the same as would be within a single programming language. If a library is written in one CLI language, it can be used in other CLI languages. Moreover, apps that consist only of pure .NET assemblies, can be transferred to any platform that contains an implementation of CLI and run on that platform. For example, apps written using .NET can run on Windows, macOS, and various versions of Linux. .NET apps or their libraries, however, may depend on native platform features, e.g. COM. As such, platform independence of .NET apps depends on the ability to transfer necessary native libraries to target platforms. In 2019, the Windows Forms and Windows Presentation Foundation portions of .NET Framework were made open source. === .NET implementations === There are four primary .NET implementations that are actively developed and maintained: .NET Framework: The original .NET implementation that has existed since 2002. While not yet discontinued, Microsoft does not plan on releasing its next major version, 5.0. Mono: A cross-platform implementation of .NET Framework by Ximian, introduced in 2004. It is free and open-source. It is now developed by Xamarin, a subsidiary of Microsoft. Universal Windows Platform (UWP): An implementation of .NET used for building UWP apps. It's designed to unify development for different targeted types of devices, including PCs, tablets, phablets, phones, and the Xbox. .NET: A cross-platform re-implementation of .NET Framework, introduced in 2016 and initially called .NET Core. It is free and open-source. .NET superseded .NET Framework with the release of .NET 5. Each implementation of .NET includes the following components: One or more runtime environments, e.g. Common Language Runtime (CLR) for .NET Framework and CoreCLR for .NET A class library The .NET Standard is a set of common APIs that are implemented in the Base Class Library of any .NET implementation. The class library of each implementation must implement the .NET Standard, but may also implement additional APIs. Traditionally, .NET apps targeted a certain version of a .NET implementation, e.g. .NET Framework 4.6. Starting with the .NET Standard, an app can target a version of the .NET Standard and then it could be used (without recompiling) by any implementation that supports that level of the standard. This enables portability across different .NET implementations. The following table lists the .NET implementations that adhere to the .NET Standard and the version number at which each implementation became compliant with a given version of .NET Standard. For example, according to this table, .NET Core 3.0 was the first version of .NET Core that adhered to .NET Standard 2.1. This means that any version of .NET Core bigger than 3.0 (e.g. .NET Core 3.1) also adheres to .NET Standard 2.1. == Web frameworks == === ASP.NET === First released in 2002, ASP.NET is an open-source server-side web application framework designed for web development to produce dynamic web pages. It is the successor to Microsoft's Active Server Pages (ASP) technology, built on the Common Language Runtime (CLR). === ASP.NET Core === ASP.NET was completely rewritten in 2016 as a modular web framework, together with other frameworks like Entity Framework. The re-written framework uses the new open-source .NET Compiler Platform (also known by its codename "Roslyn") and is cross platform. The programming models ASP.NET MVC, ASP.NET Web API, and ASP.NET Web Pages (a model using only Razor pages) were merged into a unified MVC 6. === Blazor === Blazor is a free and open-source web framework that enables developers to create Single-page Web apps using C# and HTML in ASP.NET Razor pages ("components"). Blazor is part of the ASP.NET Core framework. Blazor Server apps are hosted on a web server, while Blazor WebAssembly apps are downloaded to the client's web browser before running. In addition, a Blazor Hybrid framework is available with server-based and client-based application components. == Numerical libraries == === Open-source numerical libraries === ==== AForge.NET ==== This is a computer vision and artificial intelligence library. It implements a number of genetic, fuzzy logic and machine learning algorithms with several architectures of artificial neural networks with corresponding training algorithms. ==== ALGLIB ==== This is a cross-platform open source numerical analysis and data processing library. It consists of algorithm collections written in different programming languages (C++, C#, FreePascal, Delphi, VBA) and has dual licensing – commercial and GPL. ==== Math.NET Numerics ==== This library aims to provide methods and algorithms for numerical computations in science, engineering and everyday use. Covered topics include special functions, linear algebra, probability models, random numbers, interpolation, integral transforms and more. MIT/X11 license. ==== Meta.Numerics ==== This is a library for advanced scientific computation in the .NET Framework. ==== ML.NET ==== This is a free software machine learning library. The preview release of ML.NET included transforms for feature engineering like n-gram creation, and learners to handle binary classification, multi-class classification, and regression tasks. Additional ML tasks like anomaly detection and recommendation systems have since been added, and other approaches like deep learning will be included in future versions. === Proprietary numerical libraries === ==== ILNumerics.Net ==== This is a high performance, typesafe numerical array set of classes and functions for general math, FFT and linear algebra. The library, developed for .NET/Mono, aims to provide 32- and 64-bit script-like syntax in C#, 2D & 3D plot controls, and efficient memory management. It is released under GPLv3 or commercial license. ==== Measurement Studio ==== This is an integrated suite of UI controls and class libraries for use in developing test and measurement applications. The analysis class libraries provide various digital signal processing, signal filtering, signal generation, peak detection, and other general mathematical functionality. ==== NMath ==== This is a numerical component library for the .NET platform developed by CenterSpace Software. It includes signal processing (FFT) classes, a linear algebra (LAPACK & BLAS) framework, and a statistics package. == 3D graphics == === Open-source 3D graphics === ==== Open Toolkit (OpenTK) ==== This is a low-level C# binding for OpenGL, OpenGL ES and OpenAL. It runs on Windows, Linux, Mac OS X, BSD, Android and iOS. It can be used standalone or integrated into a GUI. ==== Windows Presentation Foundation (WPF) ==== This is a graphical subsystem for rendering user interfaces, developed by Microsoft. It also contains a 3D rendering engine. In addition, interactive 2D content can be overlaid on 3D surfaces natively. It only runs on Windows operating systems. === Proprietary 3D graphics === ==== Unity ==== This is a cross-platform game engine developed by Unity Technologies and used to develop video games for PC, consoles, mobile devices and websites. == Image processing == === AForge.NET === This is a computer vision and artificial intelligence library. It implements a number of image processing algorithms and filters. It is released under the LGPLv3 and partly GPLv3 license. Majority of the library is written in C# and th

    Read more →
  • Extremal Ensemble Learning

    Extremal Ensemble Learning

    Extremal Ensemble Learning (EEL) is a machine learning algorithmic paradigm for graph partitioning. EEL creates an ensemble of partitions and then uses information contained in the ensemble to find new and improved partitions. The ensemble evolves and learns how to form improved partitions through extremal updating procedure. The final solution is found by achieving consensus among its member partitions about what the optimal partition is. == Reduced-Network Extremal Ensemble Learning (RenEEL) == A particular implementation of the EEL paradigm is the Reduced-Network Extremal Ensemble Learning (RenEEL) scheme for partitioning a graph. RenEEL uses consensus across many partitions in an ensemble to create a reduced network that can be efficiently analyzed to find more accurate partitions. These better quality partitions are subsequently used to update the ensemble. An algorithm that utilizes the RenEEL scheme is currently the best algorithm for finding the graph partition with maximum modularity, which is an NP-hard problem.

    Read more →
  • Training, validation, and test data sets

    Training, validation, and test data sets

    In machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from input data. These input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and testing sets. The model is initially fit on a training data set, which is a set of examples used to fit the parameters (e.g. weights of connections between neurons in artificial neural networks) of the model. The model (e.g. a naive Bayes classifier) is trained on the training data set using a supervised learning method, for example using optimization methods such as gradient descent or stochastic gradient descent. In practice, the training data set often consists of pairs of an input vector (or scalar) and the corresponding output vector (or scalar), where the answer key is commonly denoted as the target (or label). The current model is run with the training data set and produces a result, which is then compared with the target, for each input vector in the training data set. Based on the result of the comparison and the specific learning algorithm being used, the parameters of the model are adjusted. The model fitting can include both variable selection and parameter estimation. Successively, the fitted model is used to predict the responses for the observations in a second data set called the validation data set. The validation data set provides an unbiased evaluation of a model fit on the training data set while tuning the model's hyperparameters (e.g. the number of hidden units—layers and layer widths—in a neural network). Validation data sets can be used for regularization by early stopping (stopping training when the error on the validation data set increases, as this is a sign of over-fitting to the training data set). This simple procedure is complicated in practice by the fact that the validation data set's error may fluctuate during training, producing multiple local minima. This complication has led to the creation of many ad-hoc rules for deciding when over-fitting has truly begun. Finally, the test data set is a data set used to provide an unbiased evaluation of a model fit on the training data set. When the data in the test data set has never been used (for example in cross-validation), the test data set is called a holdout data set. The term "validation set" is sometimes used instead of "test set" in some literature (e.g., if the original data set was partitioned into only two subsets, the test set might be referred to as the validation set). Deciding the sizes and strategies for data set division in training, test and validation sets is very dependent on the problem and data available. == Training data set == A training data set is a data set of examples used during the learning process and is used to fit the parameters (e.g., weights) of, for example, a classifier. For classification tasks, a supervised learning algorithm looks at the training data set to determine, or learn, the optimal combinations of variables that will generate a good predictive model. The goal is to produce a trained (fitted) model that generalizes well to new, unknown data. The fitted model is evaluated using “new” examples from the held-out data sets (validation and test data sets) to estimate the model’s accuracy in classifying new data. To reduce the risk of issues such as over-fitting, the examples in the validation and test data sets should not be used to train the model. Most approaches that search through training data for empirical relationships tend to overfit the data, meaning that they can identify and exploit apparent relationships in the training data that do not hold in general. When a training set is continuously expanded with new data, then this is incremental learning. == Validation data set == A validation data set is a data set of examples used to tune the hyperparameters (i.e. the architecture) of a model. It is sometimes also called the development set or the "dev set". An example of a hyperparameter for artificial neural networks includes the number of hidden units in each layer. It, as well as the testing set (as mentioned below), should follow the same probability distribution as the training data set. In order to avoid overfitting, when any classification parameter needs to be adjusted, it is necessary to have a validation data set in addition to the training and test data sets. For example, if the most suitable classifier for the problem is sought, the training data set is used to train the different candidate classifiers, the validation data set is used to compare their performances and decide which one to take and, finally, the test data set is used to obtain the performance characteristics such as accuracy, sensitivity, specificity, F-measure, and so on. The validation data set functions as a hybrid: it is training data used for testing, but neither as part of the low-level training nor as part of the final testing. The basic process of using a validation data set for model selection (as part of training data set, validation data set, and test data set) is: Since our goal is to find the network having the best performance on new data, the simplest approach to the comparison of different networks is to evaluate the error function using data which is independent of that used for training. Various networks are trained by minimization of an appropriate error function defined with respect to a training data set. The performance of the networks is then compared by evaluating the error function using an independent validation set, and the network having the smallest error with respect to the validation set is selected. This approach is called the hold out method. Since this procedure can itself lead to some overfitting to the validation set, the performance of the selected network should be confirmed by measuring its performance on a third independent set of data called a test set. An application of this process is in early stopping, where the candidate models are successive iterations of the same network, and training stops when the error on the validation set grows, choosing the previous model (the one with minimum error). == Test data set == A test data set is a data set that is independent of the training data set, but that follows the same probability distribution as the training data set. A test set is therefore a set of examples used only to assess the performance (i.e. generalization) of a specified classifier on unseen data. To do this, the model is used to predict classifications of examples in the test set. Those predictions are compared to the examples' true classifications to assess the model's accuracy. If a model fit to the training and validation data set also fits the test data set well, minimal overfitting has taken place (see figure below). A better fitting of the training or validation data sets as opposed to the test data set usually points to overfitting. In the scenario where a data set has a low number of samples, it is usually partitioned into a training set and a validation data set, where the model is trained on the training set and refined using the validation set to improve accuracy, but this approach will lead to overfitting. The holdout method can also be employed, where the test set is used at the end, after training on the training set. Other techniques, such as cross-validation and bootstrapping, are used on small data sets. The bootstrap method generates numerous simulated data sets of the same size by randomly sampling with replacement from the original data, allowing the random data points to serve as test sets for evaluating model performance. Cross-validation splits the data set into multiple folds, with a single sub-fold used as test data; the model is trained on the remaining folds, and all folds are cross-validated (with results averaged and models consolidated) to estimate final model performance. Note that some sources advise against using a single split, as it can lead to overfitting as well as biased model performance estimates. For this reason, data sets are split into three partitions: training, validation and test data sets. The standard machine learning practice is to train on the training set and tune hyperparameters using the validation set, where the validation process selects the model with the lowest validation loss, which is then tested on the test data set (normally held out) to assess the final model. The holdout method for the test set reduces computation by avoiding using the test set after each epoch. The test data set should never be used for validating the training model or fine-tuning hyperparameters, as it provides an accurate and honest evaluation of the model's final performance on unseen dat

    Read more →
  • Random neural network

    Random neural network

    The Random Neural Network (RNN) is a mathematical representation of an interconnected network of neurons or cells which exchange spiking signals. It was invented by Erol Gelenbe and is linked to the G-network model of queueing networks which Erol Gelenbe also invented, and with his Gene Regulatory Network models. In this model, each neuronal cell state is represented by an integer whose value rises when the cell receives an excitatory spike and drops when it receives an inhibitory spike. The spikes can originate outside the network itself, or they can come from other cells in the networks. Cells whose internal excitatory state has a positive value are allowed to send out spikes of either kind to other cells in the network according to specific cell-dependent spiking rates. The model has a mathematical solution in steady-state which provides the joint probability distribution of the network in terms of the individual probabilities that each cell is excited and able to send out spikes. Computing this solution is based on solving a set of non-linear algebraic equations whose parameters are related to the spiking rates of individual cells and their connectivity to other cells, as well as the arrival rates of spikes from outside the network. The RNN is a recurrent model, i.e. a neural network that is allowed to have complex feedback loops. A highly energy-efficient implementation of random neural networks was demonstrated by Krishna Palem et al. using the Probabilistic CMOS or PCMOS technology and was shown to be c. 226–300 times more efficient in terms of Energy-Performance-Product. RNNs are also related to artificial neural networks, which (like the random neural network) have gradient-based learning algorithms. The learning algorithm for an n-node random neural network that includes feedback loops (it is also a recurrent neural network) is of computational complexity O(n^3) (the number of computations is proportional to the cube of n, the number of neurons). The random neural network can also be used with other learning algorithms such as reinforcement learning. The RNN has been shown to be a universal approximator for bounded and continuous functions.

    Read more →
  • Embodied cognitive science

    Embodied cognitive science

    Embodied cognitive science is an interdisciplinary field of research, the aim of which is to explain the mechanisms underlying intelligent behavior. It comprises three main methodologies: the modeling of psychological and biological systems in a holistic manner that considers the mind and body as a single entity; the formation of a common set of general principles of intelligent behavior; and the experimental use of robotic agents in controlled environments. == Contributors == Embodied cognitive science borrows heavily from embodied philosophy and the related research fields of cognitive science, psychology, neuroscience and artificial intelligence. Contributors to the field include: From the perspective of neuroscience, Gerald Edelman of the Neurosciences Institute at La Jolla, Francisco Varela of CNRS in France, and J. A. Scott Kelso of Florida Atlantic University From the perspective of psychology, Lawrence Barsalou, Michael Turvey, Vittorio Guidano and Eleanor Rosch From the perspective of linguistics, Gilles Fauconnier, George Lakoff, Mark Johnson, Leonard Talmy and Mark Turner From the perspective of language acquisition, Eric Lenneberg and Philip Rubin at Haskins Laboratories From the perspective of anthropology, Edwin Hutchins, Bradd Shore, James Wertsch and Merlin Donald. From the perspective of autonomous agent design, early work is sometimes attributed to Rodney Brooks or Valentino Braitenberg From the perspective of artificial intelligence, Understanding Intelligence by Rolf Pfeifer and Christian Scheier or How the Body Shapes the Way We Think, by Rolf Pfeifer and Josh C. Bongard From the perspective of philosophy, Andy Clark, Dan Zahavi, Shaun Gallagher, and Evan Thompson In 1950, Alan Turing proposed that a machine may need a human-like body to think and speak: It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. That process could follow the normal teaching of a child. Things would be pointed out and named, etc. Again, I do not know what the right answer is, but I think both approaches should be tried. == Traditional cognitive theory == Embodied cognitive science is an alternative theory to cognition in which it minimizes appeals to computational theory of mind in favor of greater emphasis on how an organism's body determines how and what it thinks. Traditional cognitive theory is based mainly around symbol manipulation, in which certain inputs are fed into a processing unit that produces an output. These inputs follow certain rules of syntax, from which the processing unit finds semantic meaning. Thus, an appropriate output is produced. For example, a human's sensory organs are its input devices, and the stimuli obtained from the external environment are fed into the nervous system which serves as the processing unit. From here, the nervous system is able to read the sensory information because it follows a syntactic structure, thus an output is created. This output then creates bodily motions and brings forth behavior and cognition. Of particular note is that cognition is sealed away in the brain, meaning that mental cognition is cut off from the external world and is only possible by the input of sensory information. == The embodied cognitive approach == Embodied cognitive science differs from the traditionalist approach in that it denies the input-output system. This is chiefly due to the problems presented by the Homunculus argument, which concluded that semantic meaning could not be derived from symbols without some kind of inner interpretation. If some little man in a person's head interpreted incoming symbols, then who would interpret the little man's inputs? Because of the specter of an infinite regress, the traditionalist model began to seem less plausible. Thus, embodied cognitive science aims to avoid this problem by defining cognition in three ways. === Physical attributes of the body === The first aspect of embodied cognition examines the role of the physical body, particularly how its properties affect its ability to think. This part attempts to overcome the symbol manipulation component that is a feature of the traditionalist model. Depth perception, for instance, can be better explained under the embodied approach due to the sheer complexity of the action. Depth perception requires that the brain detect the disparate retinal images obtained by the distance of the two eyes. In addition, body and head cues complicate this further. When the head is turned in a given direction, objects in the foreground will appear to move against objects in the background. From this, it is said that some kind of visual processing is occurring without the need of any kind of symbol manipulation. This is because the objects appearing to move the foreground are simply appearing to move. This observation concludes then that depth can be perceived with no intermediate symbol manipulation necessary. A more poignant example exists through examining auditory perception. Generally speaking the greater the distance between the ears, the greater the possible auditory acuity. Also relevant is the amount of density in between the ears, for the strength of the frequency wave alters as it passes through a given medium. The brain's auditory system takes these factors into account as it process information, but again without any need for a symbolic manipulation system. This is because the distance between the ears for example does not need symbols to represent it. The distance itself creates the necessary opportunity for greater auditory acuity. The amount of density between the ears is similar, in that it is the actual amount itself that simply forms the opportunity for frequency alteration. Thus under consideration of the physical properties of the body, a symbolic system is unnecessary and an unhelpful metaphor. === The body's role in the cognitive process === The second aspect draws heavily from George Lakoff's and Mark Johnson's work on concepts. They argued that humans use metaphors whenever possible to better explain their external world. Humans also have a basic stock of concepts in which other concepts can be derived from. These basic concepts include spatial orientations such as up, down, front, and back. Humans can understand what these concepts mean because they can directly experience them from their own bodies. For example, because human movement revolves around standing erect and moving the body in an up-down motion, humans innately have these concepts of up and down. Lakoff and Johnson contend this is similar with other spatial orientations such as front and back too. As mentioned earlier, these basic stocks of spatial concepts are the basis in which other concepts are constructed. Happy and sad for instance are seen now as being up or down respectively. When someone says they are feeling down, what they are really saying is that they feel sad for example. Thus the point here is that true understanding of these concepts is contingent on whether one can have an understanding of the human body. So the argument goes that if one lacked a human body, they could not possibly know what up or down could mean, or how it could relate to emotional states. [I]magine a spherical being living outside of any gravitational field, with no knowledge or imagination of any other kind of experience. What could UP possibly mean to such a being? While this does not mean that such beings would be incapable of expressing emotions in other words, it does mean that they would express emotions differently from humans. Human concepts of happiness and sadness would be different because human would have different bodies. So then an organism's body directly affects how it can think, because it uses metaphors related to its body as the basis of concepts. === Interaction of local environment === A third component of the embodied approach looks at how agents use their immediate environment in cognitive processing. Meaning, the local environment is seen as an actual extension of the body's cognitive process. The example of a personal digital assistant (PDA) is used to better imagine this. Echoing functionalism (philosophy of mind), this point claims that mental states are individuated by their role in a much larger system. So under this premise, the information on a PDA is similar to the information stored in the brain. So then if one thinks information in the brain constitutes mental states, then it must follow that information in the PDA is a cognitive state too. Consider also the role of pen and paper in a complex multiplication problem. The pen and paper are so involved in the cognitive process of solving the problem that it seems ridiculous to say they are somehow different from the process, in very much the same way the PDA is used for information like the brain. Another example examines how humans control and manipulate their environment

    Read more →
  • Determining the number of clusters in a data set

    Determining the number of clusters in a data set

    Determining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem. For a certain class of clustering algorithms (in particular k-means, k-medoids and expectation–maximization algorithm), there is a parameter commonly referred to as k that specifies the number of clusters to detect. Other algorithms such as DBSCAN and OPTICS algorithm do not require the specification of this parameter; hierarchical clustering avoids the problem altogether. The correct choice of k is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a data set and the desired clustering resolution of the user. In addition, increasing k without penalty will always reduce the amount of error in the resulting clustering, to the extreme case of zero error if each data point is considered its own cluster (i.e., when k equals the number of data points, n). Intuitively then, the optimal choice of k will strike a balance between maximum compression of the data using a single cluster, and maximum accuracy by assigning each data point to its own cluster. If an appropriate value of k is not apparent from prior knowledge of the properties of the data set, it must be chosen somehow. There are several categories of methods for making this decision. == Elbow method == The elbow method looks at the percentage of explained variance as a function of the number of clusters: One should choose a number of clusters so that adding another cluster does not give much better modeling of the data. More precisely, if one plots the percentage of variance explained by the clusters against the number of clusters, the first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph. The number of clusters is chosen at this point, hence the "elbow criterion". In most datasets, this "elbow" is ambiguous, making this method subjective and unreliable. Because the scale of the axes is arbitrary, the concept of an angle is not well-defined, and even on uniform random data, the curve produces an "elbow", making the method rather unreliable. Percentage of variance explained is the ratio of the between-group variance to the total variance, also known as an F-test. A slight variation of this method plots the curvature of the within group variance. The method can be traced to speculation by Robert L. Thorndike in 1953. While the idea of the elbow method sounds simple and straightforward, other methods (as detailed below) give better results. == X-means clustering == In statistics and data mining, X-means clustering is a variation of k-means clustering that refines cluster assignments by repeatedly attempting subdivision, and keeping the best resulting splits, until a criterion such as the Akaike information criterion (AIC) or Bayesian information criterion (BIC) is reached. == Information criterion approach == Another set of methods for determining the number of clusters are information criteria, such as the Akaike information criterion (AIC), Bayesian information criterion (BIC), or the deviance information criterion (DIC) — if it is possible to make a likelihood function for the clustering model. For example: The k-means model is "almost" a Gaussian mixture model and one can construct a likelihood for the Gaussian mixture model and thus also determine information criterion values. == Information–theoretic approach == Rate distortion theory has been applied to choosing k called the "jump" method, which determines the number of clusters that maximizes efficiency while minimizing error by information-theoretic standards. The strategy of the algorithm is to generate a distortion curve for the input data by running a standard clustering algorithm such as k-means for all values of k between 1 and n, and computing the distortion (described below) of the resulting clustering. The distortion curve is then transformed by a negative power chosen based on the dimensionality of the data. Jumps in the resulting values then signify reasonable choices for k, with the largest jump representing the best choice. The distortion of a clustering of some input data is formally defined as follows: Let the data set be modeled as a p-dimensional random variable, X, consisting of a mixture distribution of G components with common covariance, Γ. If we let c 1 … c K {\displaystyle c_{1}\ldots c_{K}} be a set of K cluster centers, with c X {\displaystyle c_{X}} the closest center to a given sample of X, then the minimum average distortion per dimension when fitting the K centers to the data is: d K = 1 p min c 1 … c K E [ ( X − c X ) T Γ − 1 ( X − c X ) ] {\displaystyle d_{K}={\frac {1}{p}}\min _{c_{1}\ldots c_{K}}{E[(X-c_{X})^{T}\Gamma ^{-1}(X-c_{X})]}} This is also the average Mahalanobis distance per dimension between X and the closest cluster center c X {\displaystyle c_{X}} . Because the minimization over all possible sets of cluster centers is prohibitively complex, the distortion is computed in practice by generating a set of cluster centers using a standard clustering algorithm and computing the distortion using the result. The pseudo-code for the jump method with an input set of p-dimensional data points X is: JumpMethod(X): Let Y = (p/2) Init a list D, of size n+1 Let D[0] = 0 For k = 1 ... n: Cluster X with k clusters (e.g., with k-means) Let d = Distortion of the resulting clustering D[k] = d^(-Y) Define J(i) = D[i] - D[i-1] Return the k between 1 and n that maximizes J(k) The choice of the transform power Y = ( p / 2 ) {\displaystyle Y=(p/2)} is motivated by asymptotic reasoning using results from rate distortion theory. Let the data X have a single, arbitrarily p-dimensional Gaussian distribution, and let fixed K = ⌊ α p ⌋ {\displaystyle K=\lfloor \alpha ^{p}\rfloor } , for some α greater than zero. Then the distortion of a clustering of K clusters in the limit as p goes to infinity is α − 2 {\displaystyle \alpha ^{-2}} . It can be seen that asymptotically, the distortion of a clustering to the power ( − p / 2 ) {\displaystyle (-p/2)} is proportional to α p {\displaystyle \alpha ^{p}} , which by definition is approximately the number of clusters K. In other words, for a single Gaussian distribution, increasing K beyond the true number of clusters, which should be one, causes a linear growth in distortion. This behavior is important in the general case of a mixture of multiple distribution components. Let X be a mixture of G p-dimensional Gaussian distributions with common covariance. Then for any fixed K less than G, the distortion of a clustering as p goes to infinity is infinite. Intuitively, this means that a clustering of less than the correct number of clusters is unable to describe asymptotically high-dimensional data, causing the distortion to increase without limit. If, as described above, K is made an increasing function of p, namely, K = ⌊ α p ⌋ {\displaystyle K=\lfloor \alpha ^{p}\rfloor } , the same result as above is achieved, with the value of the distortion in the limit as p goes to infinity being equal to α − 2 {\displaystyle \alpha ^{-2}} . Correspondingly, there is the same proportional relationship between the transformed distortion and the number of clusters, K. Putting the results above together, it can be seen that for sufficiently high values of p, the transformed distortion d K − p / 2 {\displaystyle d_{K}^{-p/2}} is approximately zero for K < G, then jumps suddenly and begins increasing linearly for K ≥ G. The jump algorithm for choosing K makes use of these behaviors to identify the most likely value for the true number of clusters. Although the mathematical support for the method is given in terms of asymptotic results, the algorithm has been empirically verified to work well in a variety of data sets with reasonable dimensionality. In addition to the localized jump method described above, there exists a second algorithm for choosing K using the same transformed distortion values known as the broken line method. The broken line method identifies the jump point in the graph of the transformed distortion by doing a simple least squares error line fit of two line segments, which in theory will fall along the x-axis for K < G, and along the linearly increasing phase of the transformed distortion plot for K ≥ G. The broken line method is more robust than the jump method in that its decision is global rather than local, but it also relies on the assumption of Gaussian mixture components, whereas the jump method is fully non-parametric and has been shown to be viable for general mixture distributions. == Silhouette method == The average silhouette of the data is another useful criterion for assessing the natural number of clusters. The silhouette of a data instance is a measure of how closely it is match

    Read more →
  • Absorbing Markov chain

    Absorbing Markov chain

    In the mathematical theory of probability, an absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An absorbing state is a state that, once entered, cannot be left. Like general Markov chains, there can be continuous-time absorbing Markov chains with an infinite state space. However, this article concentrates on the discrete-time discrete-state-space case. == Formal definition == A Markov chain is an absorbing chain if there is at least one absorbing state and it is possible to go from any state to at least one absorbing state in a finite number of steps. In an absorbing Markov chain, a state that is not absorbing is called transient. === Canonical form === Let an absorbing Markov chain with transition matrix P have t transient states and r absorbing states. The rows of P represent sources, while columns represent destinations. By ordering the transient states before the absorbing states, it can be assumed that P has the form P = [ Q R 0 I r ] , {\displaystyle P={\begin{bmatrix}Q&R\\\mathbf {0} &I_{r}\end{bmatrix}},} where Q is a t-by-t matrix, R is a nonzero t-by-r matrix, 0 is an r-by-t zero matrix, and Ir is the r-by-r identity matrix. Thus, Q describes the probability of transitioning from some transient state to another while R describes the probability of transitioning from some transient state to some absorbing state. The probability of transitioning from i to j in exactly k steps is the (i,j)-entry of Pk, further computed below. When considering only transient states, the probability is found in the upper left of Pk, the (i,j)-entry of Qk. == Fundamental matrix == === Expected number of visits to a transient state === A basic property about an absorbing Markov chain is the expected number of visits to a transient state j starting from a transient state i (before being absorbed). This can be established to be given by the (i, j) entry of so-called fundamental matrix N, obtained by summing Qk for all k (from 0 to ∞). It can be proven that N := ∑ k = 0 ∞ Q k = ( I t − Q ) − 1 , {\displaystyle N:=\sum _{k=0}^{\infty }Q^{k}=(I_{t}-Q)^{-1},} where It is the t-by-t identity matrix. The computation of this formula is the matrix equivalent of the geometric series of scalars, ∑ k = 0 ∞ q k = 1 1 − q {\displaystyle {\textstyle \sum }_{k=0}^{\infty }q^{k}={\tfrac {1}{1-q}}} . With the matrix N in hand, also other properties of the Markov chain are easy to obtain. === Expected number of steps before being absorbed === The expected number of steps before being absorbed in any absorbing state, when starting in transient state i can be computed via a sum over transient states. The value is given by the ith entry of the vector t := N 1 , {\displaystyle \mathbf {t} :=N\mathbf {1} ,} where 1 is a length-t column vector whose entries are all 1. === Absorbing probabilities === By induction, P k = [ Q k ( I t − Q k ) N R 0 I r ] . {\displaystyle P^{k}={\begin{bmatrix}Q^{k}&(I_{t}-Q^{k})NR\\\mathbf {0} &I_{r}\end{bmatrix}}.} The probability of eventually being absorbed in the absorbing state j when starting from transient state i is given by the (i,j)-entry of the matrix B := N R {\displaystyle B:=NR} . The number of columns of this matrix equals the number of absorbing states r. An approximation of those probabilities can also be obtained directly from the (i,j)-entry of P k {\displaystyle P^{k}} for a large enough value of k, when i is the index of a transient, and j the index of an absorbing state. This is because ( lim k → ∞ P k ) i , t + j = B i , j {\displaystyle \left(\lim _{k\to \infty }P^{k}\right)_{i,t+j}=B_{i,j}} . === Transient visiting probabilities === The probability of visiting transient state j when starting at a transient state i is the (i,j)-entry of the matrix H := ( N − I t ) ( N dg ) − 1 , {\displaystyle H:=(N-I_{t})(N_{\operatorname {dg} })^{-1},} where Ndg is the diagonal matrix with the same diagonal as N. === Variance on number of transient visits === The variance on the number of visits to a transient state j with starting at a transient state i (before being absorbed) is the (i,j)-entry of the matrix N 2 := N ( 2 N dg − I t ) − N sq , {\displaystyle N_{2}:=N(2N_{\operatorname {dg} }-I_{t})-N_{\operatorname {sq} },} where Nsq is the Hadamard product of N with itself (i.e. each entry of N is squared). === Variance on number of steps === The variance on the number of steps before being absorbed when starting in transient state i is the ith entry of the vector ( 2 N − I t ) t − t sq , {\displaystyle (2N-I_{t})\mathbf {t} -\mathbf {t} _{\operatorname {sq} },} where tsq is the Hadamard product of t with itself (i.e., as with Nsq, each entry of t is squared). == Examples == === String generation === Consider the process of repeatedly flipping a fair coin until the sequence (heads, tails, heads) appears. This process is modeled by an absorbing Markov chain with transition matrix P = [ 1 / 2 1 / 2 0 0 0 1 / 2 1 / 2 0 1 / 2 0 0 1 / 2 0 0 0 1 ] . {\displaystyle P={\begin{bmatrix}1/2&1/2&0&0\\0&1/2&1/2&0\\1/2&0&0&1/2\\0&0&0&1\end{bmatrix}}.} The first state represents the empty string, the second state the string "H", the third state the string "HT", and the fourth state the string "HTH". Although in reality, the coin flips cease after the string "HTH" is generated, the perspective of the absorbing Markov chain is that the process has transitioned into the absorbing state representing the string "HTH" and, therefore, cannot leave. For this absorbing Markov chain, the fundamental matrix is N = ( I − Q ) − 1 = ( [ 1 0 0 0 1 0 0 0 1 ] − [ 1 / 2 1 / 2 0 0 1 / 2 1 / 2 1 / 2 0 0 ] ) − 1 = [ 1 / 2 − 1 / 2 0 0 1 / 2 − 1 / 2 − 1 / 2 0 1 ] − 1 = [ 4 4 2 2 4 2 2 2 2 ] . {\displaystyle {\begin{aligned}N&=(I-Q)^{-1}=\left({\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}}-{\begin{bmatrix}1/2&1/2&0\\0&1/2&1/2\\1/2&0&0\end{bmatrix}}\right)^{-1}\\[4pt]&={\begin{bmatrix}1/2&-1/2&0\\0&1/2&-1/2\\-1/2&0&1\end{bmatrix}}^{-1}={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}.\end{aligned}}} The expected number of steps starting from each of the transient states is t = N 1 = [ 4 4 2 2 4 2 2 2 2 ] [ 1 1 1 ] = [ 10 8 6 ] . {\displaystyle \mathbf {t} =N\mathbf {1} ={\begin{bmatrix}4&4&2\\2&4&2\\2&2&2\end{bmatrix}}{\begin{bmatrix}1\\1\\1\end{bmatrix}}={\begin{bmatrix}10\\8\\6\end{bmatrix}}.} Therefore, the expected number of coin flips before observing the sequence (heads, tails, heads) is 10, the entry for the state representing the empty string. === Games of chance === Games based entirely on chance can be modeled by an absorbing Markov chain. A classic example of this is the ancient Indian board game Snakes and Ladders. The graph on the left plots the probability mass in the lone absorbing state that represents the final square as the transition matrix is raised to larger and larger powers. To determine the expected number of turns to complete the game, compute the vector t as described above and examine tstart, which is approximately 39.2. === Infectious disease testing === Infectious disease testing, either of blood products or in medical clinics, is often taught as an example of an absorbing Markov chain. The public U.S. Centers for Disease Control and Prevention (CDC) model for HIV and for hepatitis B, for example, illustrates the property that absorbing Markov chains can lead to the detection of disease, versus the loss of detection through other means. In the standard CDC model, the Markov chain has five states, a state in which the individual is uninfected, then a state with infected but undetectable virus, a state with detectable virus, and absorbing states of having quit/been lost from the clinic, or of having been detected (the goal). The typical rates of transition between the Markov states are the probability p per unit time of being infected with the virus, w for the rate of window period removal (time until virus is detectable), q for quit/loss rate from the system, and d for detection, assuming a typical rate λ {\displaystyle \lambda } at which the health system administers tests of the blood product or patients in question. It follows that we can "walk along" the Markov model to identify the overall probability of detection for a person starting as undetected, by multiplying the probabilities of transition to each next state of the model as: p ( p + q ) w ( w + q ) d ( d + q ) {\displaystyle {\frac {p}{(p+q)}}{\frac {w}{(w+q)}}{\frac {d}{(d+q)}}} . The subsequent total absolute number of false negative tests—the primary CDC concern—would then be the rate of tests, multiplied by the probability of reaching the infected but undetectable state, times the duration of staying in the infected undetectable state: p ( p + q ) 1 ( w + q ) λ {\displaystyle {\frac {p}{(p+q)}}{\frac {1}{(w+q)}}\lambda } .

    Read more →
  • Calibration (statistics)

    Calibration (statistics)

    There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. Calibration can mean a reverse process to regression, where instead of a future dependent variable being predicted from known explanatory variables, a known observation of the dependent variables is used to predict a corresponding explanatory variable; procedures in statistical classification to determine class membership probabilities which assess the uncertainty of a given new observation belonging to each of the already established classes. In addition, calibration is used in statistics with the usual general meaning of calibration. For example, model calibration can be also used to refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model. As Philip Dawid puts it, "a forecaster is well calibrated if, for example, of those events to which he assigns a probability 30 percent, the long-run proportion that actually occurs turns out to be 30 percent." == In classification == Calibration in classification means transforming classifier scores into class membership probabilities. An overview of calibration methods for two-class and multi-class classification tasks is given by Gebel (2009). A classifier might separate the classes well, but be poorly calibrated, meaning that the estimated class probabilities are far from the true class probabilities. In this case, a calibration step may help improve the estimated probabilities. A variety of metrics exist that are aimed to measure the extent to which a classifier produces well-calibrated probabilities. Foundational work includes the Expected Calibration Error (ECE). Into the 2020s, variants include the Adaptive Calibration Error (ACE) and the Test-based Calibration Error (TCE), which address limitations of the ECE metric that may arise when classifier scores concentrate on narrow subset of the [0,1] range. A 2020s advancement in calibration assessment is the introduction of the Estimated Calibration Index (ECI). The ECI extends the concepts of the Expected Calibration Error (ECE) to provide a more nuanced measure of a model's calibration, particularly addressing overconfidence and underconfidence tendencies. Originally formulated for binary settings, the ECI has been adapted for multiclass settings, offering both local and global insights into model calibration. This framework aims to overcome some of the theoretical and interpretative limitations of existing calibration metrics. Through a series of experiments, Famiglini et al. demonstrate the framework's effectiveness in delivering a more accurate understanding of model calibration levels and discuss strategies for mitigating biases in calibration assessment. An online tool has been proposed to compute both ECE and ECI. The following univariate calibration methods exist for transforming classifier scores into class membership probabilities in the two-class case: Assignment value approach, see Garczarek (2002) Bayes approach, see Bennett (2002) Isotonic regression, see Zadrozny and Elkan (2002) Platt scaling (a form of logistic regression), see Lewis and Gale (1994) and Platt (1999) Bayesian Binning into Quantiles (BBQ) calibration, see Naeini, Cooper, Hauskrecht (2015) Beta calibration, see Kull, Filho, Flach (2017) === In probability prediction and forecasting === In prediction and forecasting, a Brier score is sometimes used to assess prediction accuracy of a set of predictions, specifically that the magnitude of the assigned probabilities track the relative frequency of the observed outcomes. Philip E. Tetlock employs the term "calibration" in this sense in his 2015 book Superforecasting. This differs from accuracy and precision. For example, as expressed by Daniel Kahneman, "if you give all events that happen a probability of .6 and all the events that don't happen a probability of .4, your discrimination is perfect but your calibration is miserable". In meteorology, in particular, as concerns weather forecasting, a related mode of assessment is known as forecast skill. == In regression == The calibration problem in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to make estimates of other values of the independent variable from new observations of the dependent variable. This can be known as "inverse regression"; there is also sliced inverse regression. The following multivariate calibration methods exist for transforming classifier scores into class membership probabilities in the case with classes count greater than two: Reduction to binary tasks and subsequent pairwise coupling, see Hastie and Tibshirani (1998) Dirichlet calibration, see Gebel (2009) === Example === One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.

    Read more →
  • Colloquis

    Colloquis

    Colloquis, previously known as ActiveBuddy and Conversagent, was a company that created conversation-based interactive agents originally distributed via instant messaging platforms. The company had offices in New York, New York, and Sunnyvale, California. == History == Founded in 2000, the company was the brainchild of Robert Hoffer, Timothy Kay, and Peter Levitan. The idea for interactive agents (also known as Internet bots) came from the team's vision to add functionality to increasingly popular instant messaging services. The original implementation took shape as a word-based adventure game but quickly grew to include a wide range of database applications, including access to news, weather, stock information, movie times, Yellow Pages listings, and detailed sports data, as well as a variety of tools (calculators, translator, etc.). These various applications were bundled into one entity and launched as SmarterChild in 2001. SmarterChild acted as a showcase for the quick data access and possibilities for fun conversation that the company planned to turn into customized, niche-specific products. The rapid success of SmarterChild led to targeted promotional products for Radiohead, Austin Powers, The Sporting News, and others. ActiveBuddy sought to strengthen its hold on the interactive agent market for the future by filing for, and receiving, a controversial patent on their creation in 2002. The company also released the BuddyScript SDK, a free developer kit that allow programmers to design and launch their own interactive agents using ActiveBuddy's proprietary scripting language, in 2002. Ultimately, however, the decline in ad spending in 2001 and 2002 led to a shift in corporate strategy towards business focused Automated Service Agents, building products for clients including Cingular, Comcast and Cox Communications. The company subsequently changed its name from ActiveBuddy to Conversagent in 2003, and then again to Colloquis in 2006. Colloquis was purchased by Microsoft in October 2006.

    Read more →
  • Homogeneity blockmodeling

    Homogeneity blockmodeling

    In mathematics applied to analysis of social structures, homogeneity blockmodeling is an approach in blockmodeling, which is best suited for a preliminary or main approach to valued networks, when a prior knowledge about these networks is not available. This is because homogeneity blockmodeling emphasizes the similarity of link (tie) strengths within the blocks over the pattern of links. In this approach, tie (link) values (or statistical data computed on them) are assumed to be equal (homogenous) within blocks. This approach to the generalized blockmodeling of valued networks was first proposed by Aleš Žiberna in 2007 with the basic idea, "that the inconsistency of an empirical block with its ideal block can be measured by within block variability of appropriate values". The newly–formed ideal blocks, which are appropriate for blockmodeling of valued networks, are then presented together with the definitions of their block inconsistencies. Similar approach to the homogeneity blockmodeling, dealing with direct approach for structural equivalence, was previously suggested by Stephen P. Borgatti and Martin G. Everett (1992).

    Read more →
  • KNIME

    KNIME

    KNIME ( ), the Konstanz Information Miner, is a data analytics, reporting and integrating platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of Java Database Connectivity (JDBC) allows assembly of nodes blending different data sources, including preprocessing (extract, transform, load, or ETL), for modeling, data analysis and visualization with minimal, or no, programming. It is free and open-source software released under a GNU General Public License. Since 2006, KNIME has been used in pharmaceutical research, and in other areas including customer relationship management (CRM) and data analysis, business intelligence, text mining and financial data analysis. Recently, attempts were made to use KNIME as robotic process automation (RPA) tool. KNIME's headquarters are based in Zurich, with other offices in Konstanz, Berlin, and Austin (USA). == History == Development of KNIME began in January 2004, with a team of software engineers at the University of Konstanz, as an open-source platform. The original team, headed by Michael Berthold, came from a Silicon Valley pharmaceutical industry software company. The initial goal was to create a modular, highly scalable and open data processing platform that allows easy integration of different data loading, processing, transforming, analyzing, and visual exploring modules, without focus on any one application area. The platform was intended for collaborating, research, and for integrating various other data analysis projects. In 2006, the first version of KNIME was released. Several pharmaceutical companies began using KNIME, and several life science software vendors began integrating their tools into the platform. Later that year, after an article in the German magazine c't, users from a number of other areas joined ship. As of 2012, KNIME is in use by over 15,000 actual users (i.e. not counting downloads, but users regularly retrieving updates) in the life sciences and at banks, publishers, car manufacturer, telcos, consulting firms, and various other industries, and a large number of research groups, worldwide. Latest updates to KNIME Server and KNIME Big Data Extensions, provide support for Apache Spark 2.3, Parquet and HDFS-type storage. For the sixth year in a row, KNIME has been placed as a leader for data science and machine learning platforms in Gartner's Magic Quadrant. == Design philosophy, features == These are the design principles and features that KNIME software follows: Visual, Interactive Framework: KNIME Software prioritizes a user-friendly and intuitive approach to data analysis. This is achieved through a visual and interactive framework where data flows can be combined using a drag-and-drop interface. Users can develop customized and interactive applications by creating simple to advanced and highly-automated data pipelines. These may include, for example, access to databases, machine learning libraries, logic for workflow control (e.g., loops, switches, etc.), abstraction (e.g., interactive widgets), invocation, dynamic data apps, integrated deployment, or error handling. Modularity: processing units and data containers should remain independent of each other. This design choice enables easy distribution of computation and allows for the independent development of different algorithms. Data types within KNIME are encapsulated, meaning no types are predefined. This design choice facilitates adding new data types, and integrating them with extant types, while including type-specific renderers and comparators. This principle also enables inspecting results at the end of each single data operation. Extensibility: KNIME Software is designed to be extensible. Adding new processing nodes or views is made simple through a plug-in mechanism. This mechanism ensures that users can distribute their custom functionalities without the need for complicated install or uninstall procedures. Interleaving No-Code with Code: the platform supports integrating both visual programming (no-code) and script-based programming (e.g., Python, R, JavaScript) approaches to data analysis. This design principle is termed low-code. Automation and Scalability: for example, the use of parameterization via flow variables, or the encapsulation of workflow segments in components contribute to reduce manual work and errors in analyses. Further, the scheduling of workflow execution (available in KNIME Business Hub and KNIME Community Hub for Teams) reduces dependency on human resources. In terms of scalability, a few examples include the ability to handle large datasets (millions of rows), execute multiple processes simultaneously out of the box and reuse workflow segments. Full Usability: due to the open source nature, KNIME Analytics Platform provides free full usability with no limited trial periods. == Internals == KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results, models, using interactive widgets and views. KNIME is written in Java and based on Eclipse. It makes use of an extension mechanism to add plug-ins providing added functions. The core version includes hundreds of modules for data integration (file input/output (I/O), database nodes supporting all common database management systems through JDBC or native connectors: SQLite, MS-Access, SQL Server, MySQL, Oracle, PostgreSQL, Vertica and H2), data transformation (filter, converter, splitter, combiner, joiner), and the commonly used methods of statistics, data mining, analysis and text analytics. Visualization is supported with the Report Designer extension. KNIME workflows can be used as data sets to create report templates that can be exported to document formats such as doc, ppt, xls, pdf and others. Other KNIME abilities are: KNIMEs core-architecture allows processing of large data volumes that are only limited by the available hard disk space (not limited to the available RAM). E.g., KNIME allows analyzing 300 million customer addresses, 20 million cell images, and 10 million molecular structures. Added plug-ins allow integrating methods for text mining, image mining, time series analysis, and networking. KNIME integrates various other open-source projects, e.g., machine learning algorithms from Weka, H2O, Keras, Spark, the R project and LIBSVM; plotly, JFreeChart, ImageJ, and the Chemistry Development Kit. KNIME is implemented in Java, allows for wrappers calling other code, in addition to providing nodes that allow it to run Java, Python, R, Ruby and other code fragments. Since 2021, KNIME's Python Integration utilizes Anaconda for Python distribution and environment management. == License == In 2024, KNIME version 5.3 is released under the same GPLv3 license as previous versions. As of version 2.1, KNIME is released under the GPLv3 license, with an exception that allow commercial software vendors to use the well-defined node application programming interface (API) to add proprietary extensions, or wrappers calling their tools from KNIME. == Courses == KNIME allows the performance of data analysis without programming skills. Several free, online courses are provided.

    Read more →
  • Charge based boundary element fast multipole method

    Charge based boundary element fast multipole method

    The charge-based formulation of the boundary element method (BEM) is a dimensionality reduction numerical technique that is used to model quasistatic electromagnetic phenomena in highly complex conducting media (targeting, e.g., the human brain) with a very large (up to approximately 1 billion) number of unknowns. The charge-based BEM solves an integral equation of the potential theory written in terms of the induced surface charge density. This formulation is naturally combined with fast multipole method (FMM) acceleration, and the entire method is known as charge-based BEM-FMM. The combination of BEM and FMM is a common technique in different areas of computational electromagnetics and, in the context of bioelectromagnetism, it provides improvements over the finite element method. == Historical development == Along with more common electric potential-based BEM, the quasistatic charge-based BEM, derived in terms of the single-layer (charge) density, for a single-compartment medium has been known in the potential theory since the beginning of the 20th century. For multi-compartment conducting media, the surface charge density formulation first appeared in discretized form (for faceted interfaces) in the 1964 paper by Gelernter and Swihart. A subsequent continuous form, including time-dependent and dielectric effects, appeared in the 1967 paper by Barnard, Duck, and Lynn. The charge-based BEM has also been formulated for conducting, dielectric, and magnetic media, and used in different applications. In 2009, Greengard et al. successfully applied the charge-based BEM with fast multipole acceleration to molecular electrostatics of dielectrics. A similar approach to realistic modeling of the human brain with multiple conducting compartments was first described by Makarov et al. in 2018. Along with this, the BEM-based multilevel fast multipole method has been widely used in radar and antenna studies at microwave frequencies as well as in acoustics. == Physical background - surface charges in biological media == The charge-based BEM is based on the concept of an impressed (or primary) electric field E i {\displaystyle \mathbf {E} ^{i}} and a secondary electric field E s {\displaystyle \mathbf {E} ^{s}} . The impressed field is usually known a priori or is trivial to find. For the human brain, the impressed electric field can be classified as one of the following: A conservative field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed density of EEG or MEG current sources in a homogeneous infinite medium with the conductivity σ {\displaystyle \sigma } at the source location; An instantaneous solenoidal field E i {\displaystyle \mathbf {E} ^{i}} of an induction coil obtained from Faraday's law of induction in a homogeneous infinite medium (air), when transcranial magnetic stimulation (TMS) problems are concerned; A surface field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed surface current density J i = σ E i {\displaystyle \mathbf {J} ^{i}=\sigma \mathbf {E} ^{i}} of current electrodes injecting electric current at a boundary of a compartment with conductivity σ {\displaystyle \sigma } when transcranial direct-current stimulation (tDCS) or deep brain stimulation (DBS) are concerned; A conservative field E i {\displaystyle \mathbf {E} ^{i}} of charges deposited on voltage electrodes for tDCS or DBS. This specific problem requires a coupled treatment since these charges will depend on the environment; In application to multiscale modeling, a field E i {\displaystyle \mathbf {E} ^{i}} obtained from any other macroscopic numerical solution in a small (mesoscale or microscale) spatial domain within the brain. For example, a constant field can be used. When the impressed field is "turned on", free charges located within a conducting volume D immediately begin to redistribute and accumulate at the boundaries (interfaces) of regions of different conductivity in D. A surface charge density ρ ( r ) {\displaystyle \rho (\mathbf {r} )} appears on the conductivity interfaces. This charge density induces a secondary conservative electric field E s {\displaystyle \mathbf {E} ^{s}} following Coulomb's law. One example is a human under a direct current powerline with the known field E i {\displaystyle \mathbf {E} ^{i}} directed down. The superior surface of the human's conducting body will be charged negatively while its inferior portion is charged positively. These surface charges create a secondary electric field that effectively cancels or blocks the primary field everywhere in the body so that no current will flow within the body under DC steady state conditions. Another example is a human head with electrodes attached. At any conductivity interface with a normal vector n {\displaystyle \mathbf {n} } pointing from an "inside" (-) compartment of conductivity σ − {\displaystyle \sigma ^{-}} to an "outside" (+) compartment of conductivity σ + {\displaystyle \sigma ^{+}} , Kirchhoff's current law requires continuity of the normal component of the electric current density. This leads to the interfacial boundary condition in the form for every facet at a triangulated interface. As long as σ ± {\displaystyle \sigma ^{\pm }} are different from each other, the two normal components of the electric field, E ± ⋅ n {\displaystyle \mathbf {E} ^{\pm }\cdot \mathbf {n} } , must also be different. Such a jump across the interface is only possible when a sheet of surface charge exists at that interface. Thus, if an electric current or voltage is applied, the surface charge density follows. The goal of the numerical analysis is to find the unknown surface charge distribution and thus the total electric field E = E i + E s {\displaystyle \mathbf {E} =\mathbf {E} ^{i}+\mathbf {E} ^{s}} (and the total electric potential if required) anywhere in space. == System of equations for surface charges == Below, a derivation is given based on Gauss's law and Coulomb's law. All conductivity interfaces, denoted by S, are discretized into planar triangular facets t m {\displaystyle t_{m}} with centers r m {\displaystyle \mathbf {r} _{m}} . Assume that an m-th facet with the normal vector n m {\displaystyle \mathbf {n} _{m}} and area A m {\displaystyle A_{m}} carries a uniform surface charge density ρ m {\displaystyle \rho _{m}} . If a volumetric tetrahedral mesh were present, the charged facets would belong to tetrahedra with different conductivity values. We first compute the electric field E m + {\displaystyle \mathbf {E} _{m}^{+}} at the point r m + δ n m {\displaystyle \mathbf {r} _{m}+\delta \mathbf {n} _{m}} , for δ → 0 + {\displaystyle \delta \rightarrow 0^{+}} i.e., just outside facet 𝑚 at its center. This field contains three contributions: The continuous impressed electric field E i {\displaystyle \mathbf {E} ^{i}} itself; An electric field of the m-th charged facet itself. Very close to the facet, it can be approximated as the electric field of an infinite sheet of uniform surface charge ρ m {\displaystyle \rho _{m}} . By Gauss's law, it is given by + ρ m / 2 ε 0 ⋅ n m {\displaystyle +\rho _{m}/2\varepsilon _{0}\cdot \mathbf {n} _{m}} where ε 0 {\displaystyle \varepsilon _{0}} is a background electrical permittivity; An electric field generated by all other facets t n {\displaystyle t_{n}} , which we approximate as point charges of charge A n ρ n {\displaystyle A_{n}\rho _{n}} at each center r n {\displaystyle \mathbf {r} _{n}} . A similar treatment holds for the electric field E m − {\displaystyle \mathbf {E} _{m}^{-}} just inside facet 𝑚, but the electric field of the flat sheet of charge changes its sign. Using Coulomb's law to calculate the contribution of facets different from t m {\displaystyle t_{m}} , we find From this equation, we see that the normal component of the electric field indeed undergoes a jump through the charged interface. This is equivalent to a jump relation of the potential theory. As a second step, the two expressions for E m ± {\displaystyle \mathbf {E} _{m}^{\pm }} are substituted into the interfacial boundary condition σ − E m − ⋅ n m = σ + E m + ⋅ n m {\displaystyle \sigma ^{-}\mathbf {E} _{m}^{-}\cdot \mathbf {n} _{m}=\sigma ^{+}\mathbf {E} _{m}^{+}\cdot \mathbf {n} _{m}} , applied to every facet 𝑚. This operation leads to a system of linear equations for unknown charge densities ρ m {\displaystyle \rho _{m}} which solves the problem: where K m = σ − − σ + σ − + σ + {\displaystyle K_{m}={\frac {\sigma ^{-}-\sigma ^{+}}{\sigma ^{-}+\sigma ^{+}}}} is the electric conductivity contrast at the m-th facet. The normalization constant ε 0 {\displaystyle \varepsilon _{0}} will cancel out after the solution is substituted in the expression for E s {\displaystyle \mathbf {E} ^{s}} and becomes redundant. == Application of fast multipole method == For modern characterizations of brain topologies with ever-increasing levels of complexity, the above system of equations for ρ m {\displaystyle \rho _{m}} is very large; it is t

    Read more →
  • GNU Binutils

    GNU Binutils

    The GNU Binary Utilities, or binutils, is a collection of programming tools maintained by the GNU Project for working with executable code including assembly, linking and many other development operations. The tools are originally from Cygnus Solutions. The tools are typically used along with other GNU tools such as GNU Compiler Collection, and the GNU Debugger. == Tools == The tools include: == elfutils == Ulrich Drepper wrote elfutils, to partially replace GNU Binutils, purely for Linux and with support only for ELF and DWARF. It distributes three libraries with it for programmatic access.

    Read more →
  • Expectation–maximization algorithm

    Expectation–maximization algorithm

    In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step. It can be used, for example, to estimate a mixture of gaussians, or to solve the multiple linear regression problem. == History == The EM algorithm was explained and given its name in a classic 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. They pointed out that the method had been "proposed many times in special circumstances" by earlier authors. One of the earliest is the gene-counting method for estimating allele frequencies by Cedric Smith. Another was proposed by H.O. Hartley in 1958, and Hartley and Hocking in 1977, from which many of the ideas in the Dempster–Laird–Rubin paper originated. Another one by S.K Ng, Thriyambakam Krishnan and G.J McLachlan in 1977. Hartley's ideas can be broadened to any grouped discrete distribution. A very detailed treatment of the EM method for exponential families was published by Rolf Sundberg in his thesis and several papers, following his collaboration with Per Martin-Löf and Anders Martin-Löf. The Dempster–Laird–Rubin paper in 1977 generalized the method and sketched a convergence analysis for a wider class of problems. The Dempster–Laird–Rubin paper established the EM method as an important tool of statistical analysis. See also Meng and van Dyk (1997). The convergence analysis of the Dempster–Laird–Rubin algorithm was flawed and a correct convergence analysis was published by C. F. Jeff Wu in 1983. Wu's proof established the EM method's convergence also outside of the exponential family, as claimed by Dempster–Laird–Rubin. == Introduction == The EM algorithm is used to find (local) maximum likelihood parameters of a statistical model in cases where the equations cannot be solved directly. Typically these models involve latent variables in addition to unknown parameters and known data observations. That is, either missing values exist among the data, or the model can be formulated more simply by assuming the existence of further unobserved data points. For example, a mixture model can be described more simply by assuming that each observed data point has a corresponding unobserved data point, or latent variable, specifying the mixture component to which each data point belongs. Finding a maximum likelihood solution typically requires taking the derivatives of the likelihood function with respect to all the unknown values, the parameters and the latent variables, and simultaneously solving the resulting equations. In statistical models with latent variables, this is usually impossible. Instead, the result is typically a set of interlocking equations in which the solution to the parameters requires the values of the latent variables and vice versa, but substituting one set of equations into the other produces an unsolvable equation. The EM algorithm proceeds from the observation that there is a way to solve these two sets of equations numerically. One can simply pick arbitrary values for one of the two sets of unknowns, use them to estimate the second set, then use these new values to find a better estimate of the first set, and then keep alternating between the two until the resulting values both converge to fixed points. It's not obvious that this will work, but it can be proven in this context. Additionally, it can be proven that the derivative of the likelihood is (arbitrarily close to) zero at that point, which in turn means that the point is either a local maximum or a saddle point. In general, multiple maxima may occur, with no guarantee that the global maximum will be found. Some likelihoods also have singularities in them, i.e., nonsensical maxima. For example, one of the solutions that may be found by EM in a mixture model involves setting one of the components to have zero variance and the mean parameter for the same component to be equal to one of the data points. == Description == === The symbols === Given the statistical model which generates a set X {\displaystyle \mathbf {X} } of observed data, a set of unobserved latent data or missing values Z {\displaystyle \mathbf {Z} } , and a vector of unknown parameters θ {\displaystyle {\boldsymbol {\theta }}} , along with a likelihood function L ( θ ; X , Z ) = p ( X , Z ∣ θ ) {\displaystyle L({\boldsymbol {\theta }};\mathbf {X} ,\mathbf {Z} )=p(\mathbf {X} ,\mathbf {Z} \mid {\boldsymbol {\theta }})} , the maximum likelihood estimate (MLE) of the unknown parameters is determined by maximizing the marginal likelihood of the observed data L ( θ ; X ) = p ( X ∣ θ ) = ∫ p ( X , Z ∣ θ ) d Z = ∫ p ( X ∣ Z , θ ) p ( Z ∣ θ ) d Z {\displaystyle {\begin{aligned}L({\boldsymbol {\theta }};\mathbf {X} )=p(\mathbf {X} \mid {\boldsymbol {\theta }})&=\int p(\mathbf {X} ,\mathbf {Z} \mid {\boldsymbol {\theta }})\,d\mathbf {Z} \\&=\int p(\mathbf {X} \mid \mathbf {Z} ,{\boldsymbol {\theta }})p(\mathbf {Z} \mid {\boldsymbol {\theta }})\,d\mathbf {Z} \end{aligned}}} However, this quantity is often intractable since Z {\displaystyle \mathbf {Z} } is unobserved and the distribution of Z {\displaystyle \mathbf {Z} } is unknown before attaining θ {\displaystyle {\boldsymbol {\theta }}} . === The EM algorithm === The EM algorithm seeks to find the maximum likelihood estimate of the marginal likelihood by iteratively applying these two steps: More succinctly, we can write it as one equation: θ ( t + 1 ) = arg ⁡ max θ ⁡ E Z ∼ p ( ⋅ | X , θ ( t ) ) ⁡ [ log ⁡ p ( X , Z | θ ) ] {\displaystyle {\boldsymbol {\theta }}^{(t+1)}=\mathop {\arg \max } _{\boldsymbol {\theta }}\operatorname {E} _{\mathbf {Z} \sim p(\cdot |\mathbf {X} ,{\boldsymbol {\theta }}^{(t)})}\left[\log p(\mathbf {X} ,\mathbf {Z} |{\boldsymbol {\theta }})\right]\,} === Interpretation of the variables === The typical models to which EM is applied use Z {\displaystyle \mathbf {Z} } as a latent variable indicating membership in one of a set of groups: The observed data points X {\displaystyle \mathbf {X} } may be discrete (taking values in a finite or countably infinite set) or continuous (taking values in an uncountably infinite set). Associated with each data point may be a vector of observations. The missing values (aka latent variables) Z {\displaystyle \mathbf {Z} } are discrete, drawn from a fixed number of values, and with one latent variable per observed unit. The parameters are continuous, and are of two kinds: Parameters that are associated with all data points, and those associated with a specific value of a latent variable (i.e., associated with all data points whose corresponding latent variable has that value). However, it is possible to apply EM to other sorts of models. The motivation is as follows. If the value of the parameters θ {\displaystyle {\boldsymbol {\theta }}} is known, usually the value of the latent variables Z {\displaystyle \mathbf {Z} } can be found by maximizing the log-likelihood over all possible values of Z {\displaystyle \mathbf {Z} } , either simply by iterating over Z {\displaystyle \mathbf {Z} } or through an algorithm such as the Viterbi algorithm for hidden Markov models. Conversely, if we know the value of the latent variables Z {\displaystyle \mathbf {Z} } , we can find an estimate of the parameters θ {\displaystyle {\boldsymbol {\theta }}} fairly easily, typically by simply grouping the observed data points according to the value of the associated latent variable and averaging the values, or some function of the values, of the points in each group. This suggests an iterative algorithm, in the case where both θ {\displaystyle {\boldsymbol {\theta }}} and Z {\displaystyle \mathbf {Z} } are unknown: First, initialize the parameters θ {\displaystyle {\boldsymbol {\theta }}} to some random values. Compute the probability of each possible value of ⁠ Z {\displaystyle \mathbf {Z} } ⁠, given ⁠ θ {\displaystyle {\boldsymbol {\theta }}} ⁠. Then, use the just-computed values of Z {\displaystyle \mathbf {Z} } to compute a better estimate for the parameters ⁠ θ {\displaystyle {\boldsymbol {\theta }}} ⁠. Iterate steps 2 and 3 until convergence. The algorithm as just described monotonically approaches a local minimum of the cost function. == Properties == Although an EM iteration does increase the observed data (i.e., marginal) likelihood function, no guarantee exists that the sequence converges to a maximum likelihood estimator. For multimodal distributions, this means that an EM algorithm may co

    Read more →
  • Mixture model

    Mixture model

    In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture models are used for clustering, under the name model-based clustering, and also for density estimation. Mixture models should not be confused with models for compositional data, i.e., data whose components are constrained to sum to a constant value (1, 100%, etc.). However, compositional models can be thought of as mixture models, where members of the population are sampled at random. Conversely, mixture models can be thought of as compositional models, where the total size reading population has been normalized to 1. == Structure == === General mixture model === A typical finite-dimensional mixture model is a hierarchical model consisting of the following components: N random variables that are observed, each distributed according to a mixture of K components, with the components belonging to the same parametric family of distributions (e.g., all normal, all Zipfian, etc.) but with different parameters. However, it is also possible to have a finite mixture model where each component belongs to a different parametric family of distributions, for example, a mixture of a multivariate normal distribution and a generalized hyperbolic distribution. N random latent variables specifying the identity of the mixture component of each observation, each distributed according to a K-dimensional categorical distribution A set of K mixture weights, which are probabilities that sum to 1. A set of K parameters, each specifying the parameter of the corresponding mixture component. In many cases, each "parameter" is actually a set of parameters. For example, if the mixture components are Gaussian distributions, there will be a mean and variance for each component. If the mixture components are categorical distributions (e.g., when each observation is a token from a finite alphabet of size V), there will be a vector of V probabilities summing to 1. In addition, in a Bayesian setting, the mixture weights and parameters will themselves be random variables, and prior distributions will be placed over the variables. In such a case, the weights are typically viewed as a K-dimensional random vector drawn from a Dirichlet distribution (the conjugate prior of the categorical distribution), and the parameters will be distributed according to their respective conjugate priors. Mathematically, a basic parametric mixture model can be described as follows: K = number of mixture components N = number of observations θ i = 1 … K = parameter of distribution of observation associated with component i ϕ i = 1 … K = mixture weight, i.e., prior probability of a particular component i ϕ = K -dimensional vector composed of all the individual ϕ 1 … K ; must sum to 1 z i = 1 … N = component of observation i x i = 1 … N = observation i F ( x | θ ) = probability distribution of an observation, parametrized on θ z i = 1 … N ∼ Categorical ⁡ ( ϕ ) x i = 1 … N | z i = 1 … N ∼ F ( θ z i ) {\displaystyle {\begin{array}{lcl}K&=&{\text{number of mixture components}}\\N&=&{\text{number of observations}}\\\theta _{i=1\dots K}&=&{\text{parameter of distribution of observation associated with component }}i\\\phi _{i=1\dots K}&=&{\text{mixture weight, i.e., prior probability of a particular component }}i\\{\boldsymbol {\phi }}&=&K{\text{-dimensional vector composed of all the individual }}\phi _{1\dots K}{\text{; must sum to 1}}\\z_{i=1\dots N}&=&{\text{component of observation }}i\\x_{i=1\dots N}&=&{\text{observation }}i\\F(x|\theta )&=&{\text{probability distribution of an observation, parametrized on }}\theta \\z_{i=1\dots N}&\sim &\operatorname {Categorical} ({\boldsymbol {\phi }})\\x_{i=1\dots N}|z_{i=1\dots N}&\sim &F(\theta _{z_{i}})\end{array}}} In a Bayesian setting, all parameters are associated with random variables, as follows: K , N = as above θ i = 1 … K , ϕ i = 1 … K , ϕ = as above z i = 1 … N , x i = 1 … N , F ( x | θ ) = as above α = shared hyperparameter for component parameters β = shared hyperparameter for mixture weights H ( θ | α ) = prior probability distribution of component parameters, parametrized on α θ i = 1 … K ∼ H ( θ | α ) ϕ ∼ S y m m e t r i c - D i r i c h l e t K ⁡ ( β ) z i = 1 … N | ϕ ∼ Categorical ⁡ ( ϕ ) x i = 1 … N | z i = 1 … N , θ i = 1 … K ∼ F ( θ z i ) {\displaystyle {\begin{array}{lcl}K,N&=&{\text{as above}}\\\theta _{i=1\dots K},\phi _{i=1\dots K},{\boldsymbol {\phi }}&=&{\text{as above}}\\z_{i=1\dots N},x_{i=1\dots N},F(x|\theta )&=&{\text{as above}}\\\alpha &=&{\text{shared hyperparameter for component parameters}}\\\beta &=&{\text{shared hyperparameter for mixture weights}}\\H(\theta |\alpha )&=&{\text{prior probability distribution of component parameters, parametrized on }}\alpha \\\theta _{i=1\dots K}&\sim &H(\theta |\alpha )\\{\boldsymbol {\phi }}&\sim &\operatorname {Symmetric-Dirichlet} _{K}(\beta )\\z_{i=1\dots N}|{\boldsymbol {\phi }}&\sim &\operatorname {Categorical} ({\boldsymbol {\phi }})\\x_{i=1\dots N}|z_{i=1\dots N},\theta _{i=1\dots K}&\sim &F(\theta _{z_{i}})\end{array}}} This characterization uses F and H to describe arbitrary distributions over observations and parameters, respectively. Typically H will be the conjugate prior of F. The two most common choices of F are Gaussian aka "normal" (for real-valued observations) and categorical (for discrete observations). Other common possibilities for the distribution of the mixture components are: Binomial distribution, for the number of "positive occurrences" (e.g., successes, yes votes, etc.) given a fixed number of total occurrences Multinomial distribution, similar to the binomial distribution, but for counts of multi-way occurrences (e.g., yes/no/maybe in a survey) Negative binomial distribution, for binomial-type observations but where the quantity of interest is the number of failures before a given number of successes occurs Poisson distribution, for the number of occurrences of an event in a given period of time, for an event that is characterized by a fixed rate of occurrence Exponential distribution, for the time before the next event occurs, for an event that is characterized by a fixed rate of occurrence Log-normal distribution, for positive real numbers that are assumed to grow exponentially, such as incomes or prices Multivariate normal distribution (aka multivariate Gaussian distribution), for vectors of correlated outcomes that are individually Gaussian-distributed Multivariate Student's t-distribution, for vectors of heavy-tailed correlated outcomes A vector of Bernoulli-distributed values, corresponding, e.g., to a black-and-white image, with each value representing a pixel; see the handwriting-recognition example below === Specific examples === ==== Gaussian mixture model ==== A typical non-Bayesian Gaussian mixture model looks like this: K , N = as above ϕ i = 1 … K , ϕ = as above z i = 1 … N , x i = 1 … N = as above θ i = 1 … K = { μ i = 1 … K , σ i = 1 … K 2 } μ i = 1 … K = mean of component i σ i = 1 … K 2 = variance of component i z i = 1 … N ∼ Categorical ⁡ ( ϕ ) x i = 1 … N ∼ N ( μ z i , σ z i 2 ) {\displaystyle {\begin{array}{lcl}K,N&=&{\text{as above}}\\\phi _{i=1\dots K},{\boldsymbol {\phi }}&=&{\text{as above}}\\z_{i=1\dots N},x_{i=1\dots N}&=&{\text{as above}}\\\theta _{i=1\dots K}&=&\{\mu _{i=1\dots K},\sigma _{i=1\dots K}^{2}\}\\\mu _{i=1\dots K}&=&{\text{mean of component }}i\\\sigma _{i=1\dots K}^{2}&=&{\text{variance of component }}i\\z_{i=1\dots N}&\sim &\operatorname {Categorical} ({\boldsymbol {\phi }})\\x_{i=1\dots N}&\sim &{\mathcal {N}}(\mu _{z_{i}},\sigma _{z_{i}}^{2})\end{array}}} A Bayesian version of a Gaussian mixture model is as follows: K , N = as above ϕ i = 1 … K , ϕ = as above z i = 1 … N , x i = 1 … N = as above θ i = 1 … K = { μ i = 1 … K , σ i = 1 … K 2 } μ i = 1 … K = mean of component i σ i = 1 … K 2 = variance of component i μ 0 , λ , ν , σ 0 2 = shared hyperparameters μ i = 1 … K ∼ N ( μ 0 , λ σ i 2 ) σ i = 1 … K 2 ∼ I n v e r s e - G a m m a ⁡ ( ν , σ 0 2 ) ϕ ∼ S y m m e t r i c - D i r i c h l e t K ⁡ ( β ) z i = 1 … N ∼ Categorical ⁡ ( ϕ ) x i = 1 … N ∼ N ( μ z i , σ z i 2 ) {\displaystyle {\begin{array}{lcl}K,N&=&{\text{as above}}\\\phi _{i=1\dots K},{\boldsymbol {\phi }}&=&{\text{as above}}\\z_{i=1\dots N},x_{i=1\dots N}&=&{\text{as above}}\\\theta _{i=1\

    Read more →