AI Data Analyst Zalando

AI Data Analyst Zalando — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • U-Net

    U-Net

    U-Net is a convolutional neural network that was developed for image segmentation. The network is based on a fully convolutional neural network whose architecture was modified and extended to work with fewer training images and to yield more precise segmentation. Segmentation of a 512 × 512 image takes less than a second on a modern (2015) GPU using the U-Net architecture. The U-Net architecture has also been employed in diffusion models for iterative image denoising. This technology underlies many modern image generation models, such as DALL-E, Midjourney, and Stable Diffusion. U-Net is also being explored for language models. Tokenization is not a separate step, allowing the model to more easily understand spelling and concurrently vectorizing / tokenizing higher level concepts. == Description == The U-Net architecture stems from the so-called "fully convolutional network". The main idea is to supplement a usual contracting network by successive layers, where pooling operations are replaced by upsampling operators. Hence these layers increase the resolution of the output. A successive convolutional layer can then learn to assemble a precise output based on this information. One important modification in U-Net is that there are a large number of feature channels in the upsampling part, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting part, and yields a u-shaped architecture. The network only uses the valid part of each convolution without any fully connected layers. To predict the pixels in the border region of the image, the missing context is extrapolated by mirroring the input image. This tiling strategy is important to apply the network to large images, since otherwise the resolution would be limited by the GPU memory. Recently, there had also been an interest in receptive field based U-Net models for medical image segmentation. == Network architecture == The network consists of a contracting path and an expansive path, which gives it the u-shaped architecture. The contracting path is a typical convolutional network that consists of repeated application of convolutions, each followed by a rectified linear unit (ReLU) and a max pooling operation. During the contraction, the spatial information is reduced while feature information is increased. The expansive pathway combines the feature and spatial information through a sequence of up-convolutions and concatenations with high-resolution features from the contracting path. == Applications == There are many applications of U-Net in biomedical image segmentation, such as brain image segmentation (''BRATS'') and liver image segmentation ("siliver07") as well as protein binding site prediction. U-Net implementations have also found use in the physical sciences, for example in the analysis of micrographs of materials. Variations of the U-Net have also been applied for medical image reconstruction. Here are some variants and applications of U-Net as follows: Pixel-wise regression using U-Net and its application on pansharpening; 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation; TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation. Image-to-image translation to estimate fluorescent stains In binding site prediction of protein structure. == History == U-Net was created by Olaf Ronneberger, Philipp Fischer, Thomas Brox in 2015 and reported in the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation". It is an improvement and development of FCN: Evan Shelhamer, Jonathan Long, Trevor Darrell (2014). "Fully convolutional networks for semantic segmentation".

    Read more →
  • NETtalk (artificial neural network)

    NETtalk (artificial neural network)

    NETtalk is an artificial neural network that learns to pronounce written English text by supervised learning. It takes English text as input, and produces a matching phonetic transcriptions as output. It is the result of research carried out in the mid-1980s by Terrence Sejnowski and Charles Rosenberg. The intent behind NETtalk was to construct simplified models that might shed light on the complexity of learning human level cognitive tasks, and their implementation as a connectionist model that could also learn to perform a comparable task. The authors trained it by backpropagation. The network was trained on a large amount of English words and their corresponding pronunciations, and is able to generate pronunciations for unseen words with a high level of accuracy. The output of the network was a stream of phonemes, which fed into DECtalk to produce audible speech. It achieved popular success, appearing on the Today show. From the point of view of modeling human cognition, NETtalk does not specifically model the image processing stages and letter recognition of the visual cortex. Rather, it assumes that the letters have been pre-classified and recognized. It is NETtalk's task to learn proper associations between the correct pronunciation with a given sequence of letters based on the context in which the letters appear. A similar architecture was subsequently used for the opposite task, that of converting continuous speech signal to a phoneme sequence. == Training == The training dataset was a 20,008-word subset of the Brown Corpus, with manually annotated phoneme and stress for each letter. The development process was described in a 1993 interview. It took three months -- 250 person-hours -- to create the training dataset, but only a few days to train the network. After it was run successfully on this, the authors tried it on a phonological transcription of an interview with a young Latino boy from a barrio in Los Angeles. This resulted in a network that reproduced his Spanish accent. The original NETtalk was implemented on a Ridge 32, which took 0.275 seconds per learning step (one forward and one backward pass). Training NETtalk became a benchmark to test for the efficiency of backpropagation programs. For example, an implementation on Connection Machine-1 (with 16384 processors) ran at 52x speedup. An implementation on a 10-cell Warp ran at 340x speedup. The following table compiles the benchmark scores as of 1988. Speed is measured in "millions of connections per second" (MCPS). For example, the original NETtalk on Ridge 32 took 0.275 seconds per forward-backward pass, giving 18629 / 10 6 0.275 = 0.068 {\displaystyle {\frac {18629/10^{6}}{0.275}}=0.068} MCPS. Relative times are normalized to the MicroVax. == Architecture == The network had three layers and 18,629 adjustable weights, large by the standards of 1986. There were worries that it would overfit the dataset, but it was trained successfully. The input of the network has 203 units, divided into 7 groups of 29 units each. Each group is a one-hot encoding of one character. There are 29 possible characters: 26 letters, comma, period, and word boundary (whitespace). To produce the pronunciation of a single character, the network takes the character itself, as well as 3 characters before and 3 characters after it. The hidden layer has 80 units. The output has 26 units. 21 units encode for articulatory features (point of articulation, voicing, vowel height, etc.) of phonemes, and 5 units encode for stress and syllable boundaries. Sejnowski studied the learned representation in the network, and found that phonemes that sound similar are clustered together in representation space. The output of the network degrades, but remains understandable, when some hidden neurons are removed.

    Read more →
  • (1+ε)-approximate nearest neighbor search

    (1+ε)-approximate nearest neighbor search

    (1+ε)-approximate nearest neighbor search is a variant of the nearest neighbor search problem. A solution to the (1+ε)-approximate nearest neighbor search is a point or multiple points within distance (1+ε) R from a query point, where R is the distance between the query point and its true nearest neighbor. Reasons to approximate nearest neighbor search include the space and time costs of exact solutions in high-dimensional spaces (see curse of dimensionality) and that in some domains, finding an approximate nearest neighbor is an acceptable solution. Approaches for solving (1+ε)-approximate nearest neighbor search include k-d trees, locality-sensitive hashing and brute-force search.

    Read more →
  • Nearest neighbor search

    Nearest neighbor search

    Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the nearest neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M {\displaystyle q\in M} , find the closest point in S to q. Donald Knuth in volume 3 of The Art of Computer Programming (1973) called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is a k-NN search, where we need to find the k closest points. Most commonly M is a metric space and dissimilarity is expressed as a distance metric, which is symmetric and satisfies the triangle inequality. Even more common, M is taken to be the d-dimensional vector space where dissimilarity is measured using the Euclidean distance, Manhattan distance or other distance metric. However, the dissimilarity function can be arbitrary. One example is asymmetric Bregman divergence, for which the triangle inequality does not hold. == Applications == The nearest neighbor search problem arises in numerous fields of application, including: Pattern recognition – in particular for optical character recognition Statistical classification – see k-nearest neighbor algorithm Computer vision – for point cloud registration Computational geometry – see Closest pair of points problem Cryptanalysis – for lattice problem Databases – e.g. content-based image retrieval Coding theory – see maximum likelihood decoding Semantic search Vector databases, where nearest-neighbor lookup over embeddings is used to retrieve semantically similar records Retrieval-augmented generation systems, where nearest-neighbor retrieval over embeddings is used to fetch candidate passages or documents before generation Data compression – see MPEG-2 standard Robotic sensing Recommendation systems, e.g. see Collaborative filtering Internet marketing – see contextual advertising and behavioral targeting DNA sequencing Spell checking – suggesting correct spelling Plagiarism detection Similarity scores for predicting career paths of professional athletes. Cluster analysis – assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense, usually based on Euclidean distance Chemical similarity Sampling-based motion planning == Methods == Various solutions to the NNS problem have been proposed. The quality and usefulness of the algorithms are determined by the time complexity of queries as well as the space complexity of any search data structures that must be maintained. The informal observation usually referred to as the curse of dimensionality states that there is no general-purpose exact solution for NNS in high-dimensional Euclidean space using polynomial preprocessing and polylogarithmic search time. === Exact methods === ==== Linear search ==== The simplest solution to the NNS problem is to compute the distance from the query point to every other point in the database, keeping track of the "best so far". This algorithm, sometimes referred to as the naive approach, has a running time of O(dN), where N is the cardinality of S and d is the dimensionality of S. There are no search data structures to maintain, so the linear search has no space complexity beyond the storage of the database. Naive search can, on average, outperform space partitioning approaches on higher dimensional spaces. The absolute distance is not required for distance comparison, only the relative distance. In geometric coordinate systems the distance calculation can be sped up considerably by omitting the square root calculation from the distance calculation between two coordinates. The distance comparison will still yield identical results. ==== Space partitioning ==== Since the 1970s, the branch and bound methodology has been applied to the problem. In the case of Euclidean space, this approach encompasses spatial index or spatial access methods. Several space-partitioning methods have been developed for solving the NNS problem. Perhaps the simplest is the k-d tree, which iteratively bisects the search space into two regions containing half of the points of the parent region. Queries are performed via traversal of the tree from the root to a leaf by evaluating the query point at each split. Depending on the distance specified in the query, neighboring branches that might contain hits may also need to be evaluated. For constant dimension query time, average complexity is O(log N) in the case of randomly distributed points, worst case complexity is O(kN^(1-1/k)) Alternatively the R-tree data structure was designed to support nearest neighbor search in dynamic context, as it has efficient algorithms for insertions and deletions such as the R tree. R-trees can yield nearest neighbors not only for Euclidean distance, but can also be used with other distances. In the case of general metric space, the branch-and-bound approach is known as the metric tree approach. Particular examples include vp-tree and BK-tree methods. Using a set of points taken from a 3-dimensional space and put into a BSP tree, and given a query point taken from the same space, a possible solution to the problem of finding the nearest point-cloud point to the query point is given in the following description of an algorithm. (Strictly speaking, no such point may exist, because it may not be unique. But in practice, usually we only care about finding any one of the subset of all point-cloud points that exist at the shortest distance to a given query point.) The idea is, for each branching of the tree, guess that the closest point in the cloud resides in the half-space containing the query point. This may not be the case, but it is a good heuristic. After having recursively gone through all the trouble of solving the problem for the guessed half-space, now compare the distance returned by this result with the shortest distance from the query point to the partitioning plane. This latter distance is that between the query point and the closest possible point that could exist in the half-space not searched. If this distance is greater than that returned in the earlier result, then clearly there is no need to search the other half-space. If there is such a need, then you must go through the trouble of solving the problem for the other half space, and then compare its result to the former result, and then return the proper result. The performance of this algorithm is nearer to logarithmic time than linear time when the query point is near the cloud, because as the distance between the query point and the closest point-cloud point nears zero, the algorithm needs only perform a look-up using the query point as a key to get the correct result. === Approximation methods === An approximate nearest neighbor search algorithm is allowed to return points whose distance from the query is at most c {\displaystyle c} times the distance from the query to its nearest points. The appeal of this approach is that, in many cases, an approximate nearest neighbor is almost as good as the exact one. In particular, if the distance measure accurately captures the notion of user quality, then small differences in the distance should not matter. ==== Greedy search in proximity neighborhood graphs ==== Proximity graph methods (such as navigable small world graphs and HNSW) are considered the current state-of-the-art for the approximate nearest neighbors search. The methods are based on greedy traversing in proximity neighborhood graphs G ( V , E ) {\displaystyle G(V,E)} in which every point x i ∈ S {\displaystyle x_{i}\in S} is uniquely associated with vertex v i ∈ V {\displaystyle v_{i}\in V} . The search for the nearest neighbors to a query q in the set S takes the form of searching for the vertex in the graph G ( V , E ) {\displaystyle G(V,E)} . The basic algorithm – greedy search – works as follows: search starts from an enter-point vertex v i ∈ V {\displaystyle v_{i}\in V} by computing the distances from the query q to each vertex of its neighborhood { v j : ( v i , v j ) ∈ E } {\displaystyle \{v_{j}:(v_{i},v_{j})\in E\}} , and then finds a vertex with the minimal distance value. If the distance value between the query and the selected vertex is smaller than the one between the query and the current element, then the algorithm moves to the selected vertex, and it becomes new enter-point. The algorithm stops when it reaches a local minimum: a vertex whose neighborhood does not contain a vertex that is closer to the query than the vertex itself. The idea of proximity neighborhood graphs was exploited in multiple publications, including the seminal paper by Arya and Mount, in the VoroNet syst

    Read more →
  • 3D-Coat

    3D-Coat

    3DCoat is a commercial digital sculpting program from Pilgway designed to create free-form organic and hard surfaced 3D models, with tools which enable users to sculpt, add polygonal topology (automatically or manually), create UV maps (automatically or manually), texture the resulting models with natural painting tools, and render static images or animated "turntable" movies. The program can also be used to modify imported 3D models from a number of commercial 3D software products by means of plugins called Applinks. Imported models can be converted into voxel objects for further refinement and for adding high resolution detail, complete UV unwrapping and mapping, as well as adding PBR textures for displacement, bump maps, specular and diffuse color maps. A live connection to a chosen external 3D application can be established through the Applink pipeline, allowing for the transfer of model and texture information. 3DCoat specializes in voxel sculpting and polygonal sculpting using dynamic patch tessellation technology and polygonal sculpting tools. It includes "auto-retopology", a proprietary skinning algorithm which generates a polygonal mesh skin over any voxel sculpture, composed primarily of quadrangles.

    Read more →
  • BrownBoost

    BrownBoost

    BrownBoost is a boosting algorithm that may be robust to noisy datasets. BrownBoost is an adaptive version of the boost by majority algorithm. As is the case for all boosting algorithms, BrownBoost is used in conjunction with other machine learning methods. BrownBoost was introduced by Yoav Freund in 2001. == Motivation == AdaBoost performs well on a variety of datasets; however, it can be shown that AdaBoost does not perform well on noisy data sets. This is a result of AdaBoost's focus on examples that are repeatedly misclassified. In contrast, BrownBoost effectively "gives up" on examples that are repeatedly misclassified. The core assumption of BrownBoost is that noisy examples will be repeatedly mislabeled by the weak hypotheses and non-noisy examples will be correctly labeled frequently enough to not be "given up on." Thus only noisy examples will be "given up on," whereas non-noisy examples will contribute to the final classifier. In turn, if the final classifier is learned from the non-noisy examples, the generalization error of the final classifier may be much better than if learned from noisy and non-noisy examples. The user of the algorithm can set the amount of error to be tolerated in the training set. Thus, if the training set is noisy (say 10% of all examples are assumed to be mislabeled), the booster can be told to accept a 10% error rate. Since the noisy examples may be ignored, only the true examples will contribute to the learning process. == Algorithm description == BrownBoost uses a non-convex potential loss function, thus it does not fit into the AdaBoost framework. The non-convex optimization provides a method to avoid overfitting noisy data sets. However, in contrast to boosting algorithms that analytically minimize a convex loss function (e.g. AdaBoost and LogitBoost), BrownBoost solves a system of two equations and two unknowns using standard numerical methods. The only parameter of BrownBoost ( c {\displaystyle c} in the algorithm) is the "time" the algorithm runs. The theory of BrownBoost states that each hypothesis takes a variable amount of time ( t {\displaystyle t} in the algorithm) which is directly related to the weight given to the hypothesis α {\displaystyle \alpha } . The time parameter in BrownBoost is analogous to the number of iterations T {\displaystyle T} in AdaBoost. A larger value of c {\displaystyle c} means that BrownBoost will treat the data as if it were less noisy and therefore will give up on fewer examples. Conversely, a smaller value of c {\displaystyle c} means that BrownBoost will treat the data as more noisy and give up on more examples. During each iteration of the algorithm, a hypothesis is selected with some advantage over random guessing. The weight of this hypothesis α {\displaystyle \alpha } and the "amount of time passed" t {\displaystyle t} during the iteration are simultaneously solved in a system of two non-linear equations ( 1. uncorrelated hypothesis w.r.t example weights and 2. hold the potential constant) with two unknowns (weight of hypothesis α {\displaystyle \alpha } and time passed t {\displaystyle t} ). This can be solved by bisection (as implemented in the JBoost software package) or Newton's method (as described in the original paper by Freund). Once these equations are solved, the margins of each example ( r i ( x j ) {\displaystyle r_{i}(x_{j})} in the algorithm) and the amount of time remaining s {\displaystyle s} are updated appropriately. This process is repeated until there is no time remaining. The initial potential is defined to be 1 m ∑ j = 1 m 1 − erf ( c ) = 1 − erf ( c ) {\displaystyle {\frac {1}{m}}\sum _{j=1}^{m}1-{\mbox{erf}}({\sqrt {c}})=1-{\mbox{erf}}({\sqrt {c}})} . Since a constraint of each iteration is that the potential be held constant, the final potential is 1 m ∑ j = 1 m 1 − erf ( r i ( x j ) / c ) = 1 − erf ( c ) {\displaystyle {\frac {1}{m}}\sum _{j=1}^{m}1-{\mbox{erf}}(r_{i}(x_{j})/{\sqrt {c}})=1-{\mbox{erf}}({\sqrt {c}})} . Thus the final error is likely to be near 1 − erf ( c ) {\displaystyle 1-{\mbox{erf}}({\sqrt {c}})} . However, the final potential function is not the 0–1 loss error function. For the final error to be exactly 1 − erf ( c ) {\displaystyle 1-{\mbox{erf}}({\sqrt {c}})} , the variance of the loss function must decrease linearly w.r.t. time to form the 0–1 loss function at the end of boosting iterations. This is not yet discussed in the literature and is not in the definition of the algorithm below. The final classifier is a linear combination of weak hypotheses and is evaluated in the same manner as most other boosting algorithms. == BrownBoost learning algorithm definition == Input: m {\displaystyle m} training examples ( x 1 , y 1 ) , … , ( x m , y m ) {\displaystyle (x_{1},y_{1}),\ldots ,(x_{m},y_{m})} where x j ∈ X , y j ∈ Y = { − 1 , + 1 } {\displaystyle x_{j}\in X,\,y_{j}\in Y=\{-1,+1\}} The parameter c {\displaystyle c} Initialise: s = c {\displaystyle s=c} . (The value of s {\displaystyle s} is the amount of time remaining in the game) r i ( x j ) = 0 {\displaystyle r_{i}(x_{j})=0} ∀ j {\displaystyle \forall j} . The value of r i ( x j ) {\displaystyle r_{i}(x_{j})} is the margin at iteration i {\displaystyle i} for example x j {\displaystyle x_{j}} . While s > 0 {\displaystyle s>0} : Set the weights of each example: W i ( x j ) = e − ( r i ( x j ) + s ) 2 c {\displaystyle W_{i}(x_{j})=e^{-{\frac {(r_{i}(x_{j})+s)^{2}}{c}}}} , where r i ( x j ) {\displaystyle r_{i}(x_{j})} is the margin of example x j {\displaystyle x_{j}} Find a classifier h i : X → { − 1 , + 1 } {\displaystyle h_{i}:X\to \{-1,+1\}} such that ∑ j W i ( x j ) h i ( x j ) y j > 0 {\displaystyle \sum _{j}W_{i}(x_{j})h_{i}(x_{j})y_{j}>0} Find values α , t {\displaystyle \alpha ,t} that satisfy the equation: ∑ j h i ( x j ) y j e − ( r i ( x j ) + α h i ( x j ) y j + s − t ) 2 c = 0 {\displaystyle \sum _{j}h_{i}(x_{j})y_{j}e^{-{\frac {(r_{i}(x_{j})+\alpha h_{i}(x_{j})y_{j}+s-t)^{2}}{c}}}=0} . (Note this is similar to the condition E W i + 1 [ h i ( x j ) y j ] = 0 {\displaystyle E_{W_{i+1}}[h_{i}(x_{j})y_{j}]=0} set forth by Schapire and Singer. In this setting, we are numerically finding the W i + 1 = exp ⁡ ( ⋯ ⋯ ) {\displaystyle W_{i+1}=\exp \left({\frac {\cdots }{\cdots }}\right)} such that E W i + 1 [ h i ( x j ) y j ] = 0 {\displaystyle E_{W_{i+1}}[h_{i}(x_{j})y_{j}]=0} .) This update is subject to the constraint ∑ ( Φ ( r i ( x j ) + α h ( x j ) y j + s − t ) − Φ ( r i ( x j ) + s ) ) = 0 {\displaystyle \sum \left(\Phi \left(r_{i}(x_{j})+\alpha h(x_{j})y_{j}+s-t\right)-\Phi \left(r_{i}(x_{j})+s\right)\right)=0} , where Φ ( z ) = 1 − erf ( z / c ) {\displaystyle \Phi (z)=1-{\mbox{erf}}(z/{\sqrt {c}})} is the potential loss for a point with margin r i ( x j ) {\displaystyle r_{i}(x_{j})} Update the margins for each example: r i + 1 ( x j ) = r i ( x j ) + α h ( x j ) y j {\displaystyle r_{i+1}(x_{j})=r_{i}(x_{j})+\alpha h(x_{j})y_{j}} Update the time remaining: s = s − t {\displaystyle s=s-t} Output: H ( x ) = sign ( ∑ i α i h i ( x ) ) {\displaystyle H(x)={\textrm {sign}}\left(\sum _{i}\alpha _{i}h_{i}(x)\right)} == Empirical results == In preliminary experimental results with noisy datasets, BrownBoost outperformed AdaBoost's generalization error; however, LogitBoost performed as well as BrownBoost. An implementation of BrownBoost can be found in the open source software JBoost.

    Read more →
  • ARKA descriptors in QSAR

    ARKA descriptors in QSAR

    In computational chemistry and cheminformatics, ARKA descriptors in QSAR are a class of molecular descriptors used in quantitative structure–activity relationship (QSAR) modeling (or related approaches such as QSPR and QSTR), a computational method for predicting the biological activity or toxicity of chemical compounds based on their molecular structure. Molecular descriptors are numerical values that summarize information about a molecule's structure, topology, geometry, or physicochemical properties in a form suitable for machine learning or statistical modeling. ARKA (Arithmetic Residuals in K-Groups Analysis) descriptors differ from traditional descriptors by encoding atomic-level information through recursive autoregression techniques, which aim to capture subtle structural patterns and improve predictive accuracy. They are designed to be both interpretable and well-suited to modeling nonlinear relationships in QSAR studies. == Comparisons == While QSAR is essentially a similarity-based approach, the occurrence of activity/property cliffs may greatly reduce the predictive accuracy of the developed models. The novel Arithmetic Residuals in K-groups Analysis (ARKA) approach is a supervised dimensionality reduction technique developed by the DTC Laboratory, Jadavpur University that can easily identify activity cliffs in a data set. Activity cliffs are similar in their structures but differ considerably in their activity. The basic idea of the ARKA descriptors is to group the conventional QSAR descriptors based on a predefined criterion and then assign weightage to each descriptor in each group. ARKA descriptors have also been used to develop classification-based and regression-based QSAR models with acceptable quality statistics. The ARKA descriptors have been used for the identification of activity cliffs in QSAR studies and/or model development by multiple researchers. A tutorial presentation on the ARKA descriptors is available. Recently a multi-class ARKA framework has been proposed for improved q-RASAR model generation.

    Read more →
  • Constrained clustering

    Constrained clustering

    In computer science, constrained clustering is a class of semi-supervised learning algorithms. Typically, constrained clustering incorporates either a set of must-link constraints, cannot-link constraints, or both, with a data clustering algorithm. A cluster in which the members conform to all must-link and cannot-link constraints is called a chunklet. == Types of constraints == Both a must-link and a cannot-link constraint define a relationship between two data instances. Together, the sets of these constraints act as a guide for which a constrained clustering algorithm will attempt to find chunklets (clusters in the dataset which satisfy the specified constraints). A must-link constraint is used to specify that the two instances in the must-link relation should be associated with the same cluster. A cannot-link constraint is used to specify that the two instances in the cannot-link relation should not be associated with the same cluster. Some constrained clustering algorithms will abort if no such clustering exists which satisfies the specified constraints. Others will try to minimize the amount of constraint violation should it be impossible to find a clustering which satisfies the constraints. Constraints could also be used to guide the selection of a clustering model among several possible solutions. == Examples == Examples of constrained clustering algorithms include: COP K-means PCKmeans (Pairwise Constrained K-means) CMWK-Means (Constrained Minkowski Weighted K-Means)

    Read more →
  • Agent Ruby

    Agent Ruby

    Agent Ruby (1998–2002) by Lynn Hershman Leeson is an interactive, multiuser work using artificial intelligence. == Description == On Agent Ruby's website, "Agent Ruby's Edream Portal," a female face moves her eyes and lips. Ruby, named from Hershman Leeson's own film, Teknolust, answers questions and often responds that she needs a better algorithm to answer questions not within her database. The work, created with AI, explores relationships between real and virtual worlds. Hershman Leeson had created an earlier version of Ruby, CyberRoberta, which was a custom-made doll with webcam eyes that interacted with the internet. The work in a gallery provides a screen and a sign inviting gallery-goers to "Chat with Ruby." == Artificial intelligence == In 2015 when Agent Ruby was exhibited at the gallery Modern Art Oxford, a review in Aesthetica Magazine described it as an artificial intelligence agent. A review in New Scientist noted that "Ruby is a fast learner, but perhaps not a natural conversationalist." A 2024 list of "25 Essential AI Artworks" published by ARTnews wrote that while "Agent Ruby's capabilities seem limited by today's standards," it was extensive for its day. == Publications and exhibitions == Agent Ruby was commissioned and displayed at the San Francisco Museum of Modern Art, Modern Art Oxford, and the ZKM Center for Art and Media in Karlsruhe, Germany. The San Francisco Museum of Modern Art (SFMOMA) presented Lynn Hershman Leeson: The Agent Ruby Files, March 30 through June 2, 2013 which presented the project server's archive of user conversations over the 12 years of exhibitions.

    Read more →
  • Extremal Ensemble Learning

    Extremal Ensemble Learning

    Extremal Ensemble Learning (EEL) is a machine learning algorithmic paradigm for graph partitioning. EEL creates an ensemble of partitions and then uses information contained in the ensemble to find new and improved partitions. The ensemble evolves and learns how to form improved partitions through extremal updating procedure. The final solution is found by achieving consensus among its member partitions about what the optimal partition is. == Reduced-Network Extremal Ensemble Learning (RenEEL) == A particular implementation of the EEL paradigm is the Reduced-Network Extremal Ensemble Learning (RenEEL) scheme for partitioning a graph. RenEEL uses consensus across many partitions in an ensemble to create a reduced network that can be efficiently analyzed to find more accurate partitions. These better quality partitions are subsequently used to update the ensemble. An algorithm that utilizes the RenEEL scheme is currently the best algorithm for finding the graph partition with maximum modularity, which is an NP-hard problem.

    Read more →
  • Probabilistic latent semantic analysis

    Probabilistic latent semantic analysis

    Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles) is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis, from which PLSA evolved. Compared to standard latent semantic analysis which stems from linear algebra and downsizes the occurrence tables (usually via a singular value decomposition), probabilistic latent semantic analysis is based on a mixture decomposition derived from a latent class model. == Model == Considering observations in the form of co-occurrences ( w , d ) {\displaystyle (w,d)} of words and documents, PLSA models the probability of each co-occurrence as a mixture of conditionally independent multinomial distributions: P ( w , d ) = ∑ c P ( d ) P ( c | d ) P ( w | c ) = P ( d ) ∑ c P ( c | d ) P ( w | c ) {\displaystyle P(w,d)=\sum _{c}P(d)P(c|d)P(w|c)=P(d)\sum _{c}P(c|d)P(w|c)} with c {\displaystyle c} being the words' topic. Note that the number of topics is a hyperparameter that must be chosen in advance and is not estimated from the data. The first formulation is the symmetric formulation, where w {\displaystyle w} and d {\displaystyle d} are both generated from the latent class c {\displaystyle c} in similar ways (using the conditional probabilities P ( d | c ) {\displaystyle P(d|c)} and P ( w | c ) {\displaystyle P(w|c)} ), whereas the second formulation is the asymmetric formulation, where, for each document d {\displaystyle d} , a latent class is chosen conditionally to the document according to P ( c | d ) {\displaystyle P(c|d)} , and a word is then generated from that class according to P ( w | c ) {\displaystyle P(w|c)} . Although we have used words and documents in this example, the co-occurrence of any couple of discrete variables may be modelled in exactly the same way. So, the number of parameters is equal to c d + w c {\displaystyle cd+wc} . The number of parameters grows linearly with the number of documents. In addition, although PLSA is a generative model of the documents in the collection it is estimated on, it is not a generative model of new documents. Their parameters are learned using the EM algorithm. == Application == PLSA may be used in a discriminative setting, via Fisher kernels. PLSA has applications in information retrieval and filtering, natural language processing, machine learning from text, bioinformatics, and related areas. It is reported that the aspect model used in the probabilistic latent semantic analysis has severe overfitting problems. == Extensions == Hierarchical extensions: Asymmetric: MASHA ("Multinomial ASymmetric Hierarchical Analysis") Symmetric: HPLSA ("Hierarchical Probabilistic Latent Semantic Analysis") Generative models: The following models have been developed to address an often-criticized shortcoming of PLSA, namely that it is not a proper generative model for new documents. Latent Dirichlet allocation – adds a Dirichlet prior on the per-document topic distribution Higher-order data: Although this is rarely discussed in the scientific literature, PLSA extends naturally to higher order data (three modes and higher), i.e. it can model co-occurrences over three or more variables. In the symmetric formulation above, this is done simply by adding conditional probability distributions for these additional variables. This is the probabilistic analogue to non-negative tensor factorisation. == History == This is an example of a latent class model (see references therein), and it is related to non-negative matrix factorization. The present terminology was coined in 1999 by Thomas Hofmann.

    Read more →
  • LIBSVM

    LIBSVM

    LIBSVM and LIBLINEAR are two popular open source machine learning libraries, both developed at the National Taiwan University and both written in C++ though with a C API. LIBSVM implements the sequential minimal optimization (SMO) algorithm for kernelized support vector machines (SVMs), supporting classification and regression. LIBLINEAR implements linear SVMs and logistic regression models trained using a coordinate descent algorithm. The SVM learning code from both libraries is often reused in other open source machine learning toolkits, including GATE, KNIME, Orange and scikit-learn. Bindings and ports exist for programming languages such as Java, MATLAB, R, Julia, and Python. It is available in e1071 library in R and scikit-learn in Python. Both libraries are free software released under the 3-clause BSD license.

    Read more →
  • Simulation noise

    Simulation noise

    Simulation noise is a function that creates a divergence-free vector field. This signal can be used in artistic simulations for the purpose of increasing the perception of extra detail. The function can be calculated in three dimensions by dividing the space into a regular lattice grid. With each edge is associated a random value, indicating a rotational component of material revolving around the edge. By following rotating material into and out of faces, one can quickly sum the flux passing through each face of the lattice. Flux values at lattice faces are then interpolated to create a field value for all positions. Perlin noise is the earliest form of lattice noise, which has become very popular in computer graphics. Perlin Noise is not suited for simulation because it is not divergence-free. Noises based on lattices, such as simulation noise and Perlin noise, are often calculated at different frequencies and summed together to form band-limited fractal signals. Other approaches developed later that use vector calculus identities to produce divergence free fields, such as "Curl-Noise" as suggested by Rook Bridson, and "Divergence-Free Noise" due to Ivan DeWolf. These often require calculation of lattice noise gradients, which sometimes are not readily available. A naive implementation would call a lattice noise function several times to calculate its gradient, resulting in more computation than is strictly necessary. Unlike these noises, simulation noise has a geometric rationale in addition to its mathematical properties. It simulates vortices scattered in space, to produce its pleasing aesthetic. == Curl noise == The vector field is created as follows, for every point (x,y,z) in the space a vector field G is created, every component x, y and z of the vector field (Gx, Gy, Gz) is defined by a 3D perlin or simplex noise function with x, y and z as parameters. The partial derivative of Gx, Gy, and Gz respect to x, y and z is obtained with the gradient of the perlin or simplex noise by finite differences of implicit calculation inside the simplex noise. The partial derivatives are used to calculate F as the curl of G given by F = ( ∂ G z ∂ y − ∂ G y ∂ z , ∂ G x ∂ z − ∂ G z ∂ x , ∂ G y ∂ x − ∂ G x ∂ y ) {\displaystyle F=({\frac {\partial Gz}{\partial y}}-{\frac {\partial Gy}{\partial z}},{\frac {\partial Gx}{\partial z}}-{\frac {\partial Gz}{\partial x}},{\frac {\partial Gy}{\partial x}}-{\frac {\partial Gx}{\partial y}})} == Bitangent noise == This method is based in the fact that the curl of the gradient of scalar field is zero and the identity that expand the divergence of a cross product of two vectors A and B as the difference of the dot products of each vector with the curl of the other: ∇ × ( ∇ φ ) = 0 . {\displaystyle \nabla \times (\nabla \varphi )=\mathbf {0} .} ∇ ⋅ ( A × B ) = ( ∇ × A ) ⋅ B − A ⋅ ( ∇ × B ) {\displaystyle \nabla \cdot (\mathbf {A} \times \mathbf {B} )=\ (\nabla {\times }\mathbf {A} )\cdot \mathbf {B} \,-\,\mathbf {A} \cdot (\nabla {\times }\mathbf {B} )} which means that if the curl of both vector fields is zero then the divergence of the product of two vectors that are the gradients of scalar fields is zero too. This result in a divergence free vector field by construction only calling two noise functions to create the scalar fields. The vector field es created as follows, two scalar fields are calculated ϕ {\displaystyle \phi } and ψ {\displaystyle \psi } using 3D perlin or simplex noise functions, then the gradients A and B of each of this fields is calculated, the cross product of A and B gives a divergence free vector field. == Signed distance noise == The vector field is created based on a closed and differentiable implicit surface S = F(x,y,z) = 0. For every point in the space, frequently outside or near the surface, we get a vector g that is normal to the surface, this is the gradient of S or the partial derivatives respect to x, y and z, this vector is not unitary, but we can get a unitary normal n by dividing each component of the point by the magnitude of the gradient g. Outside of the surface all these normals point away from the surface. g = ∇ F ( x , y , z ) = ( ∂ F ∂ x , ∂ F ∂ y , ∂ F ∂ z ) {\displaystyle g=\nabla F(x,y,z)=\left({\frac {\partial F}{\partial x}},{\frac {\partial F}{\partial y}},{\frac {\partial F}{\partial z}}\right)} n = g ( x , y , z ) ‖ ∇ F ( x , y , z ) ‖ {\displaystyle \mathbf {n} ={\frac {g(x,y,z)}{\|\nabla F(x,y,z)\|}}} ‖ ∇ F ( x , y , z ) ‖ = ( ∂ F ∂ x ) 2 + ( ∂ F ∂ y ) 2 + ( ∂ F ∂ z ) 2 {\displaystyle \|\nabla F(x,y,z)\|={\sqrt {\left({\frac {\partial F}{\partial x}}\right)^{2}+\left({\frac {\partial F}{\partial y}}\right)^{2}+\left({\frac {\partial F}{\partial z}}\right)^{2}}}} Afterwards we calculate a scalar value p for that point in the space using a 3D perlin or simplex noise function. Now we create a vector field V = pn pointing outside of the surface. The curl of this vector field gives the direction in every point in the space where the particles should move. S D N = ( ∂ V z ∂ y − ∂ V y ∂ z , ∂ V x ∂ z − ∂ V z ∂ x , ∂ V y ∂ x − ∂ V x ∂ y ) {\displaystyle SDN=({\frac {\partial Vz}{\partial y}}-{\frac {\partial Vy}{\partial z}},{\frac {\partial Vx}{\partial z}}-{\frac {\partial Vz}{\partial x}},{\frac {\partial Vy}{\partial x}}-{\frac {\partial Vx}{\partial y}})} By construction this vector SDN will point in a tangent direction to an isosurface at the level of the signed distance to the original surface and can be used to confine the movements of the particles to stay in that surface.

    Read more →
  • Operational taxonomic unit

    Operational taxonomic unit

    An operational taxonomic unit (OTU) is an operational definition used to classify groups of closely related individuals. The term was originally introduced in 1963 by Robert R. Sokal and Peter H. A. Sneath in the context of numerical taxonomy, where an "operational taxonomic unit" is simply the group of organisms currently being studied. In this sense, an OTU is a pragmatic definition to group individuals by similarity, equivalent to but not necessarily in line with classical Linnaean taxonomy or modern evolutionary taxonomy. Nowadays, however, the term is commonly used in a different context and refers to clusters of (uncultivated or unknown) organisms, grouped by DNA sequence similarity of a specific taxonomic marker gene (originally coined as mOTU; molecular OTU). In other words, OTUs are pragmatic proxies for "species" at different taxonomic levels, in the absence of traditional systems of biological classification as are available for macroscopic organisms. For several years, OTUs have been the most commonly used units of diversity, especially when analysing small subunit 16S (for prokaryotes) or 18S rRNA (for eukaryotes) marker gene sequence datasets. == Molecular OTU by clustering of marker gene sequences == In the approach represented by DNA barcoding, a particular locus is chosen to be used as the marker gene for classification. This locus should be universally present in the scope selected, variable enough to be different among close-related species, and be flanked by conservative sequences that allow for easy amplification and detection. There are databases containing sequences for such marker genes from many different species, allowing for comparison. (Sometimes only using one locus does not provide sufficient resolution, so multiple marker genes are used. This is the case for plants, where rbcL+matK is common.) Sequences obtained this way can be clustered according to their similarity to one another, and operational taxonomic units are defined based on the similarity threshold set by the researcher. The exact threshold depends on the taxa in question and the mutational rates of the selected locus in the taxon. 97–99% are commonly used, but "it is now recognized to be somewhat arbitrary as sequence variation within and among species varies across taxa". 100% similarity (fully identical) is also common, also known as single variants. It remains debatable how well this commonly used method recapitulates true microbial species phylogeny or ecology. Although OTUs can be calculated differently when using different algorithms or thresholds, research by Schmidt et al. (2014) demonstrated that 16S-derived microbial OTUs were generally ecologically consistent across habitats and several clustering approaches. The number of OTUs defined may be inflated due to errors in DNA sequencing. === OTU clustering approaches === There are three main approaches to clustering OTUs: De novo, for which the clustering is based on similarities between sequencing reads. Closed-reference, for which the clustering is performed against a reference database of sequences. Open-reference, where clustering is first performed against a reference database of sequences, then any remaining sequences that could not be mapped to the reference are clustered de novo. Using a reference provides taxonomic context for the OTUs found. Alternatively, taxonomic context can be found after the construction of clusters by comparing representative sequences from clusters against a reference database. There are also specialized classifiers for this purpose which are much faster than naive comparison using BLAST. === OTU clustering algorithms === Hierarchical clustering algorithms (HCA): uclust & cd-hit & ESPRIT Bayesian clustering: CROP == Molecular OTU by other methods == In addition to similarity-based grouping, marker gene sequences can be sorted into OTUs using molecular phylogeny, k-mer composition, or hybrid methods combining these methods with similarity. There are also Bayesian tree-less methods and machine learning approaches. Using phylogeny often involves manually assigning terminal clades or single nodes to an OTU, so this is usually only done for refinement. Genome skimming can be used to obtain high-copy DNA without the need to choose marker genes or to design PCR primers for the chosen genes. It can provide fairly good coverage of organelle DNA and repetitive elements such as ribosomal DNA, both of which can be used like marker genes in OTU analysis. Whole-genome sequencing is more expensive and involves the production and processing of more data. By considering the entire genome, many (sometimes over 100) marker genes can be used at the same time, producing highly resolved phylogenies that correctly identify problematic taxa. It is also possible to use entire genomes for OTU assignment. For example, genomes from different bacterial species almost always have an average nucleotide identity lower than 95%, a fact that can be used to define new OTUs (and likely new species).

    Read more →
  • Causal Markov condition

    Causal Markov condition

    The Causal Markov (CM) condition states that, conditional on the set of all its direct causes, a node is independent of all variables which are not effects or direct causes of that node. In the event that the structure of a Bayesian network accurately depicts causality, the two conditions are equivalent. This is related to the Markov condition, an assumption made in Bayesian probability theory, that every node in a Bayesian network is conditionally independent of its nondescendants, given its parents. Stated loosely, it is assumed that a node has no bearing on nodes which do not descend from it. In a DAG, this local Markov condition is equivalent to the global Markov condition, which states that d-separations in the graph also correspond to conditional independence relations. This also means that a node is conditionally independent of the entire network, given its Markov blanket. A network may accurately embody the Markov condition without depicting causality, in which case it should not be assumed to embody the causal Markov condition. == Motivation == Statisticians are enormously interested in the ways in which certain events and variables are connected. The precise notion of what constitutes a cause and effect is necessary to understand the connections between them. The central idea behind the philosophical study of probabilistic causation is that causes raise the probabilities of their effects, all else being equal. A deterministic interpretation of causation means that if A causes B, then A must always be followed by B. In this sense, smoking does not cause cancer because some smokers never develop cancer. On the other hand, a probabilistic interpretation simply means that causes raise the probability of their effects. In this sense, changes in meteorological readings associated with a storm do cause that storm, since they raise its probability. (However, simply looking at a barometer does not change the probability of the storm, for a more detailed analysis, see:). == Examples == In a simple view, releasing one's hand from a hammer causes the hammer to fall. However, doing so in outer space does not produce the same outcome, calling into question if releasing one's fingers from a hammer always causes it to fall. A causal graph could be created to acknowledge that both the presence of gravity and the release of the hammer contribute to its falling. However, it would be very surprising if the surface underneath the hammer affected its falling. This essentially states the Causal Markov Condition, that given the existence of gravity the release of the hammer, it will fall regardless of what is beneath it. == Implications == === Dependence and Causation === It follows from the definition that if X and Y are in V and are probabilistically dependent, then either X causes Y, Y causes X, or X and Y are both effects of some common cause Z in V. This definition was seminally introduced by Hans Reichenbach as the Common Cause Principle (CCP). === Screening === It once again follows from the definition that the parents of X screen X from other "indirect causes" of X (parents of Parents(X)) and other effects of Parents(X) which are not also effects of X.

    Read more →