AI Assistant Jarvis

AI Assistant Jarvis — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Description logic

    Description logic

    Description logics (DL) are a family of formal knowledge representation languages. Many DLs are more expressive than propositional logic but less expressive than first-order logic. In contrast to the latter, the core reasoning problems for DLs are (usually) decidable, and efficient decision procedures have been designed and implemented for these problems. There are general, spatial, temporal, spatiotemporal, and fuzzy description logics, and each description logic features a different balance between expressive power and reasoning complexity by supporting different sets of mathematical constructors. DLs are used in artificial intelligence to describe and reason about the relevant concepts of an application domain (known as terminological knowledge). It is of particular importance in providing a logical formalism for ontologies and the Semantic Web: the Web Ontology Language (OWL) and its profiles are based on DLs. A major area of application of DLs and OWL is in biomedical informatics, where they assist in the codification of biomedical knowledge. DLs and OWL are also applied in other domains, including defense, climate modeling, and large-scale industrial knowledge graphs. == Introduction == A DL models concepts, roles and individuals, and their relationships. The fundamental modeling concept of a DL is the axiom—a logical statement relating roles and/or concepts. This is a key difference from the frames paradigm where a frame specification declares and completely defines a class. == Nomenclature == === Terminology compared to FOL and OWL === The description logic community uses different terminology than the first-order logic (FOL) community for operationally equivalent notions; some examples are given below. The Web Ontology Language (OWL) uses again a different terminology, also given in the table below. === Naming convention === There are many varieties of description logics and there is an informal naming convention, roughly describing the operators allowed. The expressivity is encoded in the label for a logic starting with one of the following basic logics: Followed by any of the following extensions: ==== Exceptions ==== Some canonical DLs that do not exactly fit this convention are: ==== Examples ==== As an example, A L C {\displaystyle {\mathcal {ALC}}} is a centrally important description logic from which comparisons with other varieties can be made. A L C {\displaystyle {\mathcal {ALC}}} is simply A L {\displaystyle {\mathcal {AL}}} with complement of any concept allowed, not just atomic concepts. A L C {\displaystyle {\mathcal {ALC}}} is used instead of the equivalent A L U E {\displaystyle {\mathcal {ALUE}}} . A further example, the description logic S H I Q {\displaystyle {\mathcal {SHIQ}}} is the logic A L C {\displaystyle {\mathcal {ALC}}} plus extended cardinality restrictions, and transitive and inverse roles. The naming conventions aren't purely systematic so that the logic A L C O I N {\displaystyle {\mathcal {ALCOIN}}} might be referred to as A L C N I O {\displaystyle {\mathcal {ALCNIO}}} and other abbreviations are also made where possible. The Protégé ontology editor supports S H O I N ( D ) {\displaystyle {\mathcal {SHOIN}}^{\mathcal {(D)}}} . Three major biomedical informatics terminology bases, SNOMED CT, GALEN, and GO, are expressible in E L {\displaystyle {\mathcal {EL}}} (with additional role properties). OWL 2 provides the expressiveness of S R O I Q ( D ) {\displaystyle {\mathcal {SROIQ}}^{\mathcal {(D)}}} , OWL-DL is based on S H O I N ( D ) {\displaystyle {\mathcal {SHOIN}}^{\mathcal {(D)}}} , and for OWL-Lite it is S H I F ( D ) {\displaystyle {\mathcal {SHIF}}^{\mathcal {(D)}}} . == History == Description logic was given its current name in the 1980s. Previous to this it was called (chronologically): terminological systems, and concept languages. === Knowledge representation === Frames and semantic networks lack formal (logic-based) semantics. DL was first introduced into knowledge representation (KR) systems to overcome this deficiency. The first DL-based KR system was KL-ONE (by Ronald J. Brachman and Schmolze, 1985). During the '80s other DL-based systems using structural subsumption algorithms were developed including KRYPTON (1983), LOOM (1987), BACK (1988), K-REP (1991) and CLASSIC (1991). This approach featured DL with limited expressiveness but relatively efficient (polynomial time) reasoning. In the early '90s, the introduction of a new tableau based algorithm paradigm allowed efficient reasoning on more expressive DL. DL-based systems using these algorithms — such as KRIS (1991) — show acceptable reasoning performance on typical inference problems even though the worst case complexity is no longer polynomial. From the mid '90s, reasoners were created with good practical performance on very expressive DL with high worst case complexity. Examples from this period include FaCT, RACER (2001), CEL (2005), and KAON 2 (2005). DL reasoners, such as FaCT, FaCT++, RACER, DLP and Pellet, implement the method of analytic tableaux. KAON2 is implemented by algorithms which reduce a SHIQ(D) knowledge base to a disjunctive datalog program. === Semantic web === The DARPA Agent Markup Language (DAML) and Ontology Inference Layer (OIL) ontology languages for the Semantic Web can be viewed as syntactic variants of DL. In particular, the formal semantics and reasoning in OIL use the S H I Q {\displaystyle {\mathcal {SHIQ}}} DL. The DAML+OIL DL was developed as a submission to—and formed the starting point of—the World Wide Web Consortium (W3C) Web Ontology Working Group. In 2004, the Web Ontology Working Group completed its work by issuing the OWL recommendation. The design of OWL is based on the S H {\displaystyle {\mathcal {SH}}} family of DL with OWL DL and OWL Lite based on S H O I N ( D ) {\displaystyle {\mathcal {SHOIN}}^{\mathcal {(D)}}} and S H I F ( D ) {\displaystyle {\mathcal {SHIF}}^{\mathcal {(D)}}} respectively. The W3C OWL Working Group began work in 2007 on a refinement of - and extension to - OWL. In 2009, this was completed by the issuance of the OWL2 recommendation. OWL2 is based on the description logic S R O I Q ( D ) {\displaystyle {\mathcal {SROIQ}}^{\mathcal {(D)}}} . Practical experience demonstrated that OWL DL lacked several key features necessary to model complex domains. == Modeling == === TBox vs Abox === In DL, a distinction is drawn between the so-called TBox (terminological box) and the ABox (assertional box). In general, the TBox contains sentences describing concept hierarchies (i.e., relations between concepts) while the ABox contains ground sentences stating where in the hierarchy, individuals belong (i.e., relations between individuals and concepts). For example, the statement: belongs in the TBox, while the statement: belongs in the ABox. Note that the TBox/ABox distinction is not significant, in the same sense that the two "kinds" of sentences are not treated differently in first-order logic (which subsumes most DL). When translated into first-order logic, a subsumption axiom like (1) is simply a conditional restriction to unary predicates (concepts) with only variables appearing in it. Clearly, a sentence of this form is not privileged or special over sentences in which only constants ("grounded" values) appear like (2). === Motivation for having Tbox and Abox === So why was the distinction introduced? The primary reason is that the separation can be useful when describing and formulating decision-procedures for various DL. For example, a reasoner might process the TBox and ABox separately, in part because certain key inference problems are tied to one but not the other one ('classification' is related to the TBox, 'instance checking' to the ABox). Another example is that the complexity of the TBox can greatly affect the performance of a given decision-procedure for a certain DL, independently of the ABox. Thus, it is useful to have a way to talk about that specific part of the knowledge base. The secondary reason is that the distinction can make sense from the knowledge base modeler's perspective. It is plausible to distinguish between our conception of terms/concepts in the world (class axioms in the TBox) and particular manifestations of those terms/concepts (instance assertions in the ABox). In the above example: when the hierarchy within a company is the same in every branch but the assignment to employees is different in every department (because there are other people working there), it makes sense to reuse the TBox for different branches that do not use the same ABox. There are two features of description logic that are not shared by most other data description formalisms: DL does not make the unique name assumption (UNA) or the closed-world assumption (CWA). Not having UNA means that two concepts with different names may be allowed by some inference to be shown to be equivalent. Not having CWA, or rather having the open world assumption (OWA) means that

    Read more →
  • SqueezeNet

    SqueezeNet

    SqueezeNet is a deep neural network for image classification released in 2016. SqueezeNet was developed by researchers at DeepScale, University of California, Berkeley, and Stanford University. In designing SqueezeNet, the authors' goal was to create a smaller neural network with fewer parameters while achieving competitive accuracy. Their best-performing model achieved the same accuracy as AlexNet on ImageNet classification, but has a size 510x less than it. == Version history == SqueezeNet was originally released on February 22, 2016. This original version of SqueezeNet was implemented on top of the Caffe deep learning software framework. Shortly thereafter, the open-source research community ported SqueezeNet to a number of other deep learning frameworks. On February 26, 2016, Eddie Bell released a port of SqueezeNet for the Chainer deep learning framework. On March 2, 2016, Guo Haria released a port of SqueezeNet for the Apache MXNet framework. On June 3, 2016, Tammy Yang released a port of SqueezeNet for the Keras framework. In 2017, companies including Baidu, Xilinx, Imagination Technologies, and Synopsys demonstrated SqueezeNet running on low-power processing platforms such as smartphones, FPGAs, and custom processors. As of 2018, SqueezeNet ships "natively" as part of the source code of a number of deep learning frameworks such as PyTorch, Apache MXNet, and Apple CoreML. In addition, third party developers have created implementations of SqueezeNet that are compatible with frameworks such as TensorFlow. Below is a summary of frameworks that support SqueezeNet. == Relationship to other networks == === AlexNet === SqueezeNet was originally described in SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. AlexNet is a deep neural network that has 240 MB of parameters, and SqueezeNet has just 5 MB of parameters. This small model size can more easily fit into computer memory and can more easily be transmitted over a computer network. However, it's important to note that SqueezeNet is not a "squeezed version of AlexNet." Rather, SqueezeNet is an entirely different DNN architecture than AlexNet. What SqueezeNet and AlexNet have in common is that both of them achieve approximately the same level of accuracy when evaluated on the ImageNet image classification validation dataset. === Model compression === Model compression (e.g. quantization and pruning of model parameters) can be applied to a deep neural network after it has been trained. In the SqueezeNet paper, the authors demonstrated that a model compression technique called Deep Compression can be applied to SqueezeNet to further reduce the size of the parameter file from 5 MB to 500 KB. Deep Compression has also been applied to other DNNs, such as AlexNet and VGG. == Variants == Some of the members of the original SqueezeNet team have continued to develop resource-efficient deep neural networks for a variety of applications. A few of these works are noted in the following table. As with the original SqueezeNet model, the open-source research community has ported and adapted these newer "squeeze"-family models for compatibility with multiple deep learning frameworks. In addition, the open-source research community has extended SqueezeNet to other applications, including semantic segmentation of images and style transfer.

    Read more →
  • ARKA descriptors in QSAR

    ARKA descriptors in QSAR

    In computational chemistry and cheminformatics, ARKA descriptors in QSAR are a class of molecular descriptors used in quantitative structure–activity relationship (QSAR) modeling (or related approaches such as QSPR and QSTR), a computational method for predicting the biological activity or toxicity of chemical compounds based on their molecular structure. Molecular descriptors are numerical values that summarize information about a molecule's structure, topology, geometry, or physicochemical properties in a form suitable for machine learning or statistical modeling. ARKA (Arithmetic Residuals in K-Groups Analysis) descriptors differ from traditional descriptors by encoding atomic-level information through recursive autoregression techniques, which aim to capture subtle structural patterns and improve predictive accuracy. They are designed to be both interpretable and well-suited to modeling nonlinear relationships in QSAR studies. == Comparisons == While QSAR is essentially a similarity-based approach, the occurrence of activity/property cliffs may greatly reduce the predictive accuracy of the developed models. The novel Arithmetic Residuals in K-groups Analysis (ARKA) approach is a supervised dimensionality reduction technique developed by the DTC Laboratory, Jadavpur University that can easily identify activity cliffs in a data set. Activity cliffs are similar in their structures but differ considerably in their activity. The basic idea of the ARKA descriptors is to group the conventional QSAR descriptors based on a predefined criterion and then assign weightage to each descriptor in each group. ARKA descriptors have also been used to develop classification-based and regression-based QSAR models with acceptable quality statistics. The ARKA descriptors have been used for the identification of activity cliffs in QSAR studies and/or model development by multiple researchers. A tutorial presentation on the ARKA descriptors is available. Recently a multi-class ARKA framework has been proposed for improved q-RASAR model generation.

    Read more →
  • Scale-invariant feature operator

    Scale-invariant feature operator

    In the fields of computer vision and image analysis, the scale-invariant feature operator (or SFOP) is an algorithm to detect local features in images. The algorithm was published by Förstner et al. in 2009. == Algorithm == The scale-invariant feature operator (SFOP) is based on two theoretical concepts: spiral model feature operator Desired properties of keypoint detectors: Invariance and repeatability for object recognition Accuracy to support camera calibration Interpretability: Especially corners and circles, should be part of the detected keypoints (see figure). As few control parameters as possible with clear semantics Complementarity to known detectors scale-invariant corner/circle detector. == Theory == === Maximize the weight === Maximize the weight w {\displaystyle w} = 1/variance of a point p {\displaystyle p} w ( p , α , τ , σ ) = ( N ( σ ) − 2 ) λ m i n ( M ( p , α , τ , σ ) ) Ω ( p , α , τ , σ ) {\displaystyle w(\mathbf {p} ,\alpha ,\tau ,\sigma )=\left(N(\sigma )-2\right){\frac {\lambda _{min}(M(\mathbf {p} ,\alpha ,\tau ,\sigma ))}{\Omega (\mathbf {p} ,\alpha ,\tau ,\sigma )}}} comprising: 1. the image model Ω ( p , α , τ , σ ) = ∑ n = 1 N ( σ ) [ ( q n − p ) T R α ∇ T g ( q n ) ] 2 G σ ( q n − p ) = N ( σ ) t r { R α ∇ τ ∇ τ T R α T ∗ p p T G σ ( p ) } {\displaystyle {\begin{aligned}\Omega (\mathbf {p} ,\alpha ,\tau ,\sigma )&=\sum _{n=1}^{N(\sigma )}[(\mathbf {q} _{n}-\mathbf {p} )^{T}\mathbf {R} _{\alpha }\mathbf {\nabla } _{T}g(\mathbf {q} _{n})]^{2}G_{\sigma }(\mathbf {q} _{n}-\mathbf {p} )\\&=N(\sigma )\mathbf {tr} \left\{R_{\alpha }\mathbf {\nabla } _{\tau }\mathbf {\nabla } _{\tau }^{T}R_{\alpha }^{T}\mathbf {p} \mathbf {p} ^{T}G_{\sigma }(\mathbf {p} )\right\}\end{aligned}}} 2. the smaller eigenvalue of the structure tensor M ( p , α , τ , σ ) ⏟ structure tensor = G σ ( p ) ⏟ weighted summation ∗ ( R σ ∇ τ ∇ τ T R σ T ) ⏟ squared rotated gradients {\displaystyle \underbrace {M(\mathbf {p} ,\alpha ,\tau ,\sigma )} _{\text{structure tensor}}=\underbrace {G_{\sigma }(\mathbf {p} )} _{\text{weighted summation}}\underbrace {(R_{\sigma }\nabla _{\tau }\nabla _{\tau }^{T}R_{\sigma }^{T})} _{\text{squared rotated gradients}}} === Reduce the search space === Reduce the 5-dimensional search space by linking the differentiation scale τ {\displaystyle \tau } to the integration scale τ = σ / 3 {\displaystyle \tau =\sigma /3} solving for the optimal α ^ {\displaystyle {\hat {\alpha }}} using the model Ω ( α ) = a − b cos ⁡ ( 2 α − 2 α 0 ) {\displaystyle \Omega (\alpha )=a-b\cos(2\alpha -2\alpha _{0})} and determining the parameters from three angles, e. g. Ω ( 0 ∘ ) , Ω ( 60 ∘ ) , Ω ( 120 ∘ ) → a , b , α 0 → α ^ {\displaystyle \Omega (0^{\circ }),\Omega (60^{\circ }),\Omega (120^{\circ })\quad \rightarrow \quad a,b,\alpha _{0}\quad \rightarrow \quad {\hat {\alpha }}} pre-selection possible: α = 0 ∘ → junctions , α = 90 ∘ → circular features {\displaystyle \alpha =0^{\circ }\,\rightarrow \,{\mbox{junctions}},\quad \alpha =90^{\circ }\,\rightarrow \,{\mbox{circular features}}} === Filter potential keypoints === non-maxima suppression over scale, space and angle thresholding the isotropy λ 2 ( M ) {\displaystyle \lambda _{2(M)}} :eigenvalues characterize the shape of the keypoint, smallest eigenvalue has to be larger than threshold T λ {\displaystyle T_{\lambda }} derived from noise variance V ( n ) {\displaystyle V(n)} and significance level S {\displaystyle S} : T λ ( V ( n ) , τ , σ , S ) = N ( σ ) 16 π τ 4 V ( n ) χ 2 , S 2 {\displaystyle T_{\lambda }(V(n),\tau ,\sigma ,S)={\frac {N(\sigma )}{16\pi \tau ^{4}}}V(n)\chi _{2,S}^{2}} == Algorithm == == Results == === Interpretability of SFOP keypoints ===

    Read more →
  • Reflection lines

    Reflection lines

    Engineers use reflection lines to judge a surface's quality. Reflection lines reveal surface flaws, particularly discontinuities in normals indicating that the surface is not C 2 {\displaystyle C^{2}} . Reflection lines may be created and examined on physical surfaces or virtual surfaces with the help of computer graphics. For example, the shiny surface of an automobile body is illuminated with reflection lines by surrounding the car with parallel light sources. Virtually, a surface can be rendered with reflection lines by modulating the surfaces point-wise color according to a simple calculation involving the surface normal, viewing direction and a square wave environment map. == Mathematical definition == Consider a point p {\displaystyle p} on a surface M {\displaystyle M} with (normalized) normal n {\displaystyle n} . If an observer views this point from infinity at view direction v {\displaystyle v} then the reflected view direction r {\displaystyle r} is: r = v − 2 ( n ⋅ v ) n . {\displaystyle r=v-2(n\cdot v)n.} (The vector v {\displaystyle v} is decomposed into its normal part v n = ( n ⋅ v ) v {\displaystyle v_{n}=(n\cdot v)v} and tangential part v t = v − v n {\displaystyle v_{t}=v-v_{n}} . Upon reflection, the tangential part is kept and the normal part is negated.) For reflection lines we consider the surface M {\displaystyle M} surrounded by parallel lines with direction a {\displaystyle a} , representing infinite, non-dispersive light sources. For each point p {\displaystyle p} on M {\displaystyle M} we determine which line is seen from direction v {\displaystyle v} . The position on each line is of no interest. Define the vector r p {\displaystyle r_{p}} to be the reflection direction r {\displaystyle r} projected onto a plane P {\displaystyle P} that is orthogonal to a {\displaystyle a} : r p = r − ( r ⋅ a ) a {\displaystyle r_{p}=r-(r\cdot a)a} and similarly let v p {\displaystyle v_{p}} be the viewing direction projected onto P {\displaystyle P} : v p = v − ( v ⋅ a ) a {\displaystyle v_{p}=v-(v\cdot a)a} Finally, define v o {\displaystyle v_{o}} to be the direction lying in P {\displaystyle P} perpendicular to a {\displaystyle a} and v p {\displaystyle v_{p}} : v o = a × v p {\displaystyle v_{o}=a\times v_{p}} Using these vectors, the reflection line function θ ( p ) : M → ( − π , π ] {\displaystyle \theta (p):M\rightarrow (-\pi ,\pi ]} is a scalar function mapping points p {\displaystyle p} on the surface to angles between v p {\displaystyle v_{p}} and r p {\displaystyle r_{p}} : θ = arctan ⁡ ( r p ⋅ v o , r p ⋅ v p ) {\displaystyle \theta =\arctan {(r_{p}\cdot v_{o},r_{p}\cdot v_{p})}} where a r c t a n ( y , x ) {\displaystyle arctan(y,x)} is the atan2 function producing a number in the range ( − π , π ] {\displaystyle (-\pi ,\pi ]} . ( v p {\displaystyle v_{p}} and v o {\displaystyle v_{o}} can be viewed as a local coordinate system in P {\displaystyle P} with x {\displaystyle x} -axis in direction v p {\displaystyle v_{p}} and y {\displaystyle y} -axis in direction v o {\displaystyle v_{o}} .) Finally, to render the reflection lines positive values θ > 0 {\displaystyle \theta >0} are mapped to a light color and non-positive values to a dark color. == Highlight lines == Highlight lines are a view-independent alternative to reflection lines. Here the projected normal is directly compared against some arbitrary vector x {\displaystyle x} perpendicular to the light source: θ = arctan ⁡ ( n a ⋅ a ⊥ , n a ⋅ x ) {\displaystyle \theta =\arctan {(n_{a}\cdot a^{\perp },n_{a}\cdot x)}} where n a {\displaystyle n_{a}} is the surface normal projected on the light source plane P {\displaystyle P} : n a ^ / | n a ^ | , n a ^ = n − ( n ⋅ a ) a {\displaystyle {\hat {n_{a}}}/|{\hat {n_{a}}}|,{\hat {n_{a}}}=n-(n\cdot a)a} The relationship between reflection lines and highlight lines is likened to that between specular and diffuse shading.

    Read more →
  • Synaptic weight

    Synaptic weight

    In neuroscience and computer science, synaptic weight refers to the strength or amplitude of a connection between two nodes, corresponding in biology to the amount of influence the firing of one neuron has on another. The term is typically used in artificial and biological neural network research. == Computation == In a computational neural network, a vector or set of inputs x {\displaystyle {\textbf {x}}} and outputs y {\displaystyle {\textbf {y}}} , or pre- and post-synaptic neurons respectively, are interconnected with synaptic weights represented by the matrix w {\displaystyle w} , where for a linear neuron y j = ∑ i w i j x i or y = w x {\displaystyle y_{j}=\sum _{i}w_{ij}x_{i}~~{\textrm {or}}~~{\textbf {y}}=w{\textbf {x}}} . where the rows of the synaptic matrix represent the vector of synaptic weights for the output indexed by j {\displaystyle j} . The synaptic weight is changed by using a learning rule, the most basic of which is Hebb's rule, which is usually stated in biological terms as Neurons that fire together, wire together. Computationally, this means that if a large signal from one of the input neurons results in a large signal from one of the output neurons, then the synaptic weight between those two neurons will increase. The rule is unstable, however, and is typically modified using such variations as Oja's rule, radial basis functions or the backpropagation algorithm. == Biology == For biological networks, the effect of synaptic weights is not as simple as for linear neurons or Hebbian learning. However, biophysical models such as BCM theory have seen some success in mathematically describing these networks. In the mammalian central nervous system, signal transmission is carried out by interconnected networks of nerve cells, or neurons. For the basic pyramidal neuron, the input signal is carried by the axon, which releases neurotransmitter chemicals into the synapse which is picked up by the dendrites of the next neuron, which can then generate an action potential which is analogous to the output signal in the computational case. The synaptic weight in this process is determined by several variable factors: How well the input signal propagates through the axon (see myelination), The amount of neurotransmitter released into the synapse and the amount that can be absorbed in the following cell (determined by the number of AMPA and NMDA receptors on the cell membrane and the amount of intracellular calcium and other ions), The number of such connections made by the axon to the dendrites, How well the signal propagates and integrates in the postsynaptic cell. The changes in synaptic weight that occur is known as synaptic plasticity, and the process behind long-term changes (long-term potentiation and depression) is still poorly understood. Hebb's original learning rule was originally applied to biological systems, but has had to undergo many modifications as a number of theoretical and experimental problems came to light.

    Read more →
  • Gremlin (query language)

    Gremlin (query language)

    Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation. Gremlin works for both OLTP-based graph databases as well as OLAP-based graph processors. Gremlin's automata and functional language foundation enable Gremlin to naturally support imperative and declarative querying, host language agnosticism, user-defined domain specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, and hybrid depth- and breadth-first evaluation with Turing completeness. As an explanatory analogy, Apache TinkerPop and Gremlin are to graph databases what the JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing as what the Java virtual machine is to general purpose computing. == History == 2009-10-30 the project is born, and immediately named "TinkerPop" 2009-12-25 v0.1 is the first release 2011-05-21 v1.0 is released 2012-05-24 v2.0 is released 2015-01-16 TinkerPop becomes an Apache Incubator project 2015-07-09 v3.0.0-incubating is released 2016-05-23 Apache TinkerPop becomes a top-level project 2016-07-18 v3.1.3 and v3.2.1 are first releases as Apache TinkerPop 2017-12-17 v3.3.1 is released 2018-05-08 v3.3.3 is released 2019-08-05 v3.4.3 is released 2020-02-20 v3.4.6 is released 2021-05-01 v3.5.0 is released 2022-04-04 v3.6.0 is released 2023-07-31 v3.7.0 is released 2025-11-12 v3.8.0 is released == Vendor integration == Gremlin is an Apache2-licensed graph traversal language that can be used by graph system vendors. There are typically two types of graph system vendors: OLTP graph databases and OLAP graph processors. The table below outlines those graph vendors that support Gremlin. == Traversal examples == The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of the MovieLens dataset. The dataset includes users who rate movies. Users each have one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below. === Simple traversals === For each vertex in the graph, emit its label, then group and count each distinct label. What year was the oldest movie made? What is Die Hard's average rating? === Projection traversals === For each category, emit a map of its name and the number of movies it represents. For each movie with at least 11 ratings, emit a map of its name and average rating. Sort the maps in decreasing order by their average rating. Emit the first 10 maps (i.e. top 10). === Declarative pattern matching traversals === Gremlin supports declarative graph pattern matching similar to SPARQL. For instance, the following query below uses Gremlin's match()-step. What 80's action movies do 30-something programmers like? Group count the movies by their name and sort the group count map in decreasing order by value. Clip the map to the top 10 and emit the map entries. === OLAP traversal === Which movies are most central in the implicit 5-stars graph? == Gremlin graph traversal machine == Gremlin is a virtual machine composed of an instruction set as well as an execution engine. An analogy is drawn between Gremlin and Java. === Gremlin steps (instruction set) === The following traversal is a Gremlin traversal in the Gremlin-Java8 dialect. The Gremlin language (i.e. the fluent-style of expressing a graph traversal) can be represented in any host language that supports function composition and function nesting. Due to this simple requirement, there exists various Gremlin dialects including Gremlin-Groovy, Gremlin-Scala, Gremlin-Clojure, etc. The above Gremlin-Java8 traversal is ultimately compiled down to a step sequence called a traversal. A string representation of the traversal above provided below. The steps are the primitives of the Gremlin graph traversal machine. They are the parameterized instructions that the machine ultimately executes. The Gremlin instruction set is approximately 30 steps. These steps are sufficient to provide general purpose computing and what is typically required to express the common motifs of any graph traversal query. Given that Gremlin is a language, an instruction set, and a virtual machine, it is possible to design another traversal language that compiles to the Gremlin traversal machine (analogous to how Scala compiles to the JVM). For instance, the popular SPARQL graph pattern match language can be compiled to execute on the Gremlin machine. The following SPARQL query would compile to In Gremlin-Java8, the SPARQL query above would be represented as below and compile to the identical Gremlin step sequence (i.e. traversal). === Gremlin Machine (virtual machine) === The Gremlin graph traversal machine can execute on a single machine or across a multi-machine compute cluster. Execution agnosticism allows Gremlin to run over both graph databases (OLTP) and graph processors (OLAP).

    Read more →
  • Charge based boundary element fast multipole method

    Charge based boundary element fast multipole method

    The charge-based formulation of the boundary element method (BEM) is a dimensionality reduction numerical technique that is used to model quasistatic electromagnetic phenomena in highly complex conducting media (targeting, e.g., the human brain) with a very large (up to approximately 1 billion) number of unknowns. The charge-based BEM solves an integral equation of the potential theory written in terms of the induced surface charge density. This formulation is naturally combined with fast multipole method (FMM) acceleration, and the entire method is known as charge-based BEM-FMM. The combination of BEM and FMM is a common technique in different areas of computational electromagnetics and, in the context of bioelectromagnetism, it provides improvements over the finite element method. == Historical development == Along with more common electric potential-based BEM, the quasistatic charge-based BEM, derived in terms of the single-layer (charge) density, for a single-compartment medium has been known in the potential theory since the beginning of the 20th century. For multi-compartment conducting media, the surface charge density formulation first appeared in discretized form (for faceted interfaces) in the 1964 paper by Gelernter and Swihart. A subsequent continuous form, including time-dependent and dielectric effects, appeared in the 1967 paper by Barnard, Duck, and Lynn. The charge-based BEM has also been formulated for conducting, dielectric, and magnetic media, and used in different applications. In 2009, Greengard et al. successfully applied the charge-based BEM with fast multipole acceleration to molecular electrostatics of dielectrics. A similar approach to realistic modeling of the human brain with multiple conducting compartments was first described by Makarov et al. in 2018. Along with this, the BEM-based multilevel fast multipole method has been widely used in radar and antenna studies at microwave frequencies as well as in acoustics. == Physical background - surface charges in biological media == The charge-based BEM is based on the concept of an impressed (or primary) electric field E i {\displaystyle \mathbf {E} ^{i}} and a secondary electric field E s {\displaystyle \mathbf {E} ^{s}} . The impressed field is usually known a priori or is trivial to find. For the human brain, the impressed electric field can be classified as one of the following: A conservative field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed density of EEG or MEG current sources in a homogeneous infinite medium with the conductivity σ {\displaystyle \sigma } at the source location; An instantaneous solenoidal field E i {\displaystyle \mathbf {E} ^{i}} of an induction coil obtained from Faraday's law of induction in a homogeneous infinite medium (air), when transcranial magnetic stimulation (TMS) problems are concerned; A surface field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed surface current density J i = σ E i {\displaystyle \mathbf {J} ^{i}=\sigma \mathbf {E} ^{i}} of current electrodes injecting electric current at a boundary of a compartment with conductivity σ {\displaystyle \sigma } when transcranial direct-current stimulation (tDCS) or deep brain stimulation (DBS) are concerned; A conservative field E i {\displaystyle \mathbf {E} ^{i}} of charges deposited on voltage electrodes for tDCS or DBS. This specific problem requires a coupled treatment since these charges will depend on the environment; In application to multiscale modeling, a field E i {\displaystyle \mathbf {E} ^{i}} obtained from any other macroscopic numerical solution in a small (mesoscale or microscale) spatial domain within the brain. For example, a constant field can be used. When the impressed field is "turned on", free charges located within a conducting volume D immediately begin to redistribute and accumulate at the boundaries (interfaces) of regions of different conductivity in D. A surface charge density ρ ( r ) {\displaystyle \rho (\mathbf {r} )} appears on the conductivity interfaces. This charge density induces a secondary conservative electric field E s {\displaystyle \mathbf {E} ^{s}} following Coulomb's law. One example is a human under a direct current powerline with the known field E i {\displaystyle \mathbf {E} ^{i}} directed down. The superior surface of the human's conducting body will be charged negatively while its inferior portion is charged positively. These surface charges create a secondary electric field that effectively cancels or blocks the primary field everywhere in the body so that no current will flow within the body under DC steady state conditions. Another example is a human head with electrodes attached. At any conductivity interface with a normal vector n {\displaystyle \mathbf {n} } pointing from an "inside" (-) compartment of conductivity σ − {\displaystyle \sigma ^{-}} to an "outside" (+) compartment of conductivity σ + {\displaystyle \sigma ^{+}} , Kirchhoff's current law requires continuity of the normal component of the electric current density. This leads to the interfacial boundary condition in the form for every facet at a triangulated interface. As long as σ ± {\displaystyle \sigma ^{\pm }} are different from each other, the two normal components of the electric field, E ± ⋅ n {\displaystyle \mathbf {E} ^{\pm }\cdot \mathbf {n} } , must also be different. Such a jump across the interface is only possible when a sheet of surface charge exists at that interface. Thus, if an electric current or voltage is applied, the surface charge density follows. The goal of the numerical analysis is to find the unknown surface charge distribution and thus the total electric field E = E i + E s {\displaystyle \mathbf {E} =\mathbf {E} ^{i}+\mathbf {E} ^{s}} (and the total electric potential if required) anywhere in space. == System of equations for surface charges == Below, a derivation is given based on Gauss's law and Coulomb's law. All conductivity interfaces, denoted by S, are discretized into planar triangular facets t m {\displaystyle t_{m}} with centers r m {\displaystyle \mathbf {r} _{m}} . Assume that an m-th facet with the normal vector n m {\displaystyle \mathbf {n} _{m}} and area A m {\displaystyle A_{m}} carries a uniform surface charge density ρ m {\displaystyle \rho _{m}} . If a volumetric tetrahedral mesh were present, the charged facets would belong to tetrahedra with different conductivity values. We first compute the electric field E m + {\displaystyle \mathbf {E} _{m}^{+}} at the point r m + δ n m {\displaystyle \mathbf {r} _{m}+\delta \mathbf {n} _{m}} , for δ → 0 + {\displaystyle \delta \rightarrow 0^{+}} i.e., just outside facet 𝑚 at its center. This field contains three contributions: The continuous impressed electric field E i {\displaystyle \mathbf {E} ^{i}} itself; An electric field of the m-th charged facet itself. Very close to the facet, it can be approximated as the electric field of an infinite sheet of uniform surface charge ρ m {\displaystyle \rho _{m}} . By Gauss's law, it is given by + ρ m / 2 ε 0 ⋅ n m {\displaystyle +\rho _{m}/2\varepsilon _{0}\cdot \mathbf {n} _{m}} where ε 0 {\displaystyle \varepsilon _{0}} is a background electrical permittivity; An electric field generated by all other facets t n {\displaystyle t_{n}} , which we approximate as point charges of charge A n ρ n {\displaystyle A_{n}\rho _{n}} at each center r n {\displaystyle \mathbf {r} _{n}} . A similar treatment holds for the electric field E m − {\displaystyle \mathbf {E} _{m}^{-}} just inside facet 𝑚, but the electric field of the flat sheet of charge changes its sign. Using Coulomb's law to calculate the contribution of facets different from t m {\displaystyle t_{m}} , we find From this equation, we see that the normal component of the electric field indeed undergoes a jump through the charged interface. This is equivalent to a jump relation of the potential theory. As a second step, the two expressions for E m ± {\displaystyle \mathbf {E} _{m}^{\pm }} are substituted into the interfacial boundary condition σ − E m − ⋅ n m = σ + E m + ⋅ n m {\displaystyle \sigma ^{-}\mathbf {E} _{m}^{-}\cdot \mathbf {n} _{m}=\sigma ^{+}\mathbf {E} _{m}^{+}\cdot \mathbf {n} _{m}} , applied to every facet 𝑚. This operation leads to a system of linear equations for unknown charge densities ρ m {\displaystyle \rho _{m}} which solves the problem: where K m = σ − − σ + σ − + σ + {\displaystyle K_{m}={\frac {\sigma ^{-}-\sigma ^{+}}{\sigma ^{-}+\sigma ^{+}}}} is the electric conductivity contrast at the m-th facet. The normalization constant ε 0 {\displaystyle \varepsilon _{0}} will cancel out after the solution is substituted in the expression for E s {\displaystyle \mathbf {E} ^{s}} and becomes redundant. == Application of fast multipole method == For modern characterizations of brain topologies with ever-increasing levels of complexity, the above system of equations for ρ m {\displaystyle \rho _{m}} is very large; it is t

    Read more →
  • Outline of machine learning

    Outline of machine learning

    The following outline is provided as an overview of, and topical guide to, machine learning: Machine learning (ML) is a subfield of artificial intelligence within computer science that evolved from the study of pattern recognition and computational learning theory. In 1959, Arthur Samuel defined machine learning as a "field of study that gives computers the ability to learn without being explicitly programmed". ML involves the study and construction of algorithms that can learn from and make predictions on data. These algorithms operate by building a model from a training set of example observations to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions. == How can machine learning be categorized? == An academic discipline A branch of science An applied science A subfield of computer science A branch of artificial intelligence A subfield of soft computing Application of statistics === Paradigms of machine learning === Supervised learning, where the model is trained on labeled data Unsupervised learning, where the model tries to identify patterns in unlabeled data Reinforcement learning, where the model learns to make decisions by receiving rewards or penalties. == Applications of machine learning == Applications of machine learning Bioinformatics Biomedical informatics Computer vision Customer relationship management Data mining Earth sciences Email filtering Inverted pendulum (balance and equilibrium system) Natural language processing Named Entity Recognition Automatic summarization Automatic taxonomy construction Dialog system Grammar checker Language recognition Handwriting recognition Optical character recognition Speech recognition Text to Speech Synthesis Speech Emotion Recognition Machine translation Question answering Speech synthesis Text mining Term frequency–inverse document frequency Text simplification Pattern recognition Facial recognition system Handwriting recognition Image recognition Optical character recognition Speech recognition Recommendation system Collaborative filtering Content-based filtering Hybrid recommender systems Search engine Search engine optimization Social engineering == Machine learning hardware == Graphics processing unit Tensor processing unit Vision processing unit == Machine learning tools == Comparison of machine learning software Comparison of deep learning software === Machine learning frameworks === ==== Proprietary machine learning frameworks ==== Amazon Machine Learning Microsoft Azure Machine Learning Studio DistBelief (replaced by TensorFlow) ==== Open source machine learning frameworks ==== Apache Singa Apache MXNet Caffe PyTorch mlpack TensorFlow Torch CNTK Accord.Net Jax MLJ.jl – A machine learning framework for Julia === Machine learning libraries === Deeplearning4j Theano scikit-learn Keras === Machine learning algorithms === == Machine learning methods == === Instance-based algorithm === K-nearest neighbors algorithm (KNN) Learning vector quantization (LVQ) Self-organizing map (SOM) === Regression analysis === Logistic regression Ordinary least squares regression (OLSR) Linear regression Stepwise regression Multivariate adaptive regression splines (MARS) Regularization algorithm Ridge regression Least Absolute Shrinkage and Selection Operator (LASSO) Elastic net Least-angle regression (LARS) Classifiers Probabilistic classifier Naive Bayes classifier Binary classifier Linear classifier Hierarchical classifier === Dimensionality reduction === Dimensionality reduction Canonical correlation analysis (CCA) Factor analysis Feature extraction Feature selection Independent component analysis (ICA) Linear discriminant analysis (LDA) Multidimensional scaling (MDS) Non-negative matrix factorization (NMF) Partial least squares regression (PLSR) Principal component analysis (PCA) Principal component regression (PCR) Projection pursuit Sammon mapping t-distributed stochastic neighbor embedding (t-SNE) === Ensemble learning === Ensemble learning AdaBoost Boosting Bootstrap aggregating (also "bagging" or "bootstrapping") Ensemble averaging Gradient boosted decision tree (GBDT) Gradient boosting Random Forest Stacked Generalization === Meta-learning === Meta-learning Inductive bias Metadata === Reinforcement learning === Reinforcement learning Q-learning State–action–reward–state–action (SARSA) Temporal difference learning (TD) Learning Automata === Supervised learning === Supervised learning Averaged one-dependence estimators (AODE) Artificial neural network Case-based reasoning Gaussian process regression Gene expression programming Group method of data handling (GMDH) Inductive logic programming Instance-based learning Lazy learning Learning Automata Learning Vector Quantization Logistic Model Tree Minimum message length (decision trees, decision graphs, etc.) Nearest Neighbor Algorithm Analogical modeling Probably approximately correct learning (PAC) learning Ripple down rules, a knowledge acquisition methodology Symbolic machine learning algorithms Support vector machines Random Forests Ensembles of classifiers Bootstrap aggregating (bagging) Boosting (meta-algorithm) Ordinal classification Conditional Random Field ANOVA Quadratic classifiers k-nearest neighbor Boosting SPRINT Bayesian networks Naive Bayes Hidden Markov models Hierarchical hidden Markov model ==== Bayesian ==== Bayesian statistics Bayesian knowledge base Naive Bayes Gaussian Naive Bayes Multinomial Naive Bayes Averaged One-Dependence Estimators (AODE) Bayesian Belief Network (BBN) Bayesian Network (BN) ==== Decision tree algorithms ==== Decision tree algorithm Decision tree Classification and regression tree (CART) Iterative Dichotomiser 3 (ID3) C4.5 algorithm C5.0 algorithm Chi-squared Automatic Interaction Detection (CHAID) Decision stump Conditional decision tree ID3 algorithm Random forest SLIQ ==== Linear classifier ==== Linear classifier Fisher's linear discriminant Linear regression Logistic regression Multinomial logistic regression Naive Bayes classifier Perceptron Support vector machine === Unsupervised learning === Unsupervised learning Expectation-maximization algorithm Vector Quantization Generative topographic map Information bottleneck method Association rule learning algorithms Apriori algorithm Eclat algorithm ==== Artificial neural networks ==== Artificial neural network Feedforward neural network Extreme learning machine Convolutional neural network Recurrent neural network Long short-term memory (LSTM) Logic learning machine Self-organizing map ==== Association rule learning ==== Association rule learning Apriori algorithm Eclat algorithm FP-growth algorithm ==== Hierarchical clustering ==== Hierarchical clustering Single-linkage clustering Conceptual clustering ==== Cluster analysis ==== Cluster analysis BIRCH DBSCAN Expectation–maximization (EM) Fuzzy clustering Hierarchical clustering k-means clustering k-medians Mean-shift OPTICS algorithm ==== Anomaly detection ==== Anomaly detection k-nearest neighbors algorithm (k-NN) Local outlier factor === Semi-supervised learning === Semi-supervised learning Active learning Generative models Low-density separation Graph-based methods Co-training Transduction === Deep learning === Deep learning Deep belief networks Deep Boltzmann machines Deep Convolutional neural networks Deep Recurrent neural networks Hierarchical temporal memory Generative Adversarial Network Style transfer Transformer Stacked Auto-Encoders === Other machine learning methods and problems === Anomaly detection Association rules Bias-variance dilemma Classification Multi-label classification Clustering Data Pre-processing Empirical risk minimization Feature engineering Feature learning Learning to rank Occam learning Online machine learning PAC learning Regression Reinforcement Learning Semi-supervised learning Statistical learning Structured prediction Graphical models Bayesian network Conditional random field (CRF) Hidden Markov model (HMM) Unsupervised learning VC theory == Machine learning research == List of artificial intelligence projects List of datasets for machine learning research == History of machine learning == History of machine learning Timeline of machine learning == Machine learning projects == Machine learning projects: DeepMind Google Brain OpenAI Meta AI Hugging Face == Machine learning organizations == === Machine learning conferences and workshops === Artificial Intelligence and Security (AISec) (co-located workshop with CCS) Conference on Neural Information Processing Systems (NIPS) ECML PKDD International Conference on Machine Learning (ICML) ML4ALL (Machine Learning For All) == Machine learning publications == === Books on machine learning === Mathematics for Machine Learning Hands-On Machine Learning Scikit-Learn, Keras, and TensorFlow The Hundred-Page Machine Learning Book === Machine learning journals === Machine Learning Journal of Machine Learning Research (JMLR) Neural Computation == Pe

    Read more →
  • Consensus clustering

    Consensus clustering

    Consensus clustering is a method of aggregating (potentially conflicting) results from multiple clustering algorithms. Also called cluster ensembles or aggregation of clustering (or partitions), it refers to the situation in which a number of different (input) clusterings have been obtained for a particular dataset and it is desired to find a single (consensus) clustering which is a better fit in some sense than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. When cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete, even when the number of input clusterings is three. Consensus clustering for unsupervised learning is analogous to ensemble learning in supervised learning. == Issues with existing clustering techniques == Current clustering techniques do not address all the requirements adequately. Dealing with large number of dimensions and large number of data items can be problematic because of time complexity; Effectiveness of the method depends on the definition of "distance" (for distance-based clustering) If an obvious distance measure doesn't exist, we must "define" it, which is not always easy, especially in multidimensional spaces. The result of the clustering algorithm (that, in many cases, can be arbitrary itself) can be interpreted in different ways. == Justification for using consensus clustering == There are potential shortcomings for all existing clustering techniques. This may cause interpretation of results to become difficult, especially when there is no knowledge about the number of clusters. Clustering methods are also very sensitive to the initial clustering settings, which can cause non-significant data to be amplified in non-reiterative methods. An extremely important issue in cluster analysis is the validation of the clustering results, that is, how to gain confidence about the significance of the clusters provided by the clustering technique (cluster numbers and cluster assignments). Lacking an external objective criterion (the equivalent of a known class label in supervised analysis), this validation becomes somewhat elusive. Iterative descent clustering methods, such as the SOM and k-means clustering circumvent some of the shortcomings of hierarchical clustering by providing for univocally defined clusters and cluster boundaries. Consensus clustering provides a method that represents the consensus across multiple runs of a clustering algorithm, to determine the number of clusters in the data, and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. It can provide data for a visualization tool to inspect cluster number, membership, and boundaries. However, they lack the intuitive and visual appeal of hierarchical clustering dendrograms, and the number of clusters must be chosen a priori. == The Monti consensus clustering algorithm == The Monti consensus clustering algorithm is one of the most popular consensus clustering algorithms and is used to determine the number of clusters, K {\displaystyle K} . Given a dataset of N {\displaystyle N} total number of points to cluster, this algorithm works by resampling and clustering the data, for each K {\displaystyle K} and a N × N {\displaystyle N\times N} consensus matrix is calculated, where each element represents the fraction of times two samples clustered together. A perfectly stable matrix would consist entirely of zeros and ones, representing all sample pairs always clustering together or not together over all resampling iterations. The relative stability of the consensus matrices can be used to infer the optimal K {\displaystyle K} . More specifically, given a set of points to cluster, D = { e 1 , e 2 , . . . e N } {\displaystyle D=\{e_{1},e_{2},...e_{N}\}} , let D 1 , D 2 , . . . , D H {\displaystyle D^{1},D^{2},...,D^{H}} be the list of H {\displaystyle H} perturbed (resampled) datasets of the original dataset D {\displaystyle D} , and let M h {\displaystyle M^{h}} denote the N × N {\displaystyle N\times N} connectivity matrix resulting from applying a clustering algorithm to the dataset D h {\displaystyle D^{h}} . The entries of M h {\displaystyle M^{h}} are defined as follows: M h ( i , j ) = { 1 , if points i and j belong to the same cluster 0 , otherwise {\displaystyle M^{h}(i,j)={\begin{cases}1,&{\text{if}}{\text{ points i and j belong to the same cluster}}\\0,&{\text{otherwise}}\end{cases}}} Let I h {\displaystyle I^{h}} be the N × N {\displaystyle N\times N} identicator matrix where the ( i , j ) {\displaystyle (i,j)} -th entry is equal to 1 if points i {\displaystyle i} and j {\displaystyle j} are in the same perturbed dataset D h {\displaystyle D^{h}} , and 0 otherwise. The indicator matrix is used to keep track of which samples were selected during each resampling iteration for the normalisation step. The consensus matrix C {\displaystyle C} is defined as the normalised sum of all connectivity matrices of all the perturbed datasets and a different one is calculated for every K {\displaystyle K} . C ( i , j ) = ( ∑ h = 1 H M h ( i , j ) ∑ h = 1 H I h ( i , j ) ) {\displaystyle C(i,j)=\left({\frac {\textstyle \sum _{h=1}^{H}M^{h}(i,j)\displaystyle }{\sum _{h=1}^{H}I^{h}(i,j)}}\right)} That is the entry ( i , j ) {\displaystyle (i,j)} in the consensus matrix is the number of times points i {\displaystyle i} and j {\displaystyle j} were clustered together divided by the total number of times they were selected together. The matrix is symmetric and each element is defined within the range [ 0 , 1 ] {\displaystyle [0,1]} . A consensus matrix is calculated for each K {\displaystyle K} to be tested, and the stability of each matrix, that is how far the matrix is towards a matrix of perfect stability (just zeros and ones) is used to determine the optimal K {\displaystyle K} . One way of quantifying the stability of the K {\displaystyle K} th consensus matrix is examining its CDF curve (see below). == Over-interpretation potential of the Monti consensus clustering algorithm == Monti consensus clustering can be a powerful tool for identifying clusters, but it needs to be applied with caution as shown by Şenbabaoğlu et al. It has been shown that the Monti consensus clustering algorithm is able to claim apparent stability of chance partitioning of null datasets drawn from a unimodal distribution, and thus has the potential to lead to over-interpretation of cluster stability in a real study. If clusters are not well separated, consensus clustering could lead one to conclude apparent structure when there is none, or declare cluster stability when it is subtle. Identifying false positive clusters is a common problem throughout cluster research, and has been addressed by methods such as SigClust and the GAP-statistic. However, these methods rely on certain assumptions for the null model that may not always be appropriate. Şenbabaoğlu et al demonstrated the original delta K metric to decide K {\displaystyle K} in the Monti algorithm performed poorly, and proposed a new superior metric for measuring the stability of consensus matrices using their CDF curves. In the CDF curve of a consensus matrix, the lower left portion represents sample pairs rarely clustered together, the upper right portion represents those almost always clustered together, whereas the middle segment represent those with ambiguous assignments in different clustering runs. The proportion of ambiguous clustering (PAC) score measure quantifies this middle segment; and is defined as the fraction of sample pairs with consensus indices falling in the interval (u1, u2) ∈ [0, 1] where u1 is a value close to 0 and u2 is a value close to 1 (for instance u1=0.1 and u2=0.9). A low value of PAC indicates a flat middle segment, and a low rate of discordant assignments across permuted clustering runs. One can therefore infer the optimal number of clusters by the K {\displaystyle K} value having the lowest PAC. == Related work == Clustering ensemble (Strehl and Ghosh): They considered various formulations for the problem, most of which reduce the problem to a hyper-graph partitioning problem. In one of their formulations they considered the same graph as in the correlation clustering problem. The solution they proposed is to compute the best k-partition of the graph, which does not take into account the penalty for merging two nodes that are far apart. Clustering aggregation (Fern and Brodley): They applied the clustering aggregation idea to a collection of soft clusterings they obtained by random projections. They used an agglomerative algorithm

    Read more →
  • Radial basis function

    Radial basis function

    In mathematics a radial basis function (RBF) is a real-valued function φ {\textstyle \varphi } whose value depends only on the distance between the input and some fixed point, either the origin, so that φ ( x ) = φ ^ ( ‖ x ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} \right\|)} , or some other fixed point c {\textstyle \mathbf {c} } , called a center, so that φ ( x ) = φ ^ ( ‖ x − c ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} -\mathbf {c} \right\|)} . Any function φ {\textstyle \varphi } that satisfies the property φ ( x ) = φ ^ ( ‖ x ‖ ) {\textstyle \varphi (\mathbf {x} )={\hat {\varphi }}(\left\|\mathbf {x} \right\|)} is a radial function. The distance is usually Euclidean distance, although other metrics are sometimes used. They are often used as a collection { φ k } k {\displaystyle \{\varphi _{k}\}_{k}} which forms a basis for some function space of interest, hence the name. Sums of radial basis functions are typically used to approximate given functions. This approximation process can also be interpreted as a simple kind of neural network; this was the context in which they were originally applied to machine learning, in work by David Broomhead and David Lowe in 1988, which stemmed from Michael J. D. Powell's seminal research from 1977. RBFs are also used as a kernel in support vector classification. The technique has proven effective and flexible enough that radial basis functions are now applied in a variety of engineering applications. == Definition == A radial function is a function φ : [ 0 , ∞ ) → R {\textstyle \varphi :[0,\infty )\to \mathbb {R} } . When paired with a norm ‖ ⋅ ‖ : V → [ 0 , ∞ ) {\textstyle \|\cdot \|:V\to [0,\infty )} on a vector space, a function of the form φ c = φ ( ‖ x − c ‖ ) {\textstyle \varphi _{\mathbf {c} }=\varphi (\|\mathbf {x} -\mathbf {c} \|)} is said to be a radial kernel centered at c ∈ V {\textstyle \mathbf {c} \in V} . A radial function and the associated radial kernels are said to be radial basis functions if, for any finite set of nodes { x k } k = 1 n ⊆ V {\displaystyle \{\mathbf {x} _{k}\}_{k=1}^{n}\subseteq V} , all of the following conditions are true: === Examples === Commonly used types of radial basis functions include (writing r = ‖ x − x i ‖ {\textstyle r=\left\|\mathbf {x} -\mathbf {x} _{i}\right\|} and using ε {\textstyle \varepsilon } to indicate a shape parameter that can be used to scale the input of the radial kernel): == Approximation == Radial basis functions are typically used to build up function approximations of the form where the approximating function y ( x ) {\textstyle y(\mathbf {x} )} is represented as a sum of N {\displaystyle N} radial basis functions, each associated with a different center x i {\textstyle \mathbf {x} _{i}} , and weighted by an appropriate coefficient w i . {\textstyle w_{i}.} The weights w i {\textstyle w_{i}} can be estimated using the matrix methods of linear least squares, because the approximating function is linear in the weights w i {\textstyle w_{i}} . Approximation schemes of this kind have been particularly used in time series prediction and control of nonlinear systems exhibiting sufficiently simple chaotic behaviour and 3D reconstruction in computer graphics (for example, hierarchical RBF and Pose Space Deformation). == RBF Network == The sum can also be interpreted as a rather simple single-layer type of artificial neural network called a radial basis function network, with the radial basis functions taking on the role of the activation functions of the network. It can be shown that any continuous function on a compact interval can in principle be interpolated with arbitrary accuracy by a sum of this form, if a sufficiently large number N {\textstyle N} of radial basis functions is used. The approximant y ( x ) {\textstyle y(\mathbf {x} )} is differentiable with respect to the weights w i {\textstyle w_{i}} . The weights could thus be learned using any of the standard iterative methods for neural networks. Using radial basis functions in this manner yields a reasonable interpolation approach provided that the fitting set has been chosen such that it covers the entire range systematically (equidistant data points are ideal). However, without a polynomial term that is orthogonal to the radial basis functions, estimates outside the fitting set tend to perform poorly. == RBFs for PDEs == Radial basis functions are used to approximate functions and so can be used to discretize and numerically solve Partial Differential Equations (PDEs). This was first done in 1990 by E. J. Kansa who developed the first RBF based numerical method. It is called the Kansa method and was used to solve the elliptic Poisson equation and the linear advection-diffusion equation. The function values at points x {\displaystyle \mathbf {x} } in the domain are approximated by the linear combination of RBFs: The derivatives are approximated as such: where N {\displaystyle N} are the number of points in the discretized domain, d {\displaystyle d} the dimension of the domain and λ {\displaystyle \lambda } the scalar coefficients that are unchanged by the differential operator. Different numerical methods based on Radial Basis Functions were developed thereafter. Some methods are the RBF-FD method, the RBF-QR method and the RBF-PUM method.

    Read more →
  • Latent space

    Latent space

    A latent space, also known as a latent feature space or embedding space, is an embedding of a set of items within a manifold in which items resembling each other are positioned closer to one another. Position within the latent space can be viewed as being defined by a set of latent variables that emerge from the resemblances between the objects. In most cases, the dimensionality of the latent space is chosen to be lower than the dimensionality of the feature space from which the data points are drawn, making the construction of a latent space an example of dimensionality reduction, which can also be viewed as a form of data compression. Latent spaces are usually fit via machine learning, and they can then be used as feature spaces in machine learning models, including classifiers and other supervised predictors. The interpretation of latent spaces in machine learning models is an ongoing area of research, but achieving clear interpretations remains challenging. The black-box nature of these models often makes the latent space unintuitive, while its high-dimensional, complex, and nonlinear characteristics further complicate the task of understanding it. Analysis of the latent space geometry of diffusion models reveals a fractal structure of phase transitions in the latent space, characterized by abrupt changes in the Fisher information metric. Some visualization techniques have been developed to connect the latent space to the visual world, but there is often not a direct connection between the latent space interpretation and the model itself. Such techniques include t-distributed stochastic neighbor embedding (t-SNE), where the latent space is mapped to two dimensions for visualization. Latent space distances lack physical units, so the interpretation of these distances may depend on the application. == Embedding models == Several embedding models have been developed to perform this transformation to create latent space embeddings given a set of data items and a similarity function. These models learn the embeddings by leveraging statistical techniques and machine learning algorithms. Here are some commonly used embedding models: Word2Vec: Word2Vec is a popular embedding model used in natural language processing (NLP). It learns word embeddings by training a neural network on a large corpus of text. Word2Vec captures semantic and syntactic relationships between words, allowing for meaningful computations like word analogies. GloVe: GloVe (Global Vectors for Word Representation) is another widely used embedding model for NLP. It combines global statistical information from a corpus with local context information to learn word embeddings. GloVe embeddings are known for capturing both semantic and relational similarities between words. Siamese Networks: Siamese networks are a type of neural network architecture commonly used for similarity-based embedding. They consist of two identical subnetworks that process two input samples and produce their respective embeddings. Siamese networks are often used for tasks like image similarity, recommendation systems, and face recognition. Variational Autoencoders (VAEs): VAEs are generative models that simultaneously learn to encode and decode data. The latent space in VAEs acts as an embedding space. By training VAEs on high-dimensional data, such as images or audio, the model learns to encode the data into a compact latent representation. VAEs are known for their ability to generate new data samples from the learned latent space. == Multimodality == Multimodality refers to the integration and analysis of multiple modes or types of data within a single model or framework. Embedding multimodal data involves capturing relationships and interactions between different data types, such as images, text, audio, and structured data. Multimodal embedding models aim to learn joint representations that fuse information from multiple modalities, allowing for cross-modal analysis and tasks. These models enable applications like image captioning, visual question answering, and multimodal sentiment analysis. To embed multimodal data, specialized architectures such as deep multimodal networks or multimodal transformers are employed. These architectures combine different types of neural network modules to process and integrate information from various modalities. The resulting embeddings capture the complex relationships between different data types, facilitating multimodal analysis and understanding. == Applications == Embedding latent space and multimodal embedding models have found numerous applications across various domains: Information retrieval: Embedding techniques enable efficient similarity search and recommendation systems by representing data points in a compact space. Natural language processing: Word embeddings have revolutionized NLP tasks like sentiment analysis, machine translation, and document classification. Computer vision: Image and video embeddings enable tasks like object recognition, image retrieval, and video summarization. Recommendation systems: Embeddings help capture user preferences and item characteristics, enabling personalized recommendations. Healthcare: Embedding techniques have been applied to electronic health records, medical imaging, and genomic data for disease prediction, diagnosis, and treatment. Social systems: Embedding techniques can be used to learn latent representations of social systems such as internal migration systems, academic citation networks, and world trade networks.

    Read more →
  • GNU Binutils

    GNU Binutils

    The GNU Binary Utilities, or binutils, is a collection of programming tools maintained by the GNU Project for working with executable code including assembly, linking and many other development operations. The tools are originally from Cygnus Solutions. The tools are typically used along with other GNU tools such as GNU Compiler Collection, and the GNU Debugger. == Tools == The tools include: == elfutils == Ulrich Drepper wrote elfutils, to partially replace GNU Binutils, purely for Linux and with support only for ELF and DWARF. It distributes three libraries with it for programmatic access.

    Read more →
  • List of datasets for machine-learning research

    List of datasets for machine-learning research

    These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets. High-quality labeled training datasets for supervised and semi-supervised machine-learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality unlabeled datasets for unsupervised learning can also be difficult and costly to produce. Many organizations, including governments, publish and share their datasets, often using common metadata formats (such as Croissant). The datasets are classified, based on the licenses, into two groups: open data and non-open data. The datasets from various governmental-bodies are presented in List of open government data sites. The datasets are ported on open data portals. They are made available for searching, depositing and accessing through interfaces like Open API. The datasets are made available as various sorted types and subtypes. == List of sorting used for datasets == The data portal is classified based on its type of license. The open source license based data portals are known as open data portals which are used by many government organizations and academic institutions. == List of open data portals == == List of portals suitable for multiple types of applications == The data portal sometimes lists a wide variety of subtypes of datasets pertaining to many machine learning applications. == List of portals suitable for a specific subtype of applications == The data portals which are suitable for a specific subtype of machine learning application are listed in the subsequent sections. == Image data == == Text data == These datasets consist primarily of text for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis. === Reviews === === News articles === === Messages === === Twitter and tweets === === Dialogues === === Legal === === Other text === == Sound data == These datasets consist of sounds and sound features used for tasks such as speech recognition and speech synthesis. === Speech === === Music === === Other sounds === == Signal data == Datasets containing electric signal information requiring some sort of signal processing for further analysis. === Electrical === === Motion-tracking === === Other signals === == Chemical data == Datasets from physical systems. === Chemical Reactions with transition states (TS) === === OpenReACT-CHON-EFH === OpenReACT-CHON-EFH (Open Reaction Dataset of Atomic ConfiguraTions comprising C, H, O and N with Energies, Forces and Hessians) is a 2025 open-access benchmark for machine-learning interatomic potentials. RTP set – 35,087 stationary-point geometries (reactant, transition state and product) drawn from 11,961 elementary reactions, each labeled with density-functional energies, atomic forces and full Hessian matrices at the ωB97X-D/6-31G(d) level. IRC set – 34,248 structures along 600 minimum-energy reaction paths, used to test extrapolation beyond trained stationary points. NMS set – 62,527 off-equilibrium geometries generated by normal-mode sampling to probe model robustness under thermal perturbations. The collection underpins the study Does Hessian Data Improve the Performance of Machine Learning Potentials? and was used to train and benchmark the machine-learning interatomic potentials reported therein. The dataset itself is distributed under a CC licence via Figshare. == Physical data == Datasets from physical systems. === High-energy physics === === Systems === === Astronomy === === Earth science === === Other physical === == Biological data == Datasets from biological systems. === Human === === Animal === === Fungi === === Plant === === Microbe === === Drug discovery === == Anomaly data == == Question answering data == This section includes datasets that deals with structured data. == Dialog or instruction prompted data == This section includes datasets that contains multi-turn text with at least two actors, a "user" and an "agent". The user makes requests for the agent, which performs the request. == Cybersecurity == == Climate and sustainability == == Code data == == Multivariate data == === Financial === === Weather === === Census === === Transit === === Internet === === Games === === Other multivariate === == Curated repositories of datasets == As datasets come in myriad formats and can sometimes be difficult to use, there has been considerable work put into curating and standardizing the format of datasets to make them easier to use for machine learning research. OpenML: Web platform with Python, R, Java, and other APIs for downloading hundreds of machine learning datasets, evaluating algorithms on datasets, and benchmarking algorithm performance against dozens of other algorithms. PMLB: A large, curated repository of benchmark datasets for evaluating supervised machine learning algorithms. Provides classification and regression datasets in a standardized format that are accessible through a Python API. Metatext NLP: https://metatext.io/datasets web repository maintained by community, containing nearly 1000 benchmark datasets, and counting. Provides many tasks from classification to QA, and various languages from English, Portuguese to Arabic. Appen: Off The Shelf and Open Source Datasets hosted and maintained by the company. These biological, image, physical, question answering, signal, sound, text, and video resources number over 250 and can be applied to over 25 different use cases.

    Read more →
  • Dimensionality reduction

    Dimensionality reduction

    Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics. Methods are commonly divided into linear and nonlinear approaches. Linear approaches can be further divided into feature selection and feature extraction. Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses. == Feature selection == The process of feature selection aims to find a suitable subset of the input variables (features, or attributes) for the task at hand. The three strategies are: the filter strategy (e.g., information gain), the wrapper strategy (e.g., accuracy-guided search), and the embedded strategy (features are added or removed while building the model based on prediction errors). Data analysis such as regression or classification can be done in the reduced space more accurately than in the original space. == Feature projection == Feature projection (also called feature extraction) transforms the data from the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist. For multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning. === Principal component analysis (PCA) === The main linear technique for dimensionality reduction, principal component analysis, performs a linear mapping of the data to a lower-dimensional space in such a way that the variance of the data in the low-dimensional representation is maximized. In practice, the covariance (and sometimes the correlation) matrix of the data is constructed and the eigenvectors on this matrix are computed. The eigenvectors that correspond to the largest eigenvalues (the principal components) can now be used to reconstruct a large fraction of the variance of the original data. Moreover, the first few eigenvectors can often be interpreted in terms of the large-scale physical behavior of the system, because they often contribute the vast majority of the system's energy, especially in low-dimensional systems. Still, this must be proved on a case-by-case basis as not all systems exhibit this behavior. The original space (with dimension of the number of points) has been reduced (with data loss, but hopefully retaining the most important variance) to the space spanned by a few eigenvectors. === Non-negative matrix factorization (NMF) === NMF decomposes a non-negative matrix to the product of two non-negative ones, which has been a promising tool in fields where only non-negative signals exist, such as astronomy. NMF is well known since the multiplicative update rule by Lee & Seung, which has been continuously developed: the inclusion of uncertainties, the consideration of missing data and parallel computation, sequential construction which leads to the stability and linearity of NMF, as well as other updates including handling missing data in digital image processing. With a stable component basis during construction, and a linear modeling process, sequential NMF is able to preserve the flux in direct imaging of circumstellar structures in astronomy, as one of the methods of detecting exoplanets, especially for the direct imaging of circumstellar discs. In comparison with PCA, NMF does not remove the mean of the matrices, which leads to physical non-negative fluxes; therefore NMF is able to preserve more information than PCA as demonstrated by Ren et al. === Kernel PCA === Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique is capable of constructing nonlinear mappings that maximize the variance in the data. The resulting technique is called kernel PCA. === Graph-based kernel PCA === Other prominent nonlinear techniques include manifold learning techniques such as Isomap, locally linear embedding (LLE), Hessian LLE, Laplacian eigenmaps, and methods based on tangent space analysis. These techniques assume that the high-dimensional input data lies near a low-dimensional manifold embedded in the ambient space, and construct a low-dimensional representation using a cost function that retains local properties of the data; they can be viewed as defining a graph-based kernel for Kernel PCA. More recently, techniques have been proposed that, instead of defining a fixed kernel, try to learn the kernel using semidefinite programming. The most prominent example of such a technique is maximum variance unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest neighbors (in the inner product space) while maximizing the distances between points that are not nearest neighbors. An alternative approach to neighborhood preservation is through the minimization of a cost function that measures differences between distances in the input and output spaces. Important examples of such techniques include: classical multidimensional scaling, which is identical to PCA; Isomap, which uses geodesic distances in the data space; diffusion maps, which use diffusion distances in the data space; t-distributed stochastic neighbor embedding (t-SNE), which minimizes the divergence between distributions over pairs of points; and curvilinear component analysis. A different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special kind of feedforward neural networks with a bottleneck hidden layer. The training of deep encoders is typically performed using a greedy layer-wise pre-training (e.g., using a stack of restricted Boltzmann machines) that is followed by a finetuning stage based on backpropagation. === Linear discriminant analysis (LDA) === Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. === Generalized discriminant analysis (GDA) === GDA deals with nonlinear discriminant analysis using kernel function operator. The underlying theory is close to the support-vector machines (SVM) insofar as the GDA method provides a mapping of the input vectors into high-dimensional feature space. Similar to LDA, the objective of GDA is to find a projection for the features into a lower dimensional space by maximizing the ratio of between-class scatter to within-class scatter. === Autoencoder === Autoencoders can be used to learn nonlinear dimension reduction functions and codings together with an inverse function from the coding to the original representation. === t-SNE === T-distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear dimensionality reduction technique useful for the visualization of high-dimensional datasets. It is not recommended for use in analysis such as clustering or outlier detection since it does not necessarily preserve densities or distances well. === UMAP === Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique. Visually, it is similar to t-SNE, but it assumes that the data is uniformly distributed on a locally connected Riemannian manifold and that the Riemannian metric is locally constant or approximately locally constant. == Dimension reduction == For high-dimensional datasets, dimension reduction is usually performed prior to applying a k-nearest neighbors (k-NN) algorithm in order to mitigate the curse of dimensionality. Feature extraction and dimension reduction can be combined in one step, using principal component analysis (PCA), linear discriminant analysis (LDA), canonical correlation analysis (CCA), or non-negative matrix factorization (NMF) techniques to pre-process the data, followed by clustering via k-NN on feature vectors in a reduced-dimension space. In machine learning, this process is also called low-dimensional embedding. For high-dimensional datasets (e.g., when performing similarity search on live video streams, DNA data, or high-dimensional time series), running a fast approximate k-NN search using locality-sensitive hashing, random projection, "sketches", or other high-dimensional similarity search techniques from the VLDB conference toolbox may be the only fe

    Read more →