AI Analytics Hub

AI Analytics Hub — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • CPU modes

    CPU modes

    CPU modes (also called processor modes, CPU states, CPU privilege levels and other names) are operating modes for the central processing unit of most computer architectures that place restrictions on the type and scope of operations that can be performed by instructions being executed by the CPU. For example, this design allows an operating system to run with more privileges than application software by running the operating systems and applications in different modes. Ideally, only highly trusted kernel code is allowed to execute in the unrestricted mode; everything else (including non-supervisory portions of the operating system) runs in a restricted mode and must use a system call (via interrupt) to request the kernel perform on its behalf any operation that could damage or compromise the system, making it impossible for untrusted programs to alter or damage other programs (or the computing system itself). Device drivers are designed to be part of the kernel due to the need for frequent I/O access. Multiple modes can be implemented, e.g. allowing a hypervisor to run multiple operating system supervisors beneath it, which is the basic design of many virtual machine systems available today. == Mode types == The unrestricted mode is often called kernel mode, but many other designations exist (master mode, supervisor mode, privileged mode, etc.). Restricted modes are usually referred to as user modes, but are also known by many other names (slave mode, problem state, etc.). Hypervisor Hypervisor mode is used to support virtualization, allowing the simultaneous operation of multiple operating systems. Kernel and user In kernel mode, the CPU may perform any operation allowed by its architecture; any instruction may be executed, any I/O operation initiated, any area of memory accessed, and so on. In the other CPU modes, certain restrictions on CPU operations are enforced by the hardware. Typically, certain instructions are not permitted (especially those—including I/O operations—that could alter the global state of the machine), some memory areas cannot be accessed, etc. User-mode capabilities of the CPU are typically a subset of those available in kernel mode, but in some cases, such as hardware emulation of non-native architectures, they may be significantly different from those available in standard kernel mode. Some CPU architectures support more modes than those, often with a hierarchy of privileges. These architectures are often said to have ring-based security, wherein the hierarchy of privileges resembles a set of concentric rings, with the kernel mode in the center. Multics hardware was the first significant implementation of ring security, but many other hardware platforms have been designed along similar lines, including the Intel 80286 protected mode, and the IA-64 as well, though it is referred to by a different name in these cases. Mode protection may extend to resources beyond the CPU hardware itself. Hardware registers track the current operating mode of the CPU, but additional virtual-memory registers, page-table entries, and other data may track mode identifiers for other resources. For example, a CPU may be operating in Ring 0 as indicated by a status word in the CPU itself, but every access to memory may additionally be validated against a separate ring number for the virtual-memory segment targeted by the access, and/or against a ring number for the physical page (if any) being targeted. This has been demonstrated with the PSP handheld system. Hardware that meets the Popek and Goldberg virtualization requirements makes writing software to efficiently support a virtual machine much simpler. Such a system can run software that "believes" it is running in supervisor mode, but is actually running in user mode. == Architectures == Several computer systems introduced in the 1960s, such as the IBM System/360, DEC PDP-6/PDP-10, the GE-600/Honeywell 6000 series, and the Burroughs B5000 series and B6500 series, support two CPU modes; a mode that grants full privileges to code running in that mode, and a mode that prevents direct access to input/output devices and some other hardware facilities to code running in that mode. The first mode is referred to by names such as supervisor state (System/360), executive mode (PDP-6/PDP-10), master mode (GE-600 series), control mode (B5000 series), and control state (B6500 series). The second mode is referred to by names such as problem state (System/360), user mode (PDP-6/PDP-10), slave mode (GE-600 series), and normal state (B6500 series); there are multiple non-control modes in the B5000 series. === RISC-V === RISC-V has three main CPU modes: User Mode (U), Supervisor Mode (S), and Machine Mode (M). Virtualization is supported via an orthogonal CSR setting instead of a fourth mode.

    Read more →
  • Linear classifier

    Linear classifier

    In machine learning, a linear classifier makes a classification decision for each object based on a linear combination of its features. A simpler definition is to say that a linear classifier is one whose decision boundaries are linear. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use. == Definition == If the input feature vector to the classifier is a real vector x → {\displaystyle {\vec {x}}} , then the output score is y = f ( w → ⋅ x → ) = f ( ∑ j w j x j ) , {\displaystyle y=f({\vec {w}}\cdot {\vec {x}})=f\left(\sum _{j}w_{j}x_{j}\right),} where w → {\displaystyle {\vec {w}}} is a real vector of weights and f is a function that converts the dot product of the two vectors into the desired output. (In other words, w → {\displaystyle {\vec {w}}} is a one-form or linear functional mapping x → {\displaystyle {\vec {x}}} onto R.) The weight vector w → {\displaystyle {\vec {w}}} is learned from a set of labeled training samples. Often f is a threshold function, which maps all values of w → ⋅ x → {\displaystyle {\vec {w}}\cdot {\vec {x}}} above a certain threshold to the first class and all other values to the second class; e.g., f ( x ) = { 1 if w T ⋅ x > θ , 0 otherwise {\displaystyle f(\mathbf {x} )={\begin{cases}1&{\text{if }}\ \mathbf {w} ^{T}\cdot \mathbf {x} >\theta ,\\0&{\text{otherwise}}\end{cases}}} The superscript T indicates the transpose and θ {\displaystyle \theta } is a scalar threshold. A more complex f might give the probability that an item belongs to a certain class. For a two-class classification problem, one can visualize the operation of a linear classifier as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as "yes", while the others are classified as "no". A linear classifier is often used in situations where the speed of classification is an issue, since it is often the fastest classifier, especially when x → {\displaystyle {\vec {x}}} is sparse. Also, linear classifiers often work very well when the number of dimensions in x → {\displaystyle {\vec {x}}} is large, as in document classification, where each element in x → {\displaystyle {\vec {x}}} is typically the number of occurrences of a word in a document (see document-term matrix). In such cases, the classifier should be well-regularized. == Generative models vs. discriminative models == There are two broad classes of methods for determining the parameters of a linear classifier w → {\displaystyle {\vec {w}}} . They can be generative and discriminative models. Methods of the former model joint probability distribution, whereas methods of the latter model conditional density functions P ( c l a s s | x → ) {\displaystyle P({\rm {class}}|{\vec {x}})} . Examples of such algorithms include: Linear Discriminant Analysis (LDA)—assumes Gaussian conditional density models Naive Bayes classifier with multinomial or multivariate Bernoulli event models. The second set of methods includes discriminative models, which attempt to maximize the quality of the output on a training set. Additional terms in the training cost function can easily perform regularization of the final model. Examples of discriminative training of linear classifiers include: Logistic regression—maximum likelihood estimation of w → {\displaystyle {\vec {w}}} assuming that the observed training set was generated by a binomial model that depends on the output of the classifier. Perceptron—an algorithm that attempts to fix all errors encountered in the training set Fisher's Linear Discriminant Analysis—an algorithm (different than "LDA") that maximizes the ratio of between-class scatter to within-class scatter, without any other assumptions. It is in essence a method of dimensionality reduction for binary classification. Support vector machine—an algorithm that maximizes the margin between the decision hyperplane and the examples in the training set. Note: Despite its name, LDA does not belong to the class of discriminative models in this taxonomy. However, its name makes sense when we compare LDA to the other main linear dimensionality reduction algorithm: principal components analysis (PCA). LDA is a supervised learning algorithm that utilizes the labels of the data, while PCA is an unsupervised learning algorithm that ignores the labels. To summarize, the name is a historical artifact. Discriminative training often yields higher accuracy than modeling the conditional density functions. However, handling missing data is often easier with conditional density models. All of the linear classifier algorithms listed above can be converted into non-linear algorithms operating on a different input space φ ( x → ) {\displaystyle \varphi ({\vec {x}})} , using the kernel trick. === Discriminative training === Discriminative training of linear classifiers usually proceeds in a supervised way, by means of an optimization algorithm that is given a training set with desired outputs and a loss function that measures the discrepancy between the classifier's outputs and the desired outputs. Thus, the learning algorithm solves an optimization problem of the form arg ⁡ min w R ( w ) + C ∑ i = 1 N L ( y i , w T x i ) {\displaystyle {\underset {\mathbf {w} }{\arg \min }}\;R(\mathbf {w} )+C\sum _{i=1}^{N}L(y_{i},\mathbf {w} ^{\mathsf {T}}\mathbf {x} _{i})} where w is a vector of classifier parameters, L(yi, wTxi) is a loss function that measures the discrepancy between the classifier's prediction and the true output yi for the i'th training example, R(w) is a regularization function that prevents the parameters from getting too large (causing overfitting), and C is a scalar constant (set by the user of the learning algorithm) that controls the balance between the regularization and the loss function. Popular loss functions include the hinge loss (for linear SVMs) and the log loss (for linear logistic regression). If the regularization function R is convex, then the above is a convex problem. Many algorithms exist for solving such problems; popular ones for linear classification include (stochastic) gradient descent, L-BFGS, coordinate descent and Newton methods.

    Read more →
  • LPBoost

    LPBoost

    Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a margin between training samples of different classes, and thus also belongs to the class of margin classifier algorithms. Consider a classification function f : X → { − 1 , 1 } , {\displaystyle f:{\mathcal {X}}\to \{-1,1\},} which classifies samples from a space X {\displaystyle {\mathcal {X}}} into one of two classes, labelled 1 and -1, respectively. LPBoost is an algorithm for learning such a classification function, given a set of training examples with known class labels. LPBoost is a machine learning technique especially suited for joint classification and feature selection in structured domains. == LPBoost overview == As in all boosting classifiers, the final classification function is of the form f ( x ) = ∑ j = 1 J α j h j ( x ) , {\displaystyle f({\boldsymbol {x}})=\sum _{j=1}^{J}\alpha _{j}h_{j}({\boldsymbol {x}}),} where α j {\displaystyle \alpha _{j}} are non-negative weightings for weak classifiers h j : X → { − 1 , 1 } {\displaystyle h_{j}:{\mathcal {X}}\to \{-1,1\}} . Each individual weak classifier h j {\displaystyle h_{j}} may be just a little bit better than random, but the resulting linear combination of many weak classifiers can perform very well. LPBoost constructs f {\displaystyle f} by starting with an empty set of weak classifiers. Iteratively, a single weak classifier to add to the set of considered weak classifiers is selected, added and all the weights α {\displaystyle {\boldsymbol {\alpha }}} for the current set of weak classifiers are adjusted. This is repeated until no weak classifiers to add remain. The property that all classifier weights are adjusted in each iteration is known as totally-corrective property. Early boosting methods, such as AdaBoost do not have this property and converge slower. == Linear program == More generally, let H = { h ( ⋅ ; ω ) | ω ∈ Ω } {\displaystyle {\mathcal {H}}=\{h(\cdot ;\omega )|\omega \in \Omega \}} be the possibly infinite set of weak classifiers, also termed hypotheses. One way to write down the problem LPBoost solves is as a linear program with infinitely many variables. The primal linear program of LPBoost, optimizing over the non-negative weight vector α {\displaystyle {\boldsymbol {\alpha }}} , the non-negative vector ξ {\displaystyle {\boldsymbol {\xi }}} of slack variables and the margin ρ {\displaystyle \rho } is the following. min α , ξ , ρ − ρ + D ∑ n = 1 ℓ ξ n sb.t. ∑ ω ∈ Ω y n α ω h ( x n ; ω ) + ξ n ≥ ρ , n = 1 , … , ℓ , ∑ ω ∈ Ω α ω = 1 , ξ n ≥ 0 , n = 1 , … , ℓ , α ω ≥ 0 , ω ∈ Ω , ρ ∈ R . {\displaystyle {\begin{array}{cl}{\underset {{\boldsymbol {\alpha }},{\boldsymbol {\xi }},\rho }{\min }}&-\rho +D\sum _{n=1}^{\ell }\xi _{n}\\{\textrm {sb.t.}}&\sum _{\omega \in \Omega }y_{n}\alpha _{\omega }h({\boldsymbol {x}}_{n};\omega )+\xi _{n}\geq \rho ,\qquad n=1,\dots ,\ell ,\\&\sum _{\omega \in \Omega }\alpha _{\omega }=1,\\&\xi _{n}\geq 0,\qquad n=1,\dots ,\ell ,\\&\alpha _{\omega }\geq 0,\qquad \omega \in \Omega ,\\&\rho \in {\mathbb {R} }.\end{array}}} Note the effects of slack variables ξ ≥ 0 {\displaystyle {\boldsymbol {\xi }}\geq 0} : their one-norm is penalized in the objective function by a constant factor D {\displaystyle D} , which—if small enough—always leads to a primal feasible linear program. Here we adopted the notation of a parameter space Ω {\displaystyle \Omega } , such that for a choice ω ∈ Ω {\displaystyle \omega \in \Omega } the weak classifier h ( ⋅ ; ω ) : X → { − 1 , 1 } {\displaystyle h(\cdot ;\omega ):{\mathcal {X}}\to \{-1,1\}} is uniquely defined. When the above linear program was first written down in early publications about boosting methods it was disregarded as intractable due to the large number of variables α {\displaystyle {\boldsymbol {\alpha }}} . Only later it was discovered that such linear programs can indeed be solved efficiently using the classic technique of column generation. === Column generation for LPBoost === In a linear program a column corresponds to a primal variable. Column generation is a technique to solve large linear programs. It typically works in a restricted problem, dealing only with a subset of variables. By generating primal variables iteratively and on-demand, eventually the original unrestricted problem with all variables is recovered. By cleverly choosing the columns to generate the problem can be solved such that while still guaranteeing the obtained solution to be optimal for the original full problem, only a small fraction of columns has to be created. ==== LPBoost dual problem ==== Columns in the primal linear program corresponds to rows in the dual linear program. The equivalent dual linear program of LPBoost is the following linear program. max λ , γ γ sb.t. ∑ n = 1 ℓ y n h ( x n ; ω ) λ n + γ ≤ 0 , ω ∈ Ω , 0 ≤ λ n ≤ D , n = 1 , … , ℓ , ∑ n = 1 ℓ λ n = 1 , γ ∈ R . {\displaystyle {\begin{array}{cl}{\underset {{\boldsymbol {\lambda }},\gamma }{\max }}&\gamma \\{\textrm {sb.t.}}&\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}+\gamma \leq 0,\qquad \omega \in \Omega ,\\&0\leq \lambda _{n}\leq D,\qquad n=1,\dots ,\ell ,\\&\sum _{n=1}^{\ell }\lambda _{n}=1,\\&\gamma \in \mathbb {R} .\end{array}}} For linear programs the optimal value of the primal and dual problem are equal. For the above primal and dual problems, the optimal value is equal to the negative 'soft margin'. The soft margin is the size of the margin separating positive from negative training instances minus positive slack variables that carry penalties for margin-violating samples. Thus, the soft margin may be positive although not all samples are linearly separated by the classification function. The latter is called the 'hard margin' or 'realized margin'. ==== Convergence criterion ==== Consider a subset of the satisfied constraints in the dual problem. For any finite subset we can solve the linear program and thus satisfy all constraints. If we could prove that of all the constraints which we did not add to the dual problem no single constraint is violated, we would have proven that solving our restricted problem is equivalent to solving the original problem. More formally, let γ ∗ {\displaystyle \gamma ^{}} be the optimal objective function value for any restricted instance. Then, we can formulate a search problem for the 'most violated constraint' in the original problem space, namely finding ω ∗ ∈ Ω {\displaystyle \omega ^{}\in \Omega } as ω ∗ = argmax ω ∈ Ω ∑ n = 1 ℓ y n h ( x n ; ω ) λ n . {\displaystyle \omega ^{}={\underset {\omega \in \Omega }{\textrm {argmax}}}\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}.} That is, we search the space H {\displaystyle {\mathcal {H}}} for a single decision stump h ( ⋅ ; ω ∗ ) {\displaystyle h(\cdot ;\omega ^{})} maximizing the left hand side of the dual constraint. If the constraint cannot be violated by any choice of decision stump, none of the corresponding constraint can be active in the original problem and the restricted problem is equivalent. ==== Penalization constant ==== D {\displaystyle D} The positive value of penalization constant D {\displaystyle D} has to be found using model selection techniques. However, if we choose D = 1 ℓ ν {\displaystyle D={\frac {1}{\ell \nu }}} , where ℓ {\displaystyle \ell } is the number of training samples and 0 < ν < 1 {\displaystyle 0<\nu <1} , then the new parameter ν {\displaystyle \nu } has the following properties. ν {\displaystyle \nu } is an upper bound on the fraction of training errors; that is, if k {\displaystyle k} denotes the number of misclassified training samples, then k ℓ ≤ ν {\displaystyle {\frac {k}{\ell }}\leq \nu } . ν {\displaystyle \nu } is a lower bound on the fraction of training samples outside or on the margin. == Algorithm == Input: Training set X = { x 1 , … , x ℓ } {\displaystyle X=\{{\boldsymbol {x}}_{1},\dots ,{\boldsymbol {x}}_{\ell }\}} , x i ∈ X {\displaystyle {\boldsymbol {x}}_{i}\in {\mathcal {X}}} Training labels Y = { y 1 , … , y ℓ } {\displaystyle Y=\{y_{1},\dots ,y_{\ell }\}} , y i ∈ { − 1 , 1 } {\displaystyle y_{i}\in \{-1,1\}} Convergence threshold θ ≥ 0 {\displaystyle \theta \geq 0} Output: Classification function f : X → { − 1 , 1 } {\displaystyle f:{\mathcal {X}}\to \{-1,1\}} Initialization Weights, uniform λ n ← 1 ℓ , n = 1 , … , ℓ {\displaystyle \lambda _{n}\leftarrow {\frac {1}{\ell }},\quad n=1,\dots ,\ell } Edge γ ← 0 {\displaystyle \gamma \leftarrow 0} Hypothesis count J ← 1 {\displaystyle J\leftarrow 1} Iterate h ^ ← argmax ω ∈ Ω ∑ n = 1 ℓ y n h ( x n ; ω ) λ n {\displaystyle {\hat {h}}\leftarrow {\underset {\omega \in \Omega }{\textrm {argmax}}}\sum _{n=1}^{\ell }y_{n}h({\boldsymbol {x}}_{n};\omega )\lambda _{n}} if ∑ n = 1 ℓ y n h ^ ( x n ) λ n + γ ≤ θ {\displaystyle \sum _{n=1}^{\ell }y_{n}{\hat {h}}({\boldsymbol {x}}_{n})\lambda _{n}+\gamma \leq \theta } then break h J ← h ^ {\displaystyle h_{J}\leftarrow {\hat {h}}} J

    Read more →
  • Quadratic classifier

    Quadratic classifier

    In statistics, a quadratic classifier is a statistical classifier that uses a quadratic decision surface to separate measurements of two or more classes of objects or events. It is a more general version of the linear classifier. == The classification problem == Statistical classification considers a set of vectors of observations x of an object or event, each of which has a known type y. This set is referred to as the training set. The problem is then to determine, for a given new observation vector, what the best class should be. For a quadratic classifier, the correct solution is assumed to be quadratic in the measurements, so y will be decided based on x T A x + b T x + c {\displaystyle \mathbf {x^{T}Ax} +\mathbf {b^{T}x} +c} In the special case where each observation consists of two measurements, this means that the surfaces separating the classes will be conic sections (i.e., either a line, a circle or ellipse, a parabola or a hyperbola). In this sense, we can state that a quadratic model is a generalization of the linear model, and its use is justified by the desire to extend the classifier's ability to represent more complex separating surfaces. == Quadratic discriminant analysis == Quadratic discriminant analysis (QDA) is closely related to linear discriminant analysis (LDA), where it is assumed that the measurements from each class are normally distributed. Unlike LDA however, in QDA there is no assumption that the covariance of each of the classes is identical. When the normality assumption is true, the best possible test for the hypothesis that a given measurement is from a given class is the likelihood ratio test. Suppose there are only two groups, with means μ 0 , μ 1 {\displaystyle \mu _{0},\mu _{1}} and covariance matrices Σ 0 , Σ 1 {\displaystyle \Sigma _{0},\Sigma _{1}} corresponding to y = 0 {\displaystyle y=0} and y = 1 {\displaystyle y=1} respectively. Then the likelihood ratio is given by Likelihood ratio = | 2 π Σ 1 | − 1 exp ⁡ ( − 1 2 ( x − μ 1 ) T Σ 1 − 1 ( x − μ 1 ) ) | 2 π Σ 0 | − 1 exp ⁡ ( − 1 2 ( x − μ 0 ) T Σ 0 − 1 ( x − μ 0 ) ) < t {\displaystyle {\text{Likelihood ratio}}={\frac {{\sqrt {|2\pi \Sigma _{1}|}}^{-1}\exp \left(-{\frac {1}{2}}(\mathbf {x} -{\boldsymbol {\mu }}_{1})^{T}\Sigma _{1}^{-1}(\mathbf {x} -{\boldsymbol {\mu }}_{1})\right)}{{\sqrt {|2\pi \Sigma _{0}|}}^{-1}\exp \left(-{\frac {1}{2}}(\mathbf {x} -{\boldsymbol {\mu }}_{0})^{T}\Sigma _{0}^{-1}(\mathbf {x} -{\boldsymbol {\mu }}_{0})\right)}} Read more →

  • Secure element

    Secure element

    A secure element (SE) is a secure operating system (OS) in a tamper-resistant processor chip or secure component. It can protect assets (root of trust, sensitive data, keys, certificates, applications) against high-level software and hardware attacks. Applications that process this sensitive data on an SE are isolated and so operate within a controlled environment not affected by software (including possible malware) found elsewhere on the OS. The hardware and embedded software meet the requirements of the Security IC Platform Protection Profile [PP 0084] including resistance to physical tampering scenarios described within it. More than 96 billion secure elements were produced and shipped between 2010 and 2021. SEs exist in various form factors, as devices such as smart cards, UICCs, or smart microSD cards, or embedded, or integrated, as parts of larger devices. SEs are an evolution of the chips in earlier smart cards, which have been adapted to suit the needs of numerous use cases, such as smartphones, tablets, set-top boxes, wearables, connected cars, and other internet of things (IoT) devices. The technology is widely used by technology firms such as Oracle, Apple and Samsung. SEs provide secure isolation, storage and processing for applications (called applets) they host while being isolated from the external world (e.g. rich OS and application processor when embedded in a smartphone) and from other applications running on the SE. Java Card and MULTOS are the most deployed standardized multi-application operating systems currently used to develop applications running on SEs. Since 1999, GlobalPlatform has been the body responsible for standardizing secure element technologies to support a dynamic model of application management in a multi-actor model. GlobalPlatform also runs Functional and Security Certification programmes for secure elements, and hosts a list of Functional Certified and Security Certified products. GlobalPlatform technology is also embedded in other standards such as ETSI SCP (now SET) since release 7. A Common Criteria Secure Element Protection Profile has been released targeting EAL4+ level with ALC_DVS.2 and AVA_VAN.5 extension to standardize the security features of a secure element across markets.

    Read more →
  • Generalized blockmodeling of valued networks

    Generalized blockmodeling of valued networks

    Generalized blockmodeling of valued networks is an approach of the generalized blockmodeling, dealing with valued networks (e.g., non-binary). While the generalized blockmodeling signifies a "formal and integrated approach for the study of the underlying functional anatomies of virtually any set of relational data", it is in principle used for binary networks. This is evident from the set of ideal blocks, which are used to interpret blockmodels, that are binary, based on the characteristic link patterns. Because of this, such templates are "not readily comparable with valued empirical blocks". To allow generalized blockmodeling of valued directional (one-mode) networks (e.g. allowing the direct comparisons of empirical valued blocks with ideal binary blocks), a non–parametric approach is used. With this, "an optional parameter determines the prominence of valued ties as a minimum percentile deviation between observed and expected flows". Such two–sided application of parameter then introduces "the possibility of non–determined ties, i.e. valued relations that are deemed neither prominent (1) nor non–prominent (0)." Resulted occurrences of links then motivate the modification of the calculation of inconsistencies between empirical and ideal blocks. At the same time, such links also give a possibility to measure the interpretational certainty, which is specific to each ideal block. Such maximum two–sided deviation threshold, holding the aggregate uncertainty score at zero or near–zero levels, is then proposed as "a measure of interpretational certainty for valued blockmodels, in effect transforming the optional parameter into an outgoing state". Problem with blockmodeling is the standard set of ideal block, as they are all specified using binary link (tie) patters; this results in "a non–trivial exercise to match and count inconsistencies between such ideal binary ties and empirical valued ties". One approach to solve this is by using dichotomization to transform the network into a binary version. The other two approaches were first proposed by Aleš Žiberna in 2007 by introducing valued (generalized) blockmodeling and also homogeneity blockmodeling. The basic idea of the latter is "that the inconsistency of an empirical block with its ideal block can be measured by within block variability of appropriate values". The newly–formed ideal blocks, which are appropriate for blockmodeling of valued networks, are then presented together with the definitions of their block inconsistencies. Two other approaches were later suggested by Carl Nordlund in 2019: deviational approach and correlation-based generalized approach. Both Nordlund's approaches are based on the idea, that valued networks can be compared with the ideal block without values. With this approach, more information is retained for analysis, which also means, that there are fewer partitions having identical values of the criterion function. This means, that the generalized blockmodeling of valued networks measures the inconsistencies more precisely. Usually, only one optimal partition is found in this approach, especially when it is used by homogeneity blockmodeling. Contrary, while using binary blockmodeling on the same sample, usually more than one optimal partition had occurred on several occasions.

    Read more →
  • Charge based boundary element fast multipole method

    Charge based boundary element fast multipole method

    The charge-based formulation of the boundary element method (BEM) is a dimensionality reduction numerical technique that is used to model quasistatic electromagnetic phenomena in highly complex conducting media (targeting, e.g., the human brain) with a very large (up to approximately 1 billion) number of unknowns. The charge-based BEM solves an integral equation of the potential theory written in terms of the induced surface charge density. This formulation is naturally combined with fast multipole method (FMM) acceleration, and the entire method is known as charge-based BEM-FMM. The combination of BEM and FMM is a common technique in different areas of computational electromagnetics and, in the context of bioelectromagnetism, it provides improvements over the finite element method. == Historical development == Along with more common electric potential-based BEM, the quasistatic charge-based BEM, derived in terms of the single-layer (charge) density, for a single-compartment medium has been known in the potential theory since the beginning of the 20th century. For multi-compartment conducting media, the surface charge density formulation first appeared in discretized form (for faceted interfaces) in the 1964 paper by Gelernter and Swihart. A subsequent continuous form, including time-dependent and dielectric effects, appeared in the 1967 paper by Barnard, Duck, and Lynn. The charge-based BEM has also been formulated for conducting, dielectric, and magnetic media, and used in different applications. In 2009, Greengard et al. successfully applied the charge-based BEM with fast multipole acceleration to molecular electrostatics of dielectrics. A similar approach to realistic modeling of the human brain with multiple conducting compartments was first described by Makarov et al. in 2018. Along with this, the BEM-based multilevel fast multipole method has been widely used in radar and antenna studies at microwave frequencies as well as in acoustics. == Physical background - surface charges in biological media == The charge-based BEM is based on the concept of an impressed (or primary) electric field E i {\displaystyle \mathbf {E} ^{i}} and a secondary electric field E s {\displaystyle \mathbf {E} ^{s}} . The impressed field is usually known a priori or is trivial to find. For the human brain, the impressed electric field can be classified as one of the following: A conservative field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed density of EEG or MEG current sources in a homogeneous infinite medium with the conductivity σ {\displaystyle \sigma } at the source location; An instantaneous solenoidal field E i {\displaystyle \mathbf {E} ^{i}} of an induction coil obtained from Faraday's law of induction in a homogeneous infinite medium (air), when transcranial magnetic stimulation (TMS) problems are concerned; A surface field E i {\displaystyle \mathbf {E} ^{i}} derived from an impressed surface current density J i = σ E i {\displaystyle \mathbf {J} ^{i}=\sigma \mathbf {E} ^{i}} of current electrodes injecting electric current at a boundary of a compartment with conductivity σ {\displaystyle \sigma } when transcranial direct-current stimulation (tDCS) or deep brain stimulation (DBS) are concerned; A conservative field E i {\displaystyle \mathbf {E} ^{i}} of charges deposited on voltage electrodes for tDCS or DBS. This specific problem requires a coupled treatment since these charges will depend on the environment; In application to multiscale modeling, a field E i {\displaystyle \mathbf {E} ^{i}} obtained from any other macroscopic numerical solution in a small (mesoscale or microscale) spatial domain within the brain. For example, a constant field can be used. When the impressed field is "turned on", free charges located within a conducting volume D immediately begin to redistribute and accumulate at the boundaries (interfaces) of regions of different conductivity in D. A surface charge density ρ ( r ) {\displaystyle \rho (\mathbf {r} )} appears on the conductivity interfaces. This charge density induces a secondary conservative electric field E s {\displaystyle \mathbf {E} ^{s}} following Coulomb's law. One example is a human under a direct current powerline with the known field E i {\displaystyle \mathbf {E} ^{i}} directed down. The superior surface of the human's conducting body will be charged negatively while its inferior portion is charged positively. These surface charges create a secondary electric field that effectively cancels or blocks the primary field everywhere in the body so that no current will flow within the body under DC steady state conditions. Another example is a human head with electrodes attached. At any conductivity interface with a normal vector n {\displaystyle \mathbf {n} } pointing from an "inside" (-) compartment of conductivity σ − {\displaystyle \sigma ^{-}} to an "outside" (+) compartment of conductivity σ + {\displaystyle \sigma ^{+}} , Kirchhoff's current law requires continuity of the normal component of the electric current density. This leads to the interfacial boundary condition in the form for every facet at a triangulated interface. As long as σ ± {\displaystyle \sigma ^{\pm }} are different from each other, the two normal components of the electric field, E ± ⋅ n {\displaystyle \mathbf {E} ^{\pm }\cdot \mathbf {n} } , must also be different. Such a jump across the interface is only possible when a sheet of surface charge exists at that interface. Thus, if an electric current or voltage is applied, the surface charge density follows. The goal of the numerical analysis is to find the unknown surface charge distribution and thus the total electric field E = E i + E s {\displaystyle \mathbf {E} =\mathbf {E} ^{i}+\mathbf {E} ^{s}} (and the total electric potential if required) anywhere in space. == System of equations for surface charges == Below, a derivation is given based on Gauss's law and Coulomb's law. All conductivity interfaces, denoted by S, are discretized into planar triangular facets t m {\displaystyle t_{m}} with centers r m {\displaystyle \mathbf {r} _{m}} . Assume that an m-th facet with the normal vector n m {\displaystyle \mathbf {n} _{m}} and area A m {\displaystyle A_{m}} carries a uniform surface charge density ρ m {\displaystyle \rho _{m}} . If a volumetric tetrahedral mesh were present, the charged facets would belong to tetrahedra with different conductivity values. We first compute the electric field E m + {\displaystyle \mathbf {E} _{m}^{+}} at the point r m + δ n m {\displaystyle \mathbf {r} _{m}+\delta \mathbf {n} _{m}} , for δ → 0 + {\displaystyle \delta \rightarrow 0^{+}} i.e., just outside facet 𝑚 at its center. This field contains three contributions: The continuous impressed electric field E i {\displaystyle \mathbf {E} ^{i}} itself; An electric field of the m-th charged facet itself. Very close to the facet, it can be approximated as the electric field of an infinite sheet of uniform surface charge ρ m {\displaystyle \rho _{m}} . By Gauss's law, it is given by + ρ m / 2 ε 0 ⋅ n m {\displaystyle +\rho _{m}/2\varepsilon _{0}\cdot \mathbf {n} _{m}} where ε 0 {\displaystyle \varepsilon _{0}} is a background electrical permittivity; An electric field generated by all other facets t n {\displaystyle t_{n}} , which we approximate as point charges of charge A n ρ n {\displaystyle A_{n}\rho _{n}} at each center r n {\displaystyle \mathbf {r} _{n}} . A similar treatment holds for the electric field E m − {\displaystyle \mathbf {E} _{m}^{-}} just inside facet 𝑚, but the electric field of the flat sheet of charge changes its sign. Using Coulomb's law to calculate the contribution of facets different from t m {\displaystyle t_{m}} , we find From this equation, we see that the normal component of the electric field indeed undergoes a jump through the charged interface. This is equivalent to a jump relation of the potential theory. As a second step, the two expressions for E m ± {\displaystyle \mathbf {E} _{m}^{\pm }} are substituted into the interfacial boundary condition σ − E m − ⋅ n m = σ + E m + ⋅ n m {\displaystyle \sigma ^{-}\mathbf {E} _{m}^{-}\cdot \mathbf {n} _{m}=\sigma ^{+}\mathbf {E} _{m}^{+}\cdot \mathbf {n} _{m}} , applied to every facet 𝑚. This operation leads to a system of linear equations for unknown charge densities ρ m {\displaystyle \rho _{m}} which solves the problem: where K m = σ − − σ + σ − + σ + {\displaystyle K_{m}={\frac {\sigma ^{-}-\sigma ^{+}}{\sigma ^{-}+\sigma ^{+}}}} is the electric conductivity contrast at the m-th facet. The normalization constant ε 0 {\displaystyle \varepsilon _{0}} will cancel out after the solution is substituted in the expression for E s {\displaystyle \mathbf {E} ^{s}} and becomes redundant. == Application of fast multipole method == For modern characterizations of brain topologies with ever-increasing levels of complexity, the above system of equations for ρ m {\displaystyle \rho _{m}} is very large; it is t

    Read more →
  • Clustering illusion

    Clustering illusion

    The clustering illusion is the tendency to erroneously consider the inevitable "streaks" or "clusters" arising in small samples from random distributions to be non-random. The illusion is caused by a human tendency to underpredict the amount of variability likely to appear in a small sample of random or pseudorandom data. Thomas Gilovich, an early author on the subject, argued that the effect occurs for different types of random dispersions. Some might perceive patterns in stock market price fluctuations over time, or clusters in two-dimensional data such as the locations of impact of World War II V-1 flying bombs on maps of London. Although Londoners developed specific theories about the pattern of impacts within London, a statistical analysis by R. D. Clarke originally published in 1946 showed that the impacts of V-2 rockets on London were a close fit to a random distribution. == Similar biases == Using this cognitive bias in causal reasoning may result in the Texas sharpshooter fallacy, in which differences in data are ignored and similarities are overemphasized. More general forms of erroneous pattern recognition are pareidolia and apophenia. Related biases are the illusion of control which the clustering illusion could contribute to, and insensitivity to sample size in which people don't expect greater variation in smaller samples. A different cognitive bias involving misunderstanding of chance streams is the gambler's fallacy. == Possible causes == Daniel Kahneman and Amos Tversky explained this kind of misprediction as being caused by the representativeness heuristic (which itself they also first proposed).

    Read more →
  • Right to explanation

    Right to explanation

    In the regulation of algorithms, particularly artificial intelligence and its subfield of machine learning, a right to [an] explanation is a right to be given an explanation for an output of the algorithm. Such rights primarily refer to individual rights to be given an explanation for decisions that significantly affect an individual, particularly legally or financially. For example, a person who applies for a loan and is denied may ask for an explanation, which could be "Credit bureau X reports that you declared bankruptcy last year; this is the main factor in considering you too likely to default, and thus we will not give you the loan you applied for." Some such legal rights already exist, while the scope of a general "right to explanation" is a matter of ongoing debate. There have been arguments made that a "social right to explanation" is a crucial foundation for an information society, particularly as the institutions of that society will need to use digital technologies, artificial intelligence, machine learning. In other words, that the related automated decision making systems that use explainability would be more trustworthy and transparent. Without this right, which could be constituted both legally and through professional standards, the public will be left without much recourse to challenge the decisions of automated systems. == Examples == === Credit scoring in the United States === Under the Equal Credit Opportunity Act (Regulation B of the Code of Federal Regulations), Title 12, Chapter X, Part 1002, §1002.9, creditors are required to notify applicants who are denied credit with specific reasons for the detail. As detailed in §1002.9(b)(2): (2) Statement of specific reasons. The statement of reasons for adverse action required by paragraph (a)(2)(i) of this section must be specific and indicate the principal reason(s) for the adverse action. Statements that the adverse action was based on the creditor's internal standards or policies or that the applicant, joint applicant, or similar party failed to achieve a qualifying score on the creditor's credit scoring system are insufficient. The official interpretation of this section details what types of statements are acceptable. Creditors comply with this regulation by providing a list of reasons (generally at most 4, per interpretation of regulations), consisting of a numeric reason code (as identifier) and an associated explanation, identifying the main factors affecting a credit score. An example might be: 32: Balances on bankcard or revolving accounts too high compared to credit limits === European Union === The European Union General Data Protection Regulation (GDPR, enacted 2016, taking effect 2018) extends the automated decision-making rights in the 1995 Data Protection Directive to provide a legally disputed form of a right to an explanation, stated as such in Recital 71: "[the data subject should have] the right ... to obtain an explanation of the decision reached". In full: The data subject should have the right not to be subject to a decision, which may include a measure, evaluating personal aspects relating to him or her which is based solely on automated processing and which produces legal effects concerning him or her or similarly significantly affects him or her, such as automatic refusal of an online credit application or e-recruiting practices without any human intervention. ... In any case, such processing should be subject to suitable safeguards, which should include specific information to the data subject and the right to obtain human intervention, to express his or her point of view, to obtain an explanation of the decision reached after such assessment and to challenge the decision. However, the extent to which the regulations themselves provide a "right to explanation" is heavily debated. There are two main strands of criticism. There are significant legal issues with the right as found in Article 22 — as recitals are not binding, and the right to an explanation is not mentioned in the binding articles of the text, having been removed during the legislative process. In addition, there are significant restrictions on the types of automated decisions that are covered — which must be both "solely" based on automated processing, and have legal or similarly significant effects — which significantly limits the range of automated systems and decisions to which the right would apply. In particular, the right is unlikely to apply in many of the cases of algorithmic controversy that have been picked up in the media. The UK has also recently amended its implementation of Article 22. A second potential source of such a right has been pointed to in Article 15, the "right of access by the data subject". This restates a similar provision from the 1995 Data Protection Directive, allowing the data subject access to "meaningful information about the logic involved" in the same significant, solely automated decision-making, found in Article 22. Yet this too suffers from alleged challenges that relate to the timing of when this right can be drawn upon, as well as practical challenges that mean it may not be binding in many cases of public concern. Other EU legislative instruments contain explanation rights. The European Union's Artificial Intelligence Act provides in Article 86 a "[r]ight to explanation of individual decision-making" of certain high risk systems which produce significant, adverse effects to an individual's health, safety or fundamental rights. The right provides for "clear and meaningful explanations of the role of the AI system in the decision-making procedure and the main elements of the decision taken", although only applies to the extent other law does not provide such a right. The Digital Services Act in Article 27, and the Platform to Business Regulation in Article 5, both contain rights to have the main parameters of certain recommender systems to be made clear, although these provisions have been criticised as not matching the way that such systems work. The Platform Work Directive, which provides for regulation of automation in gig economy work as an extension of data protection law, further contains explanation provisions in Article 11, using the specific language of "explanation" in a binding article rather than a recital as is the case in the GDPR. Scholars note that remains uncertainty as to whether these provisions imply sufficiently tailored explanation in practice which will need to be resolved by courts. === France === In France the 2016 Loi pour une République numérique (Digital Republic Act or loi numérique) amends the country's administrative code to introduce a new provision for the explanation of decisions made by public sector bodies about individuals. It notes that where there is "a decision taken on the basis of an algorithmic treatment", the rules that define that treatment and its "principal characteristics" must be communicated to the citizen upon request, where there is not an exclusion (e.g. for national security or defence). These should include the following: the degree and the mode of contribution of the algorithmic processing to the decision- making; the data processed and its source; the treatment parameters, and where appropriate, their weighting, applied to the situation of the person concerned; the operations carried out by the treatment. Scholars have noted that this right, while limited to administrative decisions, goes beyond the GDPR right to explicitly apply to decision support rather than decisions "solely" based on automated processing, as well as provides a framework for explaining specific decisions. Indeed, the GDPR automated decision-making rights in the European Union, one of the places a "right to an explanation" has been sought within, find their origins in French law in the late 1970s. == Criticism == Some argue that a "right to explanation" is at best unnecessary, at worst harmful, and threatens to stifle innovation. Specific criticisms include: favoring human decisions over machine decisions, being redundant with existing laws, and focusing on process over outcome. Authors of study "Slave to the Algorithm? Why a 'Right to an Explanation' Is Probably Not the Remedy You Are Looking For" Lilian Edwards and Michael Veale argue that a right to explanation is not the solution to harms caused to stakeholders by algorithmic decisions. They also state that the right of explanation in the GDPR is narrowly defined, and is not compatible with how modern machine learning technologies are being developed. With these limitations, defining transparency within the context of algorithmic accountability remains a problem. For example, providing the source code of algorithms may not be sufficient and may create other problems in terms of privacy disclosures and the gaming of technical systems. To mitigate this issue, Edwards and Veale argue that an auditing system could be more effective, to allow auditors to loo

    Read more →
  • Plate notation

    Plate notation

    In Bayesian inference, plate notation is a method of representing variables that repeat in a graphical model. Instead of drawing each repeated variable individually, a plate or rectangle is used to group variables into a subgraph that repeat together, and a number is drawn on the plate to represent the number of repetitions of the subgraph in the plate. The assumptions are that the subgraph is duplicated that many times, the variables in the subgraph are indexed by the repetition number, and any links that cross a plate boundary are replicated once for each subgraph repetition. == Example == In this example, we consider Latent Dirichlet allocation, a Bayesian network that models how documents in a corpus are topically related. There are two variables not in any plate; α is the parameter of the uniform Dirichlet prior on the per-document topic distributions, and β is the parameter of the uniform Dirichlet prior on the per-topic word distribution. The outermost plate represents all the variables related to a specific document, including θ i {\displaystyle \theta _{i}} , the topic distribution for document i. The M in the corner of the plate indicates that the variables inside are repeated M times, once for each document. The inner plate represents the variables associated with each of the N i {\displaystyle N_{i}} words in document i: z i j {\displaystyle z_{ij}} is the topic distribution for the jth word in document i, and w i j {\displaystyle w_{ij}} is the actual word used. The N in the corner represents the repetition of the variables in the inner plate N j {\displaystyle N_{j}} times, once for each word in document i. The circle representing the individual words is shaded, indicating that each w i j {\displaystyle w_{ij}} is observable, and the other circles are empty, indicating that the other variables are latent variables. The directed edges between variables indicate dependencies between the variables: for example, each w i j {\displaystyle w_{ij}} depends on z i j {\displaystyle z_{ij}} and β. == Extensions == A number of extensions have been created by various authors to express more information than simply the conditional relationships. However, few of these have become standard. Perhaps the most commonly used extension is to use rectangles in place of circles to indicate non-random variables—either parameters to be computed, hyperparameters given a fixed value (or computed through empirical Bayes), or variables whose values are computed deterministically from a random variable. The diagram on the right shows a few more non-standard conventions used in some articles in Wikipedia (e.g. variational Bayes): Variables that are actually random vectors are indicated by putting the vector size in brackets in the middle of the node. Variables that are actually random matrices are similarly indicated by putting the matrix size in brackets in the middle of the node, with commas separating row size from column size. Categorical variables are indicated by placing their size (without a bracket) in the middle of the node. Categorical variables that act as "switches", and which pick one or more other random variables to condition on from a large set of such variables (e.g. mixture components), are indicated with a special type of arrow containing a squiggly line and ending in a T junction. Boldface is consistently used for vector or matrix nodes (but not categorical nodes). == Software implementation == Plate notation has been implemented in various TeX/LaTeX drawing packages, but also as part of graphical user interfaces to Bayesian statistics programs such as BUGS and BayesiaLab and PyMC.

    Read more →
  • Probit model

    Probit model

    In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit. The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model. A probit model is a popular specification for a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. When viewed in the generalized linear model framework, the probit model employs a probit link function. It is most often estimated using the maximum likelihood procedure, such an estimation being called a probit regression. == Conceptual framework == Suppose a response variable Y is binary, that is it can have only two possible outcomes which we will denote as 1 and 0. For example, Y may represent presence/absence of a certain condition, success/failure of some device, answer yes/no on a survey, etc. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form P ( Y = 1 ∣ X ) = Φ ( X T β ) , {\displaystyle P(Y=1\mid X)=\Phi (X^{\operatorname {T} }\beta ),} where P is the probability and Φ {\displaystyle \Phi } is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood. It is possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable Y ∗ = X T β + ε , {\displaystyle Y^{\ast }=X^{T}\beta +\varepsilon ,} where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive: Y = { 1 Y ∗ > 0 0 otherwise } = { 1 X T β + ε > 0 0 otherwise } {\displaystyle Y=\left.{\begin{cases}1&Y^{}>0\\0&{\text{otherwise}}\end{cases}}\right\}=\left.{\begin{cases}1&X^{\operatorname {T} }\beta +\varepsilon >0\\0&{\text{otherwise}}\end{cases}}\right\}} The use of the standard normal distribution causes no loss of generality compared with the use of a normal distribution with an arbitrary mean and standard deviation, because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount. To see that the two models are equivalent, note that P ( Y = 1 ∣ X ) = P ( Y ∗ > 0 ) = P ( X T β + ε > 0 ) = P ( ε > − X T β ) = P ( ε < X T β ) by symmetry of the normal distribution = Φ ( X T β ) {\displaystyle {\begin{aligned}P(Y=1\mid X)&=P(Y^{\ast }>0)\\&=P(X^{\operatorname {T} }\beta +\varepsilon >0)\\&=P(\varepsilon >-X^{\operatorname {T} }\beta )\\&=P(\varepsilon 0 {\displaystyle t,\lim _{n\rightarrow \infty }n_{t}/n=c_{t}>0} . Denote p ^ t = r t / n t {\displaystyle {\hat {p}}_{t}=r_{t}/n_{t}} σ ^ t 2 = 1 n t p ^ t ( 1 − p ^ t ) φ 2 ( Φ − 1 ( p ^ t ) ) {\displaystyle {\hat {\sigma }}_{t}^{2}={\frac {1}{n_{t}}}{\frac {{\hat {p}}_{t}(1-{\hat {p}}_{t})}{\varphi ^{2}{\big (}\Phi ^{-1}({\hat {p}}_{t}){\big )}}}} Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of Φ − 1 ( p ^ t ) {\displaystyle \Phi ^{-1}({\hat {p}}_{t})} on x ( t ) {\displaystyle x_{(t)}} with weights σ ^ t − 2 {\displaystyle {\hat {\sigma }}_{t}^{-2}} : β ^ = ( ∑ t = 1 T σ ^ t − 2 x ( t ) x ( t ) T ) − 1 ∑ t = 1 T σ ^ t − 2 x ( t ) Φ − 1 ( p ^ t ) {\displaystyle {\hat {\beta }}={\Bigg (}\sum _{t=1}^{T}{\hat {\sigma }}_{t}^{-2}x_{(t)}x_{(t)}^{\operatorname {T} }{\Bigg )}^{-1}\sum _{t=1}^{T}{\hat {\sigma }}_{t}^{-2}x_{(t)}\Phi ^{-1}({\hat {p}}_{t})} It can be shown that this estimator is consistent (as n→∞ and T fixed), asymptotically normal and efficient. Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts r t {\displaystyle r_{t}} , n t {\disp

    Read more →
  • Jpred

    Jpred

    Jpred v.4 is the latest version of the JPred Protein Secondary Structure Prediction Server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction, that has existed since 1998 in different versions. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 134 000 jobs per month and has carried out over 2 million predictions in total for users in 179 countries. == JPred 2 == The static HTML pages of JPred 2 are still available for reference. == JPred 3 == The JPred v3 followed on from previous versions of JPred developed and maintained by James Cuff and Jonathan Barber (see JPred References). This release added new functionality and fixed many bugs. The highlights are: New, friendlier user interface Retrained and optimised version of Jnet (v2) - mean secondary structure prediction accuracy of >81% Batch submission of jobs Better error checking of input sequences/alignments Predictions now (optionally) returned via e-mail Users may provide their own query names for each submission JPred now makes a prediction even when there are no PSI-BLAST hits to the query PS/PDF output now incorporates all the predictions == JPred 4 == The current version of JPred (v4) has the following improvements and updates incorporated: Retrained on the latest UniRef90 and SCOPe/ASTRAL version of Jnet (v2.3.1) - mean secondary structure prediction accuracy of >82%. Upgraded the Web Server to the latest technologies (Bootstrap framework, JavaScript) and updating the web pages – improving the design and usability through implementing responsive technologies. Added RESTful API and mass-submission and results retrieval scripts - resulting in peak throughput above 20,000 predictions per day. Added prediction jobs monitoring tools. Upgraded the results reporting – both, on the web-site, and through the optional email summary reports: improved batch submission, added results summary preview through Jalview results visualization summary in SVG and adding full multiple sequence alignments into the reports. Improved help-pages, incorporating tool-tips, and adding one-page step-by-step tutorials. Sequence residues are categorised or assigned to one of the secondary structure elements, such as alpha-helix, beta-sheet and coiled-coil. Jnet uses two neural networks for its prediction. The first network is fed with a window of 17 residues over each amino acid in the alignment plus a conservation number. It uses a hidden layer of nine nodes and has three output nodes, one for each secondary structure element. The second network is fed with a window of 19 residues (the result of first network) plus the conservation number. It has a hidden layer with nine nodes and has three output nodes.

    Read more →
  • AI washing

    AI washing

    AI washing is a deceptive marketing tactic that consists of promoting a product or a service by overstating the role of artificial intelligence (AI) and the integration of it. Companies often involve in the practice to mislead customers to boost their offerings, and to secure funding from investors. The practice raises concerns regarding transparency, and legal issues. == Definition == AI washing is a deceptive marketing practice. It involves promoting a product or a service by overstating the role of artificial intelligence (AI) and its integration in the design and manufacture of the same. The practice raises concerns regarding transparency, compliance with security regulations, and consumer trust in the AI industry potentially hampering legitimate advancements in AI. The term was first defined by the AI Now Institute, a research institute based at New York University in 2019. The term is derived from greenwashing, another deceptive marketing technique that misrepresents a product's environmental impact in a similar manner. AI washing might involve a company claiming to have used AI in the development or enhancement of its products or services without its actual involvement, or using buzzwords such as "smart" or "AI-powered" without the product actually offering it or making use of it. A company may overstate the usage of AI or misuse the term, which is also construed as AI washing. In 2026, The Washington Post defined AI washing as "a trend for bosses to blame layoffs on the productive capabilities of AI and its ability to replace workers, even when job cuts may have little to do with the technology". == Usage and effects == AI washing can lead to deception of customers and misleading of investors. It is also an illegal and unethical practice that lacks transparency regarding disclosing the details of a product or a service. Companies get involved in such a practice often in response to competition who might have used AI in their offerings. It might also be used as a ploy to secure funding and investment, assuming that it will attract them towards it. AI washing has been compared to dot-com bubble, when businesses appended "dot-com" to the end of the business name to boost their valuation. In September 2023, Coca-Cola released a new product called Coca-Cola Y3000, and the company stated that the Y3000 flavor had been "co-created with human and artificial intelligence". The company was accused of AI washing due to no proof of AI involvement in the creation of the product, and critics believed that AI was used as a way to grab consumer attention more than it was used in the actual product creation. In 2026, mass tech layoffs were attributed to AI washing from AI innovation instead of balance sheet restructuring. == Mitigation == Companies are expected to be transparent and clearer in communicating the usage of AI in their products or services. Consumers can mitigate the same by requesting for hard evidence from the companies regarding the usage of AI tools. Customers should evaluate the product or service as a whole rather than being swayed by the usage of AI. Informed decision making and purchasing can keep them from falling for such marketing gimmicks. The United States Securities and Exchange Commission (SEC) imposes penalties for companies indulging in such practices. In March 2024, the SEC imposed the first civil penalties on two companies for misleading statements about their use of AI, and in July 2024, it charged a corporate executive from a supposed AI hiring startup with fraud for the usage of buzzwords related to AI.

    Read more →
  • Fitness function

    Fitness function

    A fitness function is a particular type of objective or cost function that is used to summarize, as a single figure of merit, how close a given candidate solution is to achieving the set aims. It is an important component of evolutionary algorithms (EA), such as genetic programming, evolution strategies or genetic algorithms. An EA is a metaheuristic that reproduces the basic principles of biological evolution as a computer algorithm in order to solve challenging optimization or planning tasks, at least approximately. For this purpose, many candidate solutions are generated, which are evaluated using a fitness function in order to guide the evolutionary development towards the desired goal. Similar quality functions are also used in other metaheuristics, such as ant colony optimization or particle swarm optimization. In the field of EAs, each candidate solution, also called an individual, is commonly represented as a string of numbers (referred to as a chromosome). After each round of testing or simulation the idea is to delete the n worst individuals, and to breed n new ones from the best solutions. Each individual must therefore to be assigned a quality number indicating how close it has come to the overall specification, and this is generated by applying the fitness function to the test or simulation results obtained from that candidate solution. Two main classes of fitness functions exist: one where the fitness function does not change, as in optimizing a fixed function or testing with a fixed set of test cases; and one where the fitness function is mutable, as in niche differentiation or co-evolving the set of test cases. Another way of looking at fitness functions is in terms of a fitness landscape, which shows the fitness for each possible chromosome. In the following, it is assumed that the fitness is determined based on an evaluation that remains unchanged during an optimization run. A fitness function does not necessarily have to be able to calculate an absolute value, as it is sometimes sufficient to compare candidates in order to select the better one. A relative indication of fitness (candidate a is better than b) is sufficient in some cases, such as tournament selection or Pareto optimization. == Requirements of evaluation and fitness function == The quality of the evaluation and calculation of a fitness function is fundamental to the success of an EA optimisation. It implements Darwin's principle of "survival of the fittest". Without fitness-based selection mechanisms for mate selection and offspring acceptance, EA search would be blind and hardly distinguishable from the Monte Carlo method. When setting up a fitness function, one must always be aware that it is about more than just describing the desired target state. Rather, the evolutionary search on the way to the optimum should also be supported as much as possible (see also section on auxiliary objectives), if and insofar as this is not already done by the fitness function alone. If the fitness function is designed badly, the algorithm will either converge on an inappropriate solution, or will have difficulty converging at all. Definition of the fitness function is not straightforward in many cases and often is performed iteratively if the fittest solutions produced by an EA is not what is desired. Interactive genetic algorithms address this difficulty by outsourcing evaluation to external agents which are normally humans. == Computational efficiency == The fitness function should not only closely align with the designer's goal, but also be computationally efficient. Execution speed is crucial, as a typical evolutionary algorithm must be iterated many times in order to produce a usable result for a non-trivial problem. Fitness approximation may be appropriate, especially in the following cases: Fitness computation time of a single solution is extremely high Precise model for fitness computation is missing The fitness function is uncertain or noisy. Alternatively or also in addition to the fitness approximation, the fitness calculations can also be distributed to a parallel computer in order to reduce the execution times. Depending on the population model of the EA used, both the EA itself and the fitness calculations of all offspring of one generation can be executed in parallel. == Multi-objective optimization == Practical applications usually aim at optimizing multiple and at least partially conflicting objectives. Two fundamentally different approaches are often used for this purpose, Pareto optimization and optimization based on fitness calculated using the weighted sum. === Weighted sum and penalty functions === When optimizing with the weighted sum, the single values of the O {\displaystyle O} objectives are first normalized so that they can be compared. This can be done with the help of costs or by specifying target values and determining the current value as the degree of fulfillment. Costs or degrees of fulfillment can then be compared with each other and, if required, can also be mapped to a uniform fitness scale. Without loss of generality, fitness is assumed to represent a value to be maximized. Each objective o i {\displaystyle o_{i}} is assigned a weight w i {\displaystyle w_{i}} in the form of a percentage value so that the overall raw fitness f r a w {\displaystyle f_{raw}} can be calculated as a weighted sum: f r a w = ∑ i = 1 O o i ⋅ w i w i t h ∑ i = 1 O w i = 1 {\displaystyle f_{raw}=\sum _{i=1}^{O}{o_{i}\cdot w_{i}}\quad {\mathsf {with}}\quad \sum _{i=1}^{O}{w_{i}}=1} A violation of R {\displaystyle R} restrictions r j {\displaystyle r_{j}} can be included in the fitness determined in this way in the form of penalty functions. For this purpose, a function p f j ( r j ) {\displaystyle pf_{j}(r_{j})} can be defined for each restriction which returns a value between 0 {\displaystyle 0} and 1 {\displaystyle 1} depending on the degree of violation, with the result being 1 {\displaystyle 1} if there is no violation. The previously determined raw fitness is multiplied by the penalty function(s) and the result is then the final fitness f f i n a l {\displaystyle f_{final}} : f f i n a l = f r a w ⋅ ∏ j = 1 R p f j ( r j ) = ∑ i = 1 O ( o i ⋅ w i ) ⋅ ∏ j = 1 R p f j ( r j ) {\displaystyle f_{final}=f_{raw}\cdot \prod _{j=1}^{R}{pf_{j}(r_{j})}=\sum _{i=1}^{O}{(o_{i}\cdot w_{i})}\cdot \prod _{j=1}^{R}{pf_{j}(r_{j})}} This approach is simple and has the advantage of being able to combine any number of objectives and restrictions. The disadvantage is that different objectives can compensate each other and that the weights have to be defined before the optimization. This means that the compromise lines must be defined before optimization, which is why optimization with the weighted sum is also referred to as the a priori method. In addition, certain solutions may not be obtained, see the section on the comparison of both types of optimization. === Pareto optimization === A solution is called Pareto-optimal if the improvement of one objective is only possible with a deterioration of at least one other objective. The set of all Pareto-optimal solutions, also called Pareto set, represents the set of all optimal compromises between the objectives. The figure below on the right shows an example of the Pareto set of two objectives f 1 {\displaystyle f_{1}} and f 2 {\displaystyle f_{2}} to be maximized. The elements of the set form the Pareto front (green line). From this set, a human decision maker must subsequently select the desired compromise solution. Constraints are included in Pareto optimization in that solutions without constraint violations are per se better than those with violations. If two solutions to be compared each have constraint violations, the respective extent of the violations decides. It was recognized early on that EAs with their simultaneously considered solution set are well suited to finding solutions in one run that cover the Pareto front sufficiently well. They are therefore well suited as a-posteriori methods for multi-objective optimization, in which the final decision is made by a human decision maker after optimization and determination of the Pareto front. Besides the SPEA2, the NSGA-II and NSGA-III have established themselves as standard methods. The advantage of Pareto optimization is that, in contrast to the weighted sum, it provides all alternatives that are equivalent in terms of the objectives as an overall solution. The disadvantage is that a visualization of the alternatives becomes problematic or even impossible from four objectives on. Furthermore, the effort increases exponentially with the number of objectives. If there are more than three or four objectives, some have to be combined using the weighted sum or other aggregation methods. === Comparison of both types of assessment === With the help of the weighted sum, the total Pareto front can be obtained by a suitable choice of weights, provided that it is convex

    Read more →
  • Non-negative matrix factorization

    Non-negative matrix factorization

    Non-negative matrix factorization (NMF or NNMF), also non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically. NMF finds applications in such fields as astronomy, computer vision, document clustering, missing data imputation, chemometrics, audio signal processing, recommender systems, and bioinformatics. == History == In chemometrics non-negative matrix factorization has a long history under the name "self modeling curve resolution". In this framework the vectors in the right matrix are continuous curves rather than discrete vectors. Also early work on non-negative matrix factorizations was performed by a Finnish group of researchers in the 1990s under the name positive matrix factorization. It became more widely known as non-negative matrix factorization after Lee and Seung investigated the properties of the algorithm and published some simple and useful algorithms for two types of factorizations. == Background == Let matrix V be the product of the matrices W and H, V = W H . {\displaystyle \mathbf {V} =\mathbf {W} \mathbf {H} \,.} Matrix multiplication can be implemented as computing the column vectors of V as linear combinations of the column vectors in W using coefficients supplied by columns of H. That is, each column of V can be computed as follows: v i = W h i , {\displaystyle \mathbf {v} _{i}=\mathbf {W} \mathbf {h} _{i}\,,} where vi is the i-th column vector of the product matrix V and hi is the i-th column vector of the matrix H. When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if V is an m × n matrix, W is an m × p matrix, and H is a p × n matrix then p can be significantly less than both m and n. Here is an example based on a text-mining application: Let the input matrix (the matrix to be factored) be V with 10000 rows and 500 columns where words are in rows and documents are in columns. That is, we have 500 documents indexed by 10000 words. It follows that a column vector v in V represents a document. Assume we ask the algorithm to find 10 features in order to generate a features matrix W with 10000 rows and 10 columns and a coefficients matrix H with 10 rows and 500 columns. The product of W and H is a matrix with 10000 rows and 500 columns, the same shape as the input matrix V and, if the factorization worked, it is a reasonable approximation to the input matrix V. From the treatment of matrix multiplication above it follows that each column in the product matrix WH is a linear combination of the 10 column vectors in the features matrix W with coefficients supplied by the coefficients matrix H. This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features. It is useful to think of each feature (column vector) in the features matrix W as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: The higher a word's cell value the higher the word's rank in the feature. A column in the coefficients matrix H represents an original document with a cell value defining the document's rank for a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in W) where each feature is weighted by the feature's cell value from the document's column in H. == Clustering property == NMF has an inherent clustering property, i.e., it automatically clusters the columns of input data V = ( v 1 , … , v n ) {\displaystyle \mathbf {V} =(v_{1},\dots ,v_{n})} . More specifically, the approximation of V {\displaystyle \mathbf {V} } by V ≃ W H {\displaystyle \mathbf {V} \simeq \mathbf {W} \mathbf {H} } is achieved by finding W {\displaystyle W} and H {\displaystyle H} that minimize the error function (using the Frobenius norm) ‖ V − W H ‖ F , {\displaystyle \left\|V-WH\right\|_{F},} subject to W ≥ 0 , H ≥ 0. {\displaystyle W\geq 0,H\geq 0.} , If we furthermore impose an orthogonality constraint on H {\displaystyle \mathbf {H} } , i.e. H H T = I {\displaystyle \mathbf {H} \mathbf {H} ^{T}=I} , then the above minimization is mathematically equivalent to the minimization of K-means clustering. Furthermore, the computed H {\displaystyle H} gives the cluster membership, i.e., if H k j > H i j {\displaystyle \mathbf {H} _{kj}>\mathbf {H} _{ij}} for all i ≠ k, this suggests that the input data v j {\displaystyle v_{j}} belongs to k {\displaystyle k} -th cluster. The computed W {\displaystyle W} gives the cluster centroids, i.e., the k {\displaystyle k} -th column gives the cluster centroid of k {\displaystyle k} -th cluster. This centroid's representation can be significantly enhanced by convex NMF. When the orthogonality constraint H H T = I {\displaystyle \mathbf {H} \mathbf {H} ^{T}=I} is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too. When the error function to be used is Kullback–Leibler divergence, NMF is identical to the probabilistic latent semantic analysis (PLSA), a popular document clustering method. == Types == === Approximate non-negative matrix factorization === Usually the number of columns of W and the number of rows of H in NMF are selected so the product WH will become an approximation to V. The full decomposition of V then amounts to the two non-negative matrices W and H as well as a residual U, such that: V = WH + U. The elements of the residual matrix can either be negative or positive. When W and H are smaller than V they become easier to store and manipulate. Another reason for factorizing V into smaller matrices W and H, is that if one's goal is to approximately represent the elements of V by significantly less data, then one has to infer some latent structure in the data. === Convex non-negative matrix factorization === In standard NMF, matrix factor W ∈ R+m × k, i.e., W can be anything in that space. Convex NMF restricts the columns of W to convex combinations of the input data vectors ( v 1 , … , v n ) {\displaystyle (v_{1},\dots ,v_{n})} . This greatly improves the quality of data representation of W. Furthermore, the resulting matrix factor H becomes more sparse and orthogonal. === Nonnegative rank factorization === In case the nonnegative rank of V is equal to its actual rank, V = WH is called a nonnegative rank factorization (NRF). The problem of finding the NRF of V, if it exists, is known to be NP-hard. === Different cost functions and regularizations === There are different types of non-negative matrix factorizations. The different types arise from using different cost functions for measuring the divergence between V and WH and possibly by regularization of the W and/or H matrices. Two simple divergence functions studied by Lee and Seung are the squared error (or Frobenius norm) and an extension of the Kullback–Leibler divergence to positive matrices (the original Kullback–Leibler divergence is defined on probability distributions). Each divergence leads to a different NMF algorithm, usually minimizing the divergence using iterative update rules. The factorization problem in the squared error version of NMF may be stated as: Given a matrix V {\displaystyle \mathbf {V} } find nonnegative matrices W and H that minimize the function F ( W , H ) = ‖ V − W H ‖ F 2 {\displaystyle F(\mathbf {W} ,\mathbf {H} )=\left\|\mathbf {V} -\mathbf {WH} \right\|_{F}^{2}} Another type of NMF for images is based on the total variation norm. When L1 regularization (akin to Lasso) is added to NMF with the mean squared error cost function, the resulting problem may be called non-negative sparse coding due to the similarity to the sparse coding problem, although it may also still be referred to as NMF. === Online NMF === Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in streaming fashion. One such use is for collaborative filtering in recommendation systems, where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for o

    Read more →