HYPO CBR

HYPO CBR

HYPO is a computer program, an expert system, that models reasoning with cases and hypotheticals in the legal domain. It is the first of its kind and the most sophisticated of the case-based legal reasoners, which was designed by Kevin Ashley for his Ph.D dissertation in 1987 at the University of Massachusetts Amherst under the supervision of Edwina Rissland. HYPO's design represents a hybrid generalization/comparative evaluation method appropriate for a domain with a weak analytical theory and applies to tasks that rarely involve just one right answer. The domain covers US trade secret law, and is substantially a common law domain. Since Anglo-American common law operates under the doctrine of precedent, the definitive way of interpreting problems is of necessity and case-based. Thus, HYPO did not involve the analysis of a statute, as required by the Prolog program. Rissland and Ashley (1987) envisioned HYPO as employing the key tasks performed by lawyers when analyzing case law for precedence to generate arguments for the prosecution or the defence. HYPO was a successful example of a general category of legal expert systems (LESs), it applies artificial intelligence (A.I.) techniques to the domain of legal reasoning in patent law, implementing a case-based reasoning (CBR) system, in contrast to rule based systems like MYCIN, or mixed-paradigm systems integrating CBR with rule-based or model-based reasoning like IKBALS II. A legal case-based reasoning essentially reasons from prior tried cases, comparing the contextual information in the current input case with that of cases previously tried and entered into the system. As noted by Ashley and Rissland (1988) CBR is used to "... capture expertise in domains where rules are ill-defined, incomplete or inconsistent". The HYPO project set out to model the creation of hypotheticals in law, where no case matches well enough. HYPO uses hypotheticals for a variety of tasks necessary for good interpretation: "to redefine old situations in terms of new dimensions, to create new standard cases when an appropriate one doesn’t exist, to explore and test the limits of a concept, to refocus a case by excluding some issues and to organize or cluster cases". Hypotheticals can include facts that support two conflicting lines of reasoning. So, it makes and responds to arguments from competing viewpoints about who should win the dispute. HYPO use heuristics such as making a case weaker or stronger, making a case extreme, enabling a near-miss, disabling a near-hit to generate hypotheticals in the context of an argument by using the dimensions mechanism. Dimensions have a range of values, along which the supportive strength that may shift from one side to the other. What differentiated this expert system from others was its facility not only to return a primary to best-case response but to return near-best-fit responses also. == Components == Legal knowledge in HYPO is contained in: the case-knowledge-base (CKB) and the library of dimensions. The CKB contains HYPO's base of known cases that are highly structured objects and sub-objects both real and hypothetical in the area of trade secret law. Each case is represented as a hierarchical set of frames whose slots are important facets of the case (e.g. Plaintiff, defendant, secret knowledge, employer/employee data).Ashley’s HYPO system used a database of thirty cases in the area indexed by thirteen dimensions. A key mechanism in HYPO is a dimension i.e. a mechanism to allow retrieval from the CKB, in order to represent legal cases. Ashley's dimensions are composed of (i) prerequisites, which are a set of factual predicates that must be satisfied for the dimension to apply (ii) focal slots, which accommodate one or two of the dimension's prerequisites designated as being indicative of the case's strength along that dimension and (iii) range information, which tells how a change in focal slot value effects the strength of a party's case along a given dimension. Dimensions focus attention on important aspects of cases. In HYPO's domain of misappropriation of trade secrets the dimension called “secrets voluntary disclosed” captures the idea that the more disclosures the plaintiff has made of his/her putative secret, the less convincing is his/her argument that the defendant is responsible for letting the secret. HYPO, like any other CBR system has also the following components: Similarity/relevancy metrics: that is, standards by which to evaluate the closeness of cases, judge their relevancy to the instant case, and select “most on point” cases. Half-Order Theory of the Application Domain: that is, hierarchies and taxonomies of knowledge, especially regarding the application domain. Precedent-based argumentation abilities: that is, capabilities to generate and evaluate precedent-based arguments. Knowledge to generate hypotheticals: that is, the ability to generate hypothetical cases to deal with various circumstances, like testing the validity of an interpretation or argument by providing gedanken experiments such as test cases or to fill in a weak CKB. == Functions == HYPO's method of creating an argument and justifying a solution or position has several steps. HYPO begins its processing with the current fact situation (cfs) which is direct input by the user into HYPO's representation framework. Once the user inputs the case, HYPO begins its legal analysis. The cfc is analyzed for relevant factors. Based on these factors HYPO selects the relevant cases and produces a case-analysis-record that records which dimensions apply to the cfc and which nearly apply (i.e. are "near misses"). The combined list of applicable and near miss dimensions is called the D-list. At this point the fact gathered module may request additional information from the user in order to draw a legal conclusion. Once all the facts are in the case-positioner module it uses the case-analysis record to create the claim lattice. This is a technique that organizes the relevant retrieved cases from the point of view of the cfc and makes it easy for HYPO to ascertain the most-on point cases (mopc) and to least on-point-cases. HYPO's arguments are 3ply, leading to the construction of the skeleton of an argument: it makes a point for one side, drawing the analogy between the problem and the precedent, responds with an argument for the opponent side, endeavoring to differentiate the cited case and citing other cases as counterarguments. Then it makes a final rebuttal, attempting to differentiate the counterarguments. The claim lattice also enables the HYPO-generator module to produce legally hypotheticals. With its use of dimension-based heuristics, the HYPO-generator does a heuristic search of the space of all possible cases. Lastly, the Explanation module expands upon the argument skeleton and provides explanation and justification for the different lines of analysis and cases found by HYPO. == An intelligent legal tutoring system == Legal expert systems are specifically designed to teach an area of law and are useful for pedagogical purposes. Ashley's work was mainly concerned to build tools to help students understand legal reasoning. Explanation and argument are the bases of the case method used in many professional schools in the U.S., first introduced by the Dean of the Harvard Law School, Christopher Columbus Langdell in 1870. The case method focuses on close readings of cases and principles; it involves students in pointed Socratic dialogue and makes strong use of hypotheticals (hypos). Thus, CATO (Aleven 1997) was a research project to device and test an intelligent, case-based tutorial program for teaching law students how to argue with cases implementing the HYPO program. Within the tutor system, Ashley and Aleven (1991) proposed to leverage an understanding of legal reasoning against the standard case-based tutoring methodology. What makes this tutoring system stand out is the additional levels of abstraction involved in its results. The system presents exercises, including the facts of a problem and a set of on-line cases and instructions to make, or respond to, a legal argument about the problem. The student/user will have a set of tools to analyze the problem and fashion an answer comparing it to other cases. Instead of simply generating precedent cases, the system works to interpret student responses, comparing them against a list of possibilities and responding to student entries, for example, by citing counterexamples, and providing feedback on a student's problem solving activities with explanations of correctness or giving further hints as to what may be wrong with evaluating a student's ability to perform legal reasoning and argument, examples and follow-up assignments by employing HYPO's model of case-based structure. == HYPO’s progeny == The quality of HYPO's results speak for themselves, in that a number of sequent legal reasoning systems are either directly based upon H

Model compression

Model compression is a machine learning technique for reducing the size of trained models. Large models can achieve high accuracy, but often at the cost of significant resource requirements. Compression techniques aim to compress models without significant performance reduction. Smaller models require less storage space, and consume less memory and compute during inference. Compressed models enable deployment on resource-constrained devices such as smartphones, embedded systems, edge computing devices, and consumer electronics computers. Efficient inference is also valuable for large corporations that serve large model inference over an API, allowing them to reduce computational costs and improve response times for users. Model compression is not to be confused with knowledge distillation, in which a smaller "student" model is trained to imitate the input-output behavior of a larger "teacher" model (as opposed to using the "teacher"'s trained parameters or the "teacher"'s training targets). == Techniques == Several techniques are employed for model compression. === Pruning === Pruning sparsifies a large model by setting some parameters to exactly zero. This effectively reduces the number of parameters. This allows the use of sparse matrix operations, which are faster than dense matrix operations. Pruning criteria can be based on magnitudes of parameters, the statistical pattern of neural activations, Hessian values, etc. === Quantization === Quantization reduces the numerical precision of weights and activations. For example, instead of storing weights as 32-bit floating-point numbers, they can be represented using 8-bit integers. Low-precision parameters take up less space, and takes less compute to perform arithmetic with. It is also possible to quantize some parameters more aggressively than others, so for example, a less important parameter can have 8-bit precision while another, more important parameter, can have 16-bit precision. Inference with such models requires mixed-precision arithmetic. Quantized models can also be used during training (rather than after training). PyTorch implements automatic mixed-precision (AMP), which performs autocasting, gradient scaling, and loss scaling. === Low-rank factorization === Weight matrices can be approximated by low-rank matrices. Let W {\displaystyle W} be a weight matrix of shape m × n {\displaystyle m\times n} . A low-rank approximation is W ≈ U V T {\displaystyle W\approx UV^{T}} , where U {\displaystyle U} and V {\displaystyle V} are matrices of shapes m × k , n × k {\displaystyle m\times k,n\times k} . When k {\displaystyle k} is small, this both reduces the number of parameters needed to represent W {\displaystyle W} approximately, and accelerates matrix multiplication by W {\displaystyle W} . Low-rank approximations can be found by singular value decomposition (SVD). The choice of rank for each weight matrix is a hyperparameter, and jointly optimized as a mixed discrete-continuous optimization problem. The rank of weight matrices may also be pruned after training, taking into account the effect of activation functions like ReLU on the implicit rank of the weight matrices. == Training == Model compression may be decoupled from training, that is, a model is first trained without regard for how it might be compressed, then it is compressed. However, it may also be combined with training. The "train big, then compress" method trains a large model for a small number of training steps (less than it would be if it were trained to convergence), then heavily compress the model. It is found that at the same compute budget, this method results in a better model than lightly compressed, small models. In Deep Compression, the compression has three steps. First loop (pruning): prune all weights lower than a threshold, then finetune the network, then prune again, etc. Second loop (quantization): cluster weights, then enforce weight sharing among all weights in each cluster, then finetune the network, then cluster again, etc. Third step: Use Huffman coding to losslessly compress the model. The SqueezeNet paper reported that Deep Compression achieved a compression ratio of 35 on AlexNet, and a ratio of ~10 on SqueezeNets.

Sparse PCA

Sparse principal component analysis (SPCA or sparse PCA) is a technique used in statistical analysis and, in particular, in the analysis of multivariate data sets. It extends the classic method of principal component analysis (PCA) for the reduction of dimensionality of data by introducing sparsity structures to the input variables. A particular disadvantage of ordinary PCA is that the principal components are usually linear combinations of all input variables. SPCA overcomes this disadvantage by finding components that are linear combinations of just a few input variables (SPCs). This means that some of the coefficients of the linear combinations defining the SPCs, called loadings, are equal to zero. The number of nonzero loadings is called the cardinality of the SPC. == Mathematical formulation == Consider a data matrix, X {\displaystyle X} , where each of the p {\displaystyle p} columns represent an input variable, and each of the n {\displaystyle n} rows represents an independent sample from data population. One assumes each column of X {\displaystyle X} has mean zero, otherwise one can subtract column-wise mean from each element of X {\displaystyle X} . Let Σ = 1 n − 1 X ⊤ X {\displaystyle \Sigma ={\frac {1}{n-1}}X^{\top }X} be the empirical covariance matrix of X {\displaystyle X} , which has dimension p × p {\displaystyle p\times p} . Given an integer k {\displaystyle k} with 1 ≤ k ≤ p {\displaystyle 1\leq k\leq p} , the sparse PCA problem can be formulated as maximizing the variance along a direction represented by vector v ∈ R p {\displaystyle v\in \mathbb {R} ^{p}} while constraining its cardinality: max v T Σ v subject to ‖ v ‖ 2 = 1 ‖ v ‖ 0 ≤ k . {\displaystyle {\begin{aligned}\max \quad &v^{T}\Sigma v\\{\text{subject to}}\quad &\left\Vert v\right\Vert _{2}=1\\&\left\Vert v\right\Vert _{0}\leq k.\end{aligned}}} Eq. 1 The first constraint specifies that v is a unit vector. In the second constraint, ‖ v ‖ 0 {\displaystyle \left\Vert v\right\Vert _{0}} represents the ℓ 0 {\displaystyle \ell _{0}} pseudo-norm of v, which is defined as the number of its non-zero components. So the second constraint specifies that the number of non-zero components in v is less than or equal to k, which is typically an integer that is much smaller than dimension p. The optimal value of Eq. 1 is known as the k-sparse largest eigenvalue. If one takes k=p, the problem reduces to the ordinary PCA, and the optimal value becomes the largest eigenvalue of covariance matrix Σ. After finding the optimal solution v, one deflates Σ to obtain a new matrix Σ 1 = Σ − ( v T Σ v ) v v T , {\displaystyle \Sigma _{1}=\Sigma -(v^{T}\Sigma v)vv^{T},} and iterate this process to obtain further principal components. However, unlike PCA, sparse PCA cannot guarantee that different principal components are orthogonal. In order to achieve orthogonality, additional constraints must be enforced. The following equivalent definition is in matrix form. Let V {\displaystyle V} be a p×p symmetric matrix, one can rewrite the sparse PCA problem as max T r ( Σ V ) subject to T r ( V ) = 1 ‖ V ‖ 0 ≤ k 2 R a n k ( V ) = 1 , V ⪰ 0. {\displaystyle {\begin{aligned}\max \quad &Tr(\Sigma V)\\{\text{subject to}}\quad &Tr(V)=1\\&\Vert V\Vert _{0}\leq k^{2}\\&Rank(V)=1,V\succeq 0.\end{aligned}}} Eq. 2 Tr is the matrix trace, and ‖ V ‖ 0 {\displaystyle \Vert V\Vert _{0}} represents the non-zero elements in matrix V. The last line specifies that V has matrix rank one and is positive semidefinite. The last line means that one has V = v v T {\displaystyle V=vv^{T}} , so Eq. 2 is equivalent to Eq. 1. Moreover, the rank constraint in this formulation is actually redundant, and therefore sparse PCA can be cast as the following mixed-integer semidefinite program max T r ( Σ V ) subject to T r ( V ) = 1 | V i , i | ≤ z i , ∀ i ∈ { 1 , . . . , p } , | V i , j | ≤ 1 2 z i , ∀ i , j ∈ { 1 , . . . , p } : i ≠ j , V ⪰ 0 , z ∈ { 0 , 1 } p , ∑ i z i ≤ k {\displaystyle {\begin{aligned}\max \quad &Tr(\Sigma V)\\{\text{subject to}}\quad &Tr(V)=1\\&\vert V_{i,i}\vert \leq z_{i},\forall i\in \{1,...,p\},\vert V_{i,j}\vert \leq {\frac {1}{2}}z_{i},\forall i,j\in \{1,...,p\}:i\neq j,\\&V\succeq 0,z\in \{0,1\}^{p},\sum _{i}z_{i}\leq k\end{aligned}}} Eq. 3 Because of the cardinality constraint, the maximization problem is hard to solve exactly, especially when dimension p is high. In fact, the sparse PCA problem in Eq. 1 is NP-hard in the strong sense. == Computational considerations == As most sparse problems, variable selection in SPCA is a computationally intractable non-convex NP-hard problem, therefore greedy sub-optimal algorithms are often employed to find solutions. Note also that SPCA introduces hyperparameters quantifying in what capacity large parameter values are penalized. These might need tuning to achieve satisfactory performance, thereby adding to the total computational cost. == Algorithms for SPCA == Several alternative approaches (of Eq. 1) have been proposed, including a regression framework, a penalized matrix decomposition framework, a convex relaxation/semidefinite programming framework, a generalized power method framework an alternating maximization framework forward-backward greedy search and exact methods using branch-and-bound techniques, a certifiably optimal branch-and-bound approach Bayesian formulation framework. A certifiably optimal mixed-integer semidefinite branch-and-cut approach The methodological and theoretical developments of Sparse PCA as well as its applications in scientific studies are recently reviewed in a survey paper. === Notes on Semidefinite Programming Relaxation === It has been proposed that sparse PCA can be approximated by semidefinite programming (SDP). If one drops the rank constraint and relaxes the cardinality constraint by a 1-norm convex constraint, one gets a semidefinite programming relaxation, which can be solved efficiently in polynomial time: max T r ( Σ V ) subject to T r ( V ) = 1 1 T | V | 1 ≤ k V ⪰ 0. {\displaystyle {\begin{aligned}\max \quad &Tr(\Sigma V)\\{\text{subject to}}\quad &Tr(V)=1\\&\mathbf {1} ^{T}|V|\mathbf {1} \leq k\\&V\succeq 0.\end{aligned}}} Eq. 3 In the second constraint, 1 {\displaystyle \mathbf {1} } is a p×1 vector of ones, and |V| is the matrix whose elements are the absolute values of the elements of V. The optimal solution V {\displaystyle V} to the relaxed problem Eq. 3 is not guaranteed to have rank one. In that case, V {\displaystyle V} can be truncated to retain only the dominant eigenvector. While the semidefinite program does not scale beyond n=300 covariates, it has been shown that a second-order cone relaxation of the semidefinite relaxation is almost as tight and successfully solves problems with n=1000s of covariates == Applications == === Financial Data Analysis === Suppose ordinary PCA is applied to a dataset where each input variable represents a different asset, it may generate principal components that are weighted combination of all the assets. In contrast, sparse PCA would produce principal components that are weighted combination of only a few input assets, so one can easily interpret its meaning. Furthermore, if one uses a trading strategy based on these principal components, fewer assets imply less transaction costs. === Biology === Consider a dataset where each input variable corresponds to a specific gene. Sparse PCA can produce a principal component that involves only a few genes, so researchers can focus on these specific genes for further analysis. === High-dimensional Hypothesis Testing === Contemporary datasets often have the number of input variables ( p {\displaystyle p} ) comparable with or even much larger than the number of samples ( n {\displaystyle n} ). It has been shown that if p / n {\displaystyle p/n} does not converge to zero, the classical PCA is not consistent. In other words, if we let k = p {\displaystyle k=p} in Eq. 1, then the optimal value does not converge to the largest eigenvalue of data population when the sample size n → ∞ {\displaystyle n\rightarrow \infty } , and the optimal solution does not converge to the direction of maximum variance. But sparse PCA can retain consistency even if p ≫ n . {\displaystyle p\gg n.} The k-sparse largest eigenvalue (the optimal value of Eq. 1) can be used to discriminate an isometric model, where every direction has the same variance, from a spiked covariance model in high-dimensional setting. Consider a hypothesis test where the null hypothesis specifies that data X {\displaystyle X} are generated from a multivariate normal distribution with mean 0 and covariance equal to an identity matrix, and the alternative hypothesis specifies that data X {\displaystyle X} is generated from a spiked model with signal strength θ {\displaystyle \theta } : H 0 : X ∼ N ( 0 , I p ) , H 1 : X ∼ N ( 0 , I p + θ v v T ) , {\displaystyle H_{0}:X\sim N(0,I_{p}),\quad H_{1}:X\sim N(0,I_{p}+\theta vv^{T}),} where v ∈ R p {\displaystyle v\in \mathbb {R} ^{p}

Causal Markov condition

The Causal Markov (CM) condition states that, conditional on the set of all its direct causes, a node is independent of all variables which are not effects or direct causes of that node. In the event that the structure of a Bayesian network accurately depicts causality, the two conditions are equivalent. This is related to the Markov condition, an assumption made in Bayesian probability theory, that every node in a Bayesian network is conditionally independent of its nondescendants, given its parents. Stated loosely, it is assumed that a node has no bearing on nodes which do not descend from it. In a DAG, this local Markov condition is equivalent to the global Markov condition, which states that d-separations in the graph also correspond to conditional independence relations. This also means that a node is conditionally independent of the entire network, given its Markov blanket. A network may accurately embody the Markov condition without depicting causality, in which case it should not be assumed to embody the causal Markov condition. == Motivation == Statisticians are enormously interested in the ways in which certain events and variables are connected. The precise notion of what constitutes a cause and effect is necessary to understand the connections between them. The central idea behind the philosophical study of probabilistic causation is that causes raise the probabilities of their effects, all else being equal. A deterministic interpretation of causation means that if A causes B, then A must always be followed by B. In this sense, smoking does not cause cancer because some smokers never develop cancer. On the other hand, a probabilistic interpretation simply means that causes raise the probability of their effects. In this sense, changes in meteorological readings associated with a storm do cause that storm, since they raise its probability. (However, simply looking at a barometer does not change the probability of the storm, for a more detailed analysis, see:). == Examples == In a simple view, releasing one's hand from a hammer causes the hammer to fall. However, doing so in outer space does not produce the same outcome, calling into question if releasing one's fingers from a hammer always causes it to fall. A causal graph could be created to acknowledge that both the presence of gravity and the release of the hammer contribute to its falling. However, it would be very surprising if the surface underneath the hammer affected its falling. This essentially states the Causal Markov Condition, that given the existence of gravity the release of the hammer, it will fall regardless of what is beneath it. == Implications == === Dependence and Causation === It follows from the definition that if X and Y are in V and are probabilistically dependent, then either X causes Y, Y causes X, or X and Y are both effects of some common cause Z in V. This definition was seminally introduced by Hans Reichenbach as the Common Cause Principle (CCP). === Screening === It once again follows from the definition that the parents of X screen X from other "indirect causes" of X (parents of Parents(X)) and other effects of Parents(X) which are not also effects of X.

KNIME

KNIME ( ), the Konstanz Information Miner, is a data analytics, reporting and integrating platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of Java Database Connectivity (JDBC) allows assembly of nodes blending different data sources, including preprocessing (extract, transform, load, or ETL), for modeling, data analysis and visualization with minimal, or no, programming. It is free and open-source software released under a GNU General Public License. Since 2006, KNIME has been used in pharmaceutical research, and in other areas including customer relationship management (CRM) and data analysis, business intelligence, text mining and financial data analysis. Recently, attempts were made to use KNIME as robotic process automation (RPA) tool. KNIME's headquarters are based in Zurich, with other offices in Konstanz, Berlin, and Austin (USA). == History == Development of KNIME began in January 2004, with a team of software engineers at the University of Konstanz, as an open-source platform. The original team, headed by Michael Berthold, came from a Silicon Valley pharmaceutical industry software company. The initial goal was to create a modular, highly scalable and open data processing platform that allows easy integration of different data loading, processing, transforming, analyzing, and visual exploring modules, without focus on any one application area. The platform was intended for collaborating, research, and for integrating various other data analysis projects. In 2006, the first version of KNIME was released. Several pharmaceutical companies began using KNIME, and several life science software vendors began integrating their tools into the platform. Later that year, after an article in the German magazine c't, users from a number of other areas joined ship. As of 2012, KNIME is in use by over 15,000 actual users (i.e. not counting downloads, but users regularly retrieving updates) in the life sciences and at banks, publishers, car manufacturer, telcos, consulting firms, and various other industries, and a large number of research groups, worldwide. Latest updates to KNIME Server and KNIME Big Data Extensions, provide support for Apache Spark 2.3, Parquet and HDFS-type storage. For the sixth year in a row, KNIME has been placed as a leader for data science and machine learning platforms in Gartner's Magic Quadrant. == Design philosophy, features == These are the design principles and features that KNIME software follows: Visual, Interactive Framework: KNIME Software prioritizes a user-friendly and intuitive approach to data analysis. This is achieved through a visual and interactive framework where data flows can be combined using a drag-and-drop interface. Users can develop customized and interactive applications by creating simple to advanced and highly-automated data pipelines. These may include, for example, access to databases, machine learning libraries, logic for workflow control (e.g., loops, switches, etc.), abstraction (e.g., interactive widgets), invocation, dynamic data apps, integrated deployment, or error handling. Modularity: processing units and data containers should remain independent of each other. This design choice enables easy distribution of computation and allows for the independent development of different algorithms. Data types within KNIME are encapsulated, meaning no types are predefined. This design choice facilitates adding new data types, and integrating them with extant types, while including type-specific renderers and comparators. This principle also enables inspecting results at the end of each single data operation. Extensibility: KNIME Software is designed to be extensible. Adding new processing nodes or views is made simple through a plug-in mechanism. This mechanism ensures that users can distribute their custom functionalities without the need for complicated install or uninstall procedures. Interleaving No-Code with Code: the platform supports integrating both visual programming (no-code) and script-based programming (e.g., Python, R, JavaScript) approaches to data analysis. This design principle is termed low-code. Automation and Scalability: for example, the use of parameterization via flow variables, or the encapsulation of workflow segments in components contribute to reduce manual work and errors in analyses. Further, the scheduling of workflow execution (available in KNIME Business Hub and KNIME Community Hub for Teams) reduces dependency on human resources. In terms of scalability, a few examples include the ability to handle large datasets (millions of rows), execute multiple processes simultaneously out of the box and reuse workflow segments. Full Usability: due to the open source nature, KNIME Analytics Platform provides free full usability with no limited trial periods. == Internals == KNIME allows users to visually create data flows (or pipelines), selectively execute some or all analysis steps, and later inspect the results, models, using interactive widgets and views. KNIME is written in Java and based on Eclipse. It makes use of an extension mechanism to add plug-ins providing added functions. The core version includes hundreds of modules for data integration (file input/output (I/O), database nodes supporting all common database management systems through JDBC or native connectors: SQLite, MS-Access, SQL Server, MySQL, Oracle, PostgreSQL, Vertica and H2), data transformation (filter, converter, splitter, combiner, joiner), and the commonly used methods of statistics, data mining, analysis and text analytics. Visualization is supported with the Report Designer extension. KNIME workflows can be used as data sets to create report templates that can be exported to document formats such as doc, ppt, xls, pdf and others. Other KNIME abilities are: KNIMEs core-architecture allows processing of large data volumes that are only limited by the available hard disk space (not limited to the available RAM). E.g., KNIME allows analyzing 300 million customer addresses, 20 million cell images, and 10 million molecular structures. Added plug-ins allow integrating methods for text mining, image mining, time series analysis, and networking. KNIME integrates various other open-source projects, e.g., machine learning algorithms from Weka, H2O, Keras, Spark, the R project and LIBSVM; plotly, JFreeChart, ImageJ, and the Chemistry Development Kit. KNIME is implemented in Java, allows for wrappers calling other code, in addition to providing nodes that allow it to run Java, Python, R, Ruby and other code fragments. Since 2021, KNIME's Python Integration utilizes Anaconda for Python distribution and environment management. == License == In 2024, KNIME version 5.3 is released under the same GPLv3 license as previous versions. As of version 2.1, KNIME is released under the GPLv3 license, with an exception that allow commercial software vendors to use the well-defined node application programming interface (API) to add proprietary extensions, or wrappers calling their tools from KNIME. == Courses == KNIME allows the performance of data analysis without programming skills. Several free, online courses are provided.

VideoPoet

VideoPoet is a large language model developed by Google Research in 2023 for video making. It can be asked to animate still images. The model accepts text, images, and videos as inputs, with a program to add feature for any input to any format generated content. VideoPoet was publicly announced on December 19, 2023. It uses an autoregressive language model.

Gaussian adaptation

Gaussian adaptation (GA), also called normal or natural adaptation (NA) is an evolutionary algorithm designed for the maximization of manufacturing yield due to statistical deviation of component values of signal processing systems. In short, GA is a stochastic adaptive process where a number of samples of an n-dimensional vector x[xT = (x1, x2, ..., xn)] are taken from a multivariate Gaussian distribution, N(m, M), having mean m and moment matrix M. The samples are tested for fail or pass. The first- and second-order moments of the Gaussian restricted to the pass samples are m and M. The outcome of x as a pass sample is determined by a function s(x), 0 < s(x) < q ≤ 1, such that s(x) is the probability that x will be selected as a pass sample. The average probability of finding pass samples (yield) is P ( m ) = ∫ s ( x ) N ( x − m ) d x {\displaystyle P(m)=\int s(x)N(x-m)\,dx} Then the theorem of GA states: For any s(x) and for any value of P < q, there always exist a Gaussian p. d. f. [ probability density function ] that is adapted for maximum dispersion. The necessary conditions for a local optimum are m = m and M proportional to M. The dual problem is also solved: P is maximized while keeping the dispersion constant (Kjellström, 1991). Proofs of the theorem may be found in the papers by Kjellström, 1970, and Kjellström & Taxén, 1981. Since dispersion is defined as the exponential of entropy/disorder/average information it immediately follows that the theorem is valid also for those concepts. Altogether, this means that Gaussian adaptation may carry out a simultaneous maximisation of yield and average information (without any need for the yield or the average information to be defined as criterion functions). The theorem is valid for all regions of acceptability and all Gaussian distributions. It may be used by cyclic repetition of random variation and selection (like the natural evolution). In every cycle a sufficiently large number of Gaussian distributed points are sampled and tested for membership in the region of acceptability. The centre of gravity of the Gaussian, m, is then moved to the centre of gravity of the approved (selected) points, m. Thus, the process converges to a state of equilibrium fulfilling the theorem. A solution is always approximate because the centre of gravity is always determined for a limited number of points. It was used for the first time in 1969 as a pure optimization algorithm making the regions of acceptability smaller and smaller (in analogy to simulated annealing, Kirkpatrick 1983). Since 1970 it has been used for both ordinary optimization and yield maximization. == Natural evolution and Gaussian adaptation == It has also been compared to the natural evolution of populations of living organisms. In this case s(x) is the probability that the individual having an array x of phenotypes will survive by giving offspring to the next generation; a definition of individual fitness given by Hartl 1981. The yield, P, is replaced by the mean fitness determined as a mean over the set of individuals in a large population. Phenotypes are often Gaussian distributed in a large population and a necessary condition for the natural evolution to be able to fulfill the theorem of Gaussian adaptation, with respect to all Gaussian quantitative characters, is that it may push the centre of gravity of the Gaussian to the centre of gravity of the selected individuals. This may be accomplished by the Hardy–Weinberg law. This is possible because the theorem of Gaussian adaptation is valid for any region of acceptability independent of the structure (Kjellström, 1996). In this case the rules of genetic variation such as crossover, inversion, transposition etcetera may be seen as random number generators for the phenotypes. So, in this sense Gaussian adaptation may be seen as a genetic algorithm. == How to climb a mountain == Mean fitness may be calculated provided that the distribution of parameters and the structure of the landscape is known. The real landscape is not known, but figure below shows a fictitious profile (blue) of a landscape along a line (x) in a room spanned by such parameters. The red curve is the mean based on the red bell curve at the bottom of figure. It is obtained by letting the bell curve slide along the x-axis, calculating the mean at every location. As can be seen, small peaks and pits are smoothed out. Thus, if evolution is started at A with a relatively small variance (the red bell curve), then climbing will take place on the red curve. The process may get stuck for millions of years at B or C, as long as the hollows to the right of these points remain, and the mutation rate is too small. If the mutation rate is sufficiently high, the disorder or variance may increase and the parameter(s) may become distributed like the green bell curve. Then the climbing will take place on the green curve, which is even more smoothed out. Because the hollows to the right of B and C have now disappeared, the process may continue up to the peaks at D. But of course the landscape puts a limit on the disorder or variability. Besides — dependent on the landscape — the process may become very jerky, and if the ratio between the time spent by the process at a local peak and the time of transition to the next peak is very high, it may as well look like a punctuated equilibrium as suggested by Gould (see Ridley). == Computer simulation of Gaussian adaptation == Thus far the theory only considers mean values of continuous distributions corresponding to an infinite number of individuals. In reality however, the number of individuals is always limited, which gives rise to an uncertainty in the estimation of m and M (the moment matrix of the Gaussian). And this may also affect the efficiency of the process. Unfortunately very little is known about this, at least theoretically. The implementation of normal adaptation on a computer is a fairly simple task. The adaptation of m may be done by one sample (individual) at a time, for example m(i + 1) = (1 – a) m(i) + ax where x is a pass sample, and a < 1 a suitable constant so that the inverse of a represents the number of individuals in the population. M may in principle be updated after every step y leading to a feasible point x = m + y according to: M(i + 1) = (1 – 2b) M(i) + 2byyT, where yT is the transpose of y and b << 1 is another suitable constant. In order to guarantee a suitable increase of average information, y should be normally distributed with moment matrix μ2M, where the scalar μ > 1 is used to increase average information (information entropy, disorder, diversity) at a suitable rate. But M will never be used in the calculations. Instead we use the matrix W defined by WWT = M. Thus, we have y = Wg, where g is normally distributed with the moment matrix μU, and U is the unit matrix. W and WT may be updated by the formulas W = (1 – b)W + bygT and WT = (1 – b)WT + bgyT because multiplication gives M = (1 – 2b)M + 2byyT, where terms including b2 have been neglected. Thus, M will be indirectly adapted with good approximation. In practice it will suffice to update W only W(i + 1) = (1 – b)W(i) + bygT. This is the formula used in a simple 2-dimensional model of a brain satisfying the Hebbian rule of associative learning; see the next section (Kjellström, 1996 and 1999). The figure below illustrates the effect of increased average information in a Gaussian p.d.f. used to climb a mountain Crest (the two lines represent the contour line). Both the red and green cluster have equal mean fitness, about 65%, but the green cluster has a much higher average information making the green process much more efficient. The effect of this adaptation is not very salient in a 2-dimensional case, but in a high-dimensional case, the efficiency of the search process may be increased by many orders of magnitude. == The evolution in the brain == In the brain the evolution of DNA-messages is supposed to be replaced by an evolution of signal patterns and the phenotypic landscape is replaced by a mental landscape, the complexity of which will hardly be second to the former. The metaphor with the mental landscape is based on the assumption that certain signal patterns give rise to a better well-being or performance. For instance, the control of a group of muscles leads to a better pronunciation of a word or performance of a piece of music. In this simple model it is assumed that the brain consists of interconnected components that may add, multiply and delay signal values. A nerve cell kernel may add signal values, a synapse may multiply with a constant and An axon may delay values. This is a basis of the theory of digital filters and neural networks consisting of components that may add, multiply and delay signalvalues and also of many brain models, Levine 1991. In the figure below the brain stem is supposed to deliver Gaussian distributed signal patterns. This may be possible since certai