AI Art Legality

AI Art Legality — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Lexalytics

    Lexalytics

    Lexalytics, Inc. provides sentiment and intent analysis to an array of companies using SaaS and cloud based technology. Salience 6, the engine behind Lexalytics, was built as an on-premises, multi-lingual text analysis engine. It is leased to other companies who use it to power filtering and reputation management programs. In July, 2015 Lexalytics acquired Semantria to be used as a cloud option for its technology. In September, 2021 Lexalytics was acquired by CX company InMoment. == History == Lexalytics spun into existence in January 2003 out of a content management startup called Lightspeed. Lightspeed consolidated on America's West Coast. Jeff Catlin, a Lightspeed General Manager, and Mike Marshall, a Lighstpeed Principal Engineer, convinced investors to give them the East Coast company so as to avoid shutdown costs. Catlin and Marshall renamed the operation Lexalytics. Catlin took on the role of chief executive officer with Marshall working as Chief Technology Officer. Lexalytics opted to not accept venture cash. Instead, the company initially shared sales and marketing expenses with U.K. based document management company Infonic. The partner companies soon formed a joint venture in July 2008, which was later dissolved. Since then, Lexalytics has worked with many other companies, like Bottlenose, Salesforce, Thomson Reuters, Oracle and DataSift. Relationships with social media monitoring companies like Datasift tend to find Lexalytics’ Salience engine baked into the product itself. Lexalytics is used similarly to monitor sentiment as it relates to stock trading. In December 2014, Lexalytics announced the latest iteration to its sentiment analysis engine, Salience 6. Earlier that year Lexalytics acquired Semantria in a bid to appeal to a wider variety of business models. Created by former Lexalytics Marketing Director Oleg Rogynskyy, Semantria is a SaaS text mining service offered as an API and Excel based plugin that measures sentiment. The goal of the acquisition, which cost Lexalytics less than US$10 million, was to expand the customer base both within the United States and abroad with multilingual support. The engine that powers Semantria, Salience, is grounded in its deep learning ability. An example of this is its concept matrix, which allows Salience an understanding of concepts and relationship between concepts based on a detailed reading of the entire repository of Wikipedia. This matrix allows Salience to use Wikipedia for automatic categorization. Along with features like the concept matrix, Salience supports 16 international languages. The engine has earned Lexalytics a spot on EContent's “Top 100 Companies in the Digital Content Industry” List for 2014–2015. In September 2018, Lexalytics launched document data extraction market using natural language processing (NLP).

    Read more →
  • Sufficient dimension reduction

    Sufficient dimension reduction

    In statistics, sufficient dimension reduction (SDR) is a paradigm for analyzing data that combines the ideas of dimension reduction with the concept of sufficiency. Dimension reduction has long been a primary goal of regression analysis. Given a response variable y and a p-dimensional predictor vector x {\displaystyle {\textbf {x}}} , regression analysis aims to study the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} , the conditional distribution of y {\displaystyle y} given x {\displaystyle {\textbf {x}}} . A dimension reduction is a function R ( x ) {\displaystyle R({\textbf {x}})} that maps x {\displaystyle {\textbf {x}}} to a subset of R k {\displaystyle \mathbb {R} ^{k}} , k < p, thereby reducing the dimension of x {\displaystyle {\textbf {x}}} . For example, R ( x ) {\displaystyle R({\textbf {x}})} may be one or more linear combinations of x {\displaystyle {\textbf {x}}} . A dimension reduction R ( x ) {\displaystyle R({\textbf {x}})} is said to be sufficient if the distribution of y ∣ R ( x ) {\displaystyle y\mid R({\textbf {x}})} is the same as that of y ∣ x {\displaystyle y\mid {\textbf {x}}} . In other words, no information about the regression is lost in reducing the dimension of x {\displaystyle {\textbf {x}}} if the reduction is sufficient. == Graphical motivation == In a regression setting, it is often useful to summarize the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} graphically. For instance, one may consider a scatterplot of y {\displaystyle y} versus one or more of the predictors or a linear combination of the predictors. A scatterplot that contains all available regression information is called a sufficient summary plot. When x {\displaystyle {\textbf {x}}} is high-dimensional, particularly when p ≥ 3 {\displaystyle p\geq 3} , it becomes increasingly challenging to construct and visually interpret sufficiency summary plots without reducing the data. Even three-dimensional scatter plots must be viewed via a computer program, and the third dimension can only be visualized by rotating the coordinate axes. However, if there exists a sufficient dimension reduction R ( x ) {\displaystyle R({\textbf {x}})} with small enough dimension, a sufficient summary plot of y {\displaystyle y} versus R ( x ) {\displaystyle R({\textbf {x}})} may be constructed and visually interpreted with relative ease. Hence sufficient dimension reduction allows for graphical intuition about the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} , which might not have otherwise been available for high-dimensional data. Most graphical methodology focuses primarily on dimension reduction involving linear combinations of x {\displaystyle {\textbf {x}}} . The rest of this article deals only with such reductions. == Dimension reduction subspace == Suppose R ( x ) = A T x {\displaystyle R({\textbf {x}})=A^{T}{\textbf {x}}} is a sufficient dimension reduction, where A {\displaystyle A} is a p × k {\displaystyle p\times k} matrix with rank k ≤ p {\displaystyle k\leq p} . Then the regression information for y ∣ x {\displaystyle y\mid {\textbf {x}}} can be inferred by studying the distribution of y ∣ A T x {\displaystyle y\mid A^{T}{\textbf {x}}} , and the plot of y {\displaystyle y} versus A T x {\displaystyle A^{T}{\textbf {x}}} is a sufficient summary plot. Without loss of generality, only the space spanned by the columns of A {\displaystyle A} need be considered. Let η {\displaystyle \eta } be a basis for the column space of A {\displaystyle A} , and let the space spanned by η {\displaystyle \eta } be denoted by S ( η ) {\displaystyle {\mathcal {S}}(\eta )} . It follows from the definition of a sufficient dimension reduction that F y ∣ x = F y ∣ η T x , {\displaystyle F_{y\mid x}=F_{y\mid \eta ^{T}x},} where F {\displaystyle F} denotes the appropriate distribution function. Another way to express this property is y ⊥ ⊥ x ∣ η T x , {\displaystyle y\perp \!\!\!\perp {\textbf {x}}\mid \eta ^{T}{\textbf {x}},} or y {\displaystyle y} is conditionally independent of x {\displaystyle {\textbf {x}}} , given η T x {\displaystyle \eta ^{T}{\textbf {x}}} . Then the subspace S ( η ) {\displaystyle {\mathcal {S}}(\eta )} is defined to be a dimension reduction subspace (DRS). === Structural dimensionality === For a regression y ∣ x {\displaystyle y\mid {\textbf {x}}} , the structural dimension, d {\displaystyle d} , is the smallest number of distinct linear combinations of x {\displaystyle {\textbf {x}}} necessary to preserve the conditional distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} . In other words, the smallest dimension reduction that is still sufficient maps x {\displaystyle {\textbf {x}}} to a subset of R d {\displaystyle \mathbb {R} ^{d}} . The corresponding DRS will be d-dimensional. === Minimum dimension reduction subspace === A subspace S {\displaystyle {\mathcal {S}}} is said to be a minimum DRS for y ∣ x {\displaystyle y\mid {\textbf {x}}} if it is a DRS and its dimension is less than or equal to that of all other DRSs for y ∣ x {\displaystyle y\mid {\textbf {x}}} . A minimum DRS S {\displaystyle {\mathcal {S}}} is not necessarily unique, but its dimension is equal to the structural dimension d {\displaystyle d} of y ∣ x {\displaystyle y\mid {\textbf {x}}} , by definition. If S {\displaystyle {\mathcal {S}}} has basis η {\displaystyle \eta } and is a minimum DRS, then a plot of y versus η T x {\displaystyle \eta ^{T}{\textbf {x}}} is a minimal sufficient summary plot, and it is (d + 1)-dimensional. == Central subspace == If a subspace S {\displaystyle {\mathcal {S}}} is a DRS for y ∣ x {\displaystyle y\mid {\textbf {x}}} , and if S ⊂ S drs {\displaystyle {\mathcal {S}}\subset {\mathcal {S}}_{\text{drs}}} for all other DRSs S drs {\displaystyle {\mathcal {S}}_{\text{drs}}} , then it is a central dimension reduction subspace, or simply a central subspace, and it is denoted by S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} . In other words, a central subspace for y ∣ x {\displaystyle y\mid {\textbf {x}}} exists if and only if the intersection ⋂ S drs {\textstyle \bigcap {\mathcal {S}}_{\text{drs}}} of all dimension reduction subspaces is also a dimension reduction subspace, and that intersection is the central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} . The central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} does not necessarily exist because the intersection ⋂ S drs {\textstyle \bigcap {\mathcal {S}}_{\text{drs}}} is not necessarily a DRS. However, if S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} does exist, then it is also the unique minimum dimension reduction subspace. === Existence of the central subspace === While the existence of the central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} is not guaranteed in every regression situation, there are some rather broad conditions under which its existence follows directly. For example, consider the following proposition from Cook (1998): Let S 1 {\displaystyle {\mathcal {S}}_{1}} and S 2 {\displaystyle {\mathcal {S}}_{2}} be dimension reduction subspaces for y ∣ x {\displaystyle y\mid {\textbf {x}}} . If x {\displaystyle {\textbf {x}}} has density f ( a ) > 0 {\displaystyle f(a)>0} for all a ∈ Ω x {\displaystyle a\in \Omega _{x}} and f ( a ) = 0 {\displaystyle f(a)=0} everywhere else, where Ω x {\displaystyle \Omega _{x}} is convex, then the intersection S 1 ∩ S 2 {\displaystyle {\mathcal {S}}_{1}\cap {\mathcal {S}}_{2}} is also a dimension reduction subspace. It follows from this proposition that the central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} exists for such x {\displaystyle {\textbf {x}}} . == Methods for dimension reduction == There are many existing methods for dimension reduction, both graphical and numeric. For example, sliced inverse regression (SIR) and sliced average variance estimation (SAVE) were introduced in the 1990s and continue to be widely used. Although SIR was originally designed to estimate an effective dimension reducing subspace, it is now understood that it estimates only the central subspace, which is generally different. More recent methods for dimension reduction include likelihood-based sufficient dimension reduction, estimating the central subspace based on the inverse third moment (or kth moment), estimating the central solution space, graphical regression, envelope model, and the principal support vector machine. For more details on these and other methods, consult the statistical literature. Principal components analysis (PCA) and similar methods for dimension reduction are not based on the sufficiency principle. === Example: linear regression === Consider the regression model y = α + β T x + ε , where ε ⊥ ⊥ x . {\displaystyle y=\alpha +\beta ^{T}{\textbf {x}}+\varepsilon ,{\text{ where }}\varepsilon \perp \!\!\!\perp {\textbf {x}}.} Note that the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} is the same as the distribution of y ∣ β T x {\displ

    Read more →
  • Multifactor dimensionality reduction

    Multifactor dimensionality reduction

    Multifactor dimensionality reduction (MDR) is a statistical approach, also used in machine learning automatic approaches, for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable. MDR was designed specifically to identify nonadditive interactions among discrete variables that influence a binary outcome and is considered a nonparametric and model-free alternative to traditional statistical methods such as logistic regression. The basis of the MDR method is a constructive induction or feature engineering algorithm that converts two or more variables or attributes to a single attribute. This process of constructing a new attribute changes the representation space of the data. The end goal is to create or discover a representation that facilitates the detection of nonlinear or nonadditive interactions among the attributes such that prediction of the class variable is improved over that of the original representation of the data. == Illustrative example == Consider the following simple example using the exclusive OR (XOR) function. XOR is a logical operator that is commonly used in data mining and machine learning as an example of a function that is not linearly separable. The table below represents a simple dataset where the relationship between the attributes (X1 and X2) and the class variable (Y) is defined by the XOR function such that Y = X1 XOR X2. Table 1 A machine learning algorithm would need to discover or approximate the XOR function in order to accurately predict Y using information about X1 and X2. An alternative strategy would be to first change the representation of the data using constructive induction to facilitate predictive modeling. The MDR algorithm would change the representation of the data (X1 and X2) in the following manner. MDR starts by selecting two attributes. In this simple example, X1 and X2 are selected. Each combination of values for X1 and X2 are examined and the number of times Y=1 and/or Y=0 is counted. In this simple example, Y=1 occurs zero times and Y=0 occurs once for the combination of X1=0 and X2=0. With MDR, the ratio of these counts is computed and compared to a fixed threshold. Here, the ratio of counts is 0/1 which is less than our fixed threshold of 1. Since 0/1 < 1 we encode a new attribute (Z) as a 0. When the ratio is greater than one we encode Z as a 1. This process is repeated for all unique combinations of values for X1 and X2. Table 2 illustrates our new transformation of the data. Table 2 The machine learning algorithm now has much less work to do to find a good predictive function. In fact, in this very simple example, the function Y = Z has a classification accuracy of 1. A nice feature of constructive induction methods such as MDR is the ability to use any data mining or machine learning method to analyze the new representation of the data. Decision trees, neural networks, or a naive Bayes classifier could be used in combination with measures of model quality such as balanced accuracy and mutual information. == Machine learning with MDR == As illustrated above, the basic constructive induction algorithm in MDR is very simple. However, its implementation for mining patterns from real data can be computationally complex. As with any machine learning algorithm there is always concern about overfitting. That is, machine learning algorithms are good at finding patterns in completely random data. It is often difficult to determine whether a reported pattern is an important signal or just chance. One approach is to estimate the generalizability of a model to independent datasets using methods such as cross-validation. Models that describe random data typically don't generalize. Another approach is to generate many random permutations of the data to see what the data mining algorithm finds when given the chance to overfit. Permutation testing makes it possible to generate an empirical p-value for the result. Replication in independent data may also provide evidence for an MDR model but can be sensitive to difference in the data sets. These approaches have all been shown to be useful for choosing and evaluating MDR models. An important step in a machine learning exercise is interpretation. Several approaches have been used with MDR including entropy analysis and pathway analysis. Tips and approaches for using MDR to model gene-gene interactions have been reviewed. == Extensions to MDR == Numerous extensions to MDR have been introduced. These include family-based methods, fuzzy methods, covariate adjustment, odds ratios, risk scores, survival methods, robust methods, methods for quantitative traits, and many others. == Applications of MDR == MDR has mostly been applied to detecting gene-gene interactions or epistasis in genetic studies of common human diseases such as atrial fibrillation, autism, bladder cancer, breast cancer, cardiovascular disease, hypertension, obesity, pancreatic cancer, prostate cancer and tuberculosis. It has also been applied to other biomedical problems such as the genetic analysis of pharmacology outcomes. A central challenge is the scaling of MDR to big data such as that from genome-wide association studies (GWAS). Several approaches have been used. One approach is to filter the features prior to MDR analysis. This can be done using biological knowledge through tools such as BioFilter. It can also be done using computational tools such as ReliefF. Another approach is to use stochastic search algorithms such as genetic programming to explore the search space of feature combinations. Yet another approach is a brute-force search using high-performance computing. == Implementations == www.epistasis.org provides an open-source and freely-available MDR software package. An R package for MDR. An sklearn-compatible Python implementation. An R package for Model-Based MDR. MDR in Weka. Generalized MDR.

    Read more →
  • Iris flower data set

    Iris flower data set

    The Iris flower data set or Fisher's Iris data set is a multivariate data set used and made famous by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. Two of the three species were collected in the Gaspé Peninsula "all from the same pasture, and picked on the same day and measured at the same time by the same person with the same apparatus". The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish each species. Fisher's paper was published in the Annals of Eugenics (today the Annals of Human Genetics). == Use of the data set == Originally used as an example data set on which Fisher's linear discriminant analysis was applied, it became a typical test case for many statistical classification techniques in machine learning such as support vector machines. The use of this data set in cluster analysis however is not common, since the data set only contains two clusters with rather obvious separation. One of the clusters contains Iris setosa, while the other cluster contains both Iris virginica and Iris versicolor and is not separable without the species information Fisher used. This makes the data set a good example to explain the difference between supervised and unsupervised techniques in data mining: Fisher's linear discriminant model can only be obtained when the object species are known: class labels and clusters are not necessarily the same. Nevertheless, all three species of Iris are separable in the projection on the nonlinear and branching principal component. The data set is approximated by the closest tree with some penalty for the excessive number of nodes, bending and stretching. Then the so-called "metro map" is constructed. The data points are projected into the closest node. For each node the pie diagram of the projected points is prepared. The area of the pie is proportional to the number of the projected points. It is clear from the diagram (left) that the absolute majority of the samples of the different Iris species belong to the different nodes. Only a small fraction of Iris-virginica is mixed with Iris-versicolor (the mixed blue-green nodes in the diagram). Therefore, the three species of Iris (Iris setosa, Iris virginica and Iris versicolor) are separable by the unsupervising procedures of nonlinear principal component analysis. To discriminate them, it is sufficient just to select the corresponding nodes on the principal tree. == Data set == The data set contains a set of 150 records under five attributes: sepal length, sepal width, petal length, petal width and species. The iris data set is widely used as a beginner's data set for machine learning purposes. The data set is included in R base and Python in the machine learning library scikit-learn, so that users can access it without having to find a source for it. Several versions of the data set have been published. === R code illustrating usage === The example R code shown below reproduce the scatterplot displayed at the top of this article: === Python code illustrating usage === This code gives:

    Read more →
  • Human–robot interaction

    Human–robot interaction

    Human–robot interaction (HRI) is the study of interactions between humans and robots. Human–robot interaction is a multidisciplinary field with contributions from human–computer interaction, artificial intelligence, robotics, natural language processing, design, psychology and philosophy. A subfield known as physical human–robot interaction (pHRI) has tended to focus on device design to enable people to safely interact with robotic systems. == Origins == Human–robot interaction has been a topic of both science fiction and academic speculation even before any robots existed. Because much of active HRI development depends on natural language processing, many aspects of HRI are continuations of human communications, a field of research which is much older than robotics. The origin of HRI as a discrete problem was stated by 20th-century author Isaac Asimov in 1941, in his novel I, Robot. Asimov coined Three Laws of Robotics, namely: A robot may not injure a human being or, through inaction, allow a human being to come to harm. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws. These three laws provide an overview of the goals engineers and researchers hold for safety in the HRI field, although the fields of robot ethics and machine ethics are more complex than these three principles. However, generally human–robot interaction prioritizes the safety of humans that interact with potentially dangerous robotics equipment. Solutions to this problem range from the philosophical approach of treating robots as ethical agents (individuals with moral agency), to the practical approach of creating safety zones. These safety zones use technologies such as lidar to detect human presence or physical barriers to protect humans by preventing any contact between machine and operator. Although initially robots in the human–robot interaction field required some human intervention to function, research has expanded this to the extent that fully autonomous systems are now far more common than in the early 2000s. Autonomous systems include from simultaneous localization and mapping systems which provide intelligent robot movement to natural-language processing and natural-language generation systems which allow for natural, human-esque interaction which meet well-defined psychological benchmarks. Anthropomorphic robots (machines which imitate human body structure) are better described by the biomimetics field, but overlap with HRI in many research applications. Examples of robots which demonstrate this trend include Willow Garage's PR2 robot, the NASA Robonaut, and Honda ASIMO. However, robots in the human–robot interaction field are not limited to human-like robots: Paro and Kismet are both robots designed to elicit emotional response from humans, and so fall into the category of human–robot interaction. Goals in HRI range from industrial manufacturing through Cobots, medical technology through rehabilitation, autism intervention, and elder care devices, entertainment, human augmentation, and human convenience. Future research therefore covers a wide range of fields, much of which focuses on assistive robotics, robot-assisted search-and-rescue, and space exploration. == The goal of friendly human–robot interactions == Robots are artificial agents with capacities of perception and action in the physical world often referred by researchers as workspace. Their use has been generalized in factories but nowadays they tend to be found in the most technologically advanced societies in such critical domains as search and rescue, military battle, mine and bomb detection, scientific exploration, law enforcement, entertainment and hospital care. These new domains of applications imply a closer interaction with the user, sharing the workspace but also goals in terms of task achievement. The subfield of physical human–robot interaction (pHRI) has largely focused on device design to enable people to safely interact with robotic systems but is increasingly developing algorithmic approaches in an attempt to support fluent and expressive interactions between humans and robotic systems. With the advance in AI, the research is focusing on one part towards the safest physical interaction but also on a socially correct interaction, dependent on cultural criteria. The goal is to build an intuitive, and easy communication with the robot through speech, gestures, and facial expressions. Kerstin Dautenhahn refers to friendly Human–robot interaction as "Robotiquette" defining it as the "social rules for robot behaviour (a 'robotiquette') that is comfortable and acceptable to humans" The robot has to adapt itself to our way of expressing desires and orders and not the contrary. But every day environments such as homes have much more complex social rules than those implied by factories or even military environments. Thus, the robot needs perceiving and understanding capacities to build dynamic models of its surroundings. It needs to categorize objects, recognize and locate humans and further recognize their emotions. The need for dynamic capacities pushes forward every sub-field of robotics. Furthermore, by understanding and perceiving social cues, robots can enable collaborative scenarios with humans. For example, with the rapid rise of personal fabrication machines such as desktop 3D printers, laser cutters, etc., entering our homes, scenarios may arise where robots can collaboratively share control, co-ordinate and achieve tasks together. Industrial robots have already been integrated into industrial assembly lines and are collaboratively working with humans. The social impact of such robots have been studied and has indicated that workers still treat robots and social entities, rely on social cues to understand and work together. On the other end of HRI research the cognitive modelling of the "relationship" between human and the robots benefits the psychologists and robotic researchers the user study are often of interests on both sides. This research endeavours part of human society. For effective human – humanoid robot interaction numerous communication skills and related features should be implemented in the design of such artificial agents/systems. == General HRI research == HRI research spans a wide range of fields, some general to the nature of HRI. === Methods for perceiving humans === Methods for perceiving humans in the environment are based on sensor information. Research on sensing components and software led by Microsoft provide useful results for extracting the human kinematics (see Kinect). An example of older technique is to use colour information for example the fact that for light skinned people the hands are lighter than the clothes worn. In any case a human modelled a priori can then be fitted to the sensor data. The robot builds or has (depending on the level of autonomy the robot has) a 3D mapping of its surroundings to which is assigned the humans locations. Most methods intend to build a 3D model through vision of the environment. The proprioception sensors permit the robot to have information over its own state. This information is relative to a reference. Theories of proxemics may be used to perceive and plan around a person's personal space. A speech recognition system is used to interpret human desires or commands. By combining the information inferred by proprioception, sensor and speech the human position and state (standing, seated). In this matter, natural-language processing is concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural-language data. For instance, neural-network architectures and learning algorithms that can be applied to various natural-language processing tasks including part-of-speech tagging, chunking, named-entity recognition, and semantic role labeling. === Methods for motion planning === Motion planning in dynamic environments is a challenge that can at the moment only be achieved for robots with 3 to 10 degrees of freedom. Humanoid robots or even 2 armed robots, which can have up to 40 degrees of freedom, are unsuited for dynamic environments with today's technology. However lower-dimensional robots can use the potential field method to compute trajectories which avoid collisions with humans. === Cognitive models and theory of mind === Humans exhibit negative social and emotional responses as well as decreased trust toward some robots that closely, but imperfectly, resemble humans; this phenomenon has been termed the "Uncanny Valley". However recent research in telepresence robots has established that mimicking human body postures and expressive gestures has made the robots likeable and engaging in a remote setting. Further, the presence o

    Read more →
  • Logic learning machine

    Logic learning machine

    Logic learning machine (LLM) is a machine learning method based on the generation of intelligible rules. LLM is an efficient implementation of the Switching Neural Network (SNN) paradigm, developed by Marco Muselli, Senior Researcher at the Italian National Research Council CNR-IEIIT in Genoa. LLM has been employed in many different sectors, including the field of medicine (orthopedic patient classification, DNA micro-array analysis and Clinical Decision Support Systems), financial services and supply chain management. == History == The Switching Neural Network approach was developed in the 1990s to overcome the drawbacks of the most commonly used machine learning methods. In particular, black box methods, such as multilayer perceptron and support vector machine, had good accuracy but could not provide deep insight into the studied phenomenon. On the other hand, decision trees were able to describe the phenomenon but often lacked accuracy. Switching Neural Networks made use of Boolean algebra to build sets of intelligible rules able to obtain very good performance. In 2014, an efficient version of Switching Neural Network was developed and implemented in the Rulex suite with the name Logic Learning Machine. Also, an LLM version devoted to regression problems was developed. == General == Like other machine learning methods, LLM uses data to build a model able to perform a good forecast about future behaviors. LLM starts from a table including a target variable (output) and some inputs and generates a set of rules that return the output value y {\displaystyle y} corresponding to a given configuration of inputs. A rule is written in the form: if premise then consequence where consequence contains the output value whereas premise includes one or more conditions on the inputs. According to the input type, conditions can have different forms: for categorical variables the input value must be in a given subset: x 1 ∈ { A , B , C , . . . } {\displaystyle x_{1}\in \{A,B,C,...\}} . for ordered variables the condition is written as an inequality or an interval: x 2 ≤ α {\displaystyle x_{2}\leq \alpha } or β ≤ x 3 ≤ γ {\displaystyle \beta \leq x_{3}\leq \gamma } A possible rule is therefore in the form if x 1 ∈ { A , B , C , . . . } {\displaystyle x_{1}\in \{A,B,C,...\}} AND x 2 ≤ α {\displaystyle x_{2}\leq \alpha } AND β ≤ x 3 ≤ γ {\displaystyle \beta \leq x_{3}\leq \gamma } then y = y ¯ {\displaystyle y={\bar {y}}} == Types == According to the output type, different versions of the Logic Learning Machine have been developed: Logic Learning Machine for classification, when the output is a categorical variable, which can assume values in a finite set Logic Learning Machine for regression, when the output is an integer or real number.

    Read more →
  • Self-organizing map

    Self-organizing map

    A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data. For example, a data set with p {\displaystyle p} variables measured in n {\displaystyle n} observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze. A SOM is a type of artificial neural network but is trained using competitive learning rather than the error-correction learning (e.g., backpropagation with gradient descent) used by other artificial neural networks. The SOM was introduced by the Finnish professor Teuvo Kohonen in the 1980s and therefore is sometimes called a Kohonen map or Kohonen network. The Kohonen map or network is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s. SOMs create internal representations reminiscent of the cortical homunculus, a distorted representation of the human body, based on a neurological "map" of the areas and proportions of the human brain dedicated to processing sensory functions, for different parts of the body. == Overview == Self-organizing maps, like most artificial neural networks, operate in two modes: training and mapping. First, training uses an input data set (the "input space") to generate a lower-dimensional representation of the input data (the "map space"). Second, mapping classifies additional input data using the generated map. The goal of training is to represent an input space with p dimensions as a map space with n dimensions, where p > n. Specifically, an input space with p variables is said to have p dimensions. A map space consists of components called "nodes" or "neurons", which are arranged as a hexagonal or rectangular grid with two dimensions. The number of nodes and their arrangement are specified beforehand based on the larger goals of the analysis and exploration of the data. Each node in the map space is associated with a "weight" vector, which is the position of the node in the input space. While nodes in the map space stay fixed, training consists in moving weight vectors toward the input data (reducing a distance metric such as Euclidean distance) without spoiling the topology induced from the map space. After training, the map can be used to classify additional observations for the input space by finding the node with the closest weight vector (smallest distance metric) to the input space vector. == Learning algorithm == The goal of learning in the self-organizing map is to cause different parts of the network to respond similarly to certain input patterns. This is partly motivated by how visual, auditory or other sensory information is handled in separate parts of the cerebral cortex in the human brain. The weights of the neurons are initialized either to small random values or sampled evenly from the subspace spanned by the two largest principal component eigenvectors. With the latter alternative, learning is much faster because the initial weights already give a good approximation of SOM weights. The network must be fed a large number of example vectors that represent, as close as possible, the kinds of vectors expected during mapping. The examples are usually administered several times as iterations. The training utilizes competitive learning. When a training example is fed to the network, its Euclidean distance to all weight vectors is computed. The neuron whose weight vector is most similar to the input is called the best matching unit (BMU). The weights of the BMU and neurons close to it in the SOM grid are adjusted towards the input vector. The magnitude of the change decreases with time and with the grid-distance from the BMU. The update formula for a neuron v with weight vector Wv(s) is W v ( s + 1 ) = W v ( s ) + θ ( u , v , s ) ⋅ α ( s ) ⋅ ( D ( t ) − W v ( s ) ) {\displaystyle W_{v}(s+1)=W_{v}(s)+\theta (u,v,s)\cdot \alpha (s)\cdot (D(t)-W_{v}(s))} , where s is the step index, t is an index into the training sample, u is the index of the BMU for the input vector D(t), α(s) is a monotonically decreasing learning coefficient; θ(u, v, s) is the neighborhood function which gives the distance between the neuron u and the neuron v in step s. Depending on the implementations, t can scan the training data set systematically (t is 0, 1, 2...T-1, then repeat, T being the training sample's size), be randomly drawn from the data set (bootstrap sampling), or implement some other sampling method (such as jackknifing). The neighborhood function θ(u, v, s) (also called function of lateral interaction) depends on the grid-distance between the BMU (neuron u) and neuron v. In the simplest form, it is 1 for all neurons close enough to BMU and 0 for others, but the Gaussian and Mexican-hat functions are common choices, too. Regardless of the functional form, the neighborhood function shrinks with time. At the beginning when the neighborhood is broad, the self-organizing takes place on the global scale. When the neighborhood has shrunk to just a couple of neurons, the weights are converging to local estimates. In some implementations, the learning coefficient α and the neighborhood function θ decrease steadily with increasing s, in others (in particular those where t scans the training data set) they decrease in step-wise fashion, once every T steps. This process is repeated for each input vector for a (usually large) number of cycles λ. The network winds up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net. During mapping, there will be one single winning neuron: the neuron whose weight vector lies closest to the input vector. This can be simply determined by calculating the Euclidean distance between input vector and weight vector. While representing input data as vectors has been emphasized in this article, any kind of object which can be represented digitally, which has an appropriate distance measure associated with it, and in which the necessary operations for training are possible can be used to construct a self-organizing map. This includes matrices, continuous functions or even other self-organizing maps. === Algorithm === Randomize the node weight vectors in a map For s = 0 , 1 , 2 , . . . , λ {\displaystyle s=0,1,2,...,\lambda } Randomly pick an input vector D ( t ) {\displaystyle {D}(t)} Find the node in the map closest to the input vector. This node is the best matching unit (BMU). Denote it by u {\displaystyle u} For each node v {\displaystyle v} , update its vector by pulling it closer to the input vector: W v ( s + 1 ) = W v ( s ) + θ ( u , v , s ) ⋅ α ( s ) ⋅ ( D ( t ) − W v ( s ) ) {\displaystyle W_{v}(s+1)=W_{v}(s)+\theta (u,v,s)\cdot \alpha (s)\cdot (D(t)-W_{v}(s))} The variable names mean the following, with vectors in bold, s {\displaystyle s} is the current iteration λ {\displaystyle \lambda } is the iteration limit t {\displaystyle t} is the index of the target input data vector in the input data set D {\displaystyle \mathbf {D} } D ( t ) {\displaystyle {D}(t)} is a target input data vector v {\displaystyle v} is the index of the node in the map W v {\displaystyle \mathbf {W} _{v}} is the current weight vector of node v {\displaystyle v} u {\displaystyle u} is the index of the best matching unit (BMU) in the map θ ( u , v , s ) {\displaystyle \theta (u,v,s)} is the neighbourhood function, α ( s ) {\displaystyle \alpha (s)} is the learning rate schedule. The key design choices are the shape of the SOM, the neighbourhood function, and the learning rate schedule. The idea of the neighborhood function is to make it such that the BMU is updated the most, its immediate neighbors are updated a little less, and so on. The idea of the learning rate schedule is to make it so that the map updates are large at the start, and gradually stop updating. For example, if we want to learn a SOM using a square grid, we can index it using ( i , j ) {\displaystyle (i,j)} where both i , j ∈ 1 : N {\displaystyle i,j\in 1:N} . The neighborhood function can make it so that the BMU updates in full, the nearest neighbors update in half, and their neighbors update in half again, etc. θ ( ( i , j ) , ( i ′ , j ′ ) , s ) = 1 2 | i − i ′ | + | j − j ′ | = { 1 if i = i ′ , j = j ′ 1 / 2 if | i − i ′ | + | j − j ′ | = 1 1 / 4 if | i − i ′ | + | j − j ′ | = 2 ⋯ ⋯ {\displaystyle \theta ((i,j),(i',j'),s)={\frac {1}{2^{|i-i'|+|j-j'|}}}={\begin{cases}1&{\text{if }}i=i',j=j'\\1/2&{\text{if

    Read more →
  • Jpred

    Jpred

    Jpred v.4 is the latest version of the JPred Protein Secondary Structure Prediction Server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction, that has existed since 1998 in different versions. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 134 000 jobs per month and has carried out over 2 million predictions in total for users in 179 countries. == JPred 2 == The static HTML pages of JPred 2 are still available for reference. == JPred 3 == The JPred v3 followed on from previous versions of JPred developed and maintained by James Cuff and Jonathan Barber (see JPred References). This release added new functionality and fixed many bugs. The highlights are: New, friendlier user interface Retrained and optimised version of Jnet (v2) - mean secondary structure prediction accuracy of >81% Batch submission of jobs Better error checking of input sequences/alignments Predictions now (optionally) returned via e-mail Users may provide their own query names for each submission JPred now makes a prediction even when there are no PSI-BLAST hits to the query PS/PDF output now incorporates all the predictions == JPred 4 == The current version of JPred (v4) has the following improvements and updates incorporated: Retrained on the latest UniRef90 and SCOPe/ASTRAL version of Jnet (v2.3.1) - mean secondary structure prediction accuracy of >82%. Upgraded the Web Server to the latest technologies (Bootstrap framework, JavaScript) and updating the web pages – improving the design and usability through implementing responsive technologies. Added RESTful API and mass-submission and results retrieval scripts - resulting in peak throughput above 20,000 predictions per day. Added prediction jobs monitoring tools. Upgraded the results reporting – both, on the web-site, and through the optional email summary reports: improved batch submission, added results summary preview through Jalview results visualization summary in SVG and adding full multiple sequence alignments into the reports. Improved help-pages, incorporating tool-tips, and adding one-page step-by-step tutorials. Sequence residues are categorised or assigned to one of the secondary structure elements, such as alpha-helix, beta-sheet and coiled-coil. Jnet uses two neural networks for its prediction. The first network is fed with a window of 17 residues over each amino acid in the alignment plus a conservation number. It uses a hidden layer of nine nodes and has three output nodes, one for each secondary structure element. The second network is fed with a window of 19 residues (the result of first network) plus the conservation number. It has a hidden layer with nine nodes and has three output nodes.

    Read more →
  • Cybernetic Serendipity

    Cybernetic Serendipity

    Cybernetic Serendipity was an exhibition of cybernetic art curated by Jasia Reichardt, shown at the Institute of Contemporary Arts, London, England, from 2 August to 20 October 1968, and then toured across the United States. Two stops in the United States were the Corcoran Annex (Corcoran Gallery of Art), Washington, D.C., from 16 July to 31 August 1969, and the newly opened Exploratorium in San Francisco, from 1 November to 18 December 1969. == Content == One part of the exhibition was concerned with algorithms and devices for generating music. Some exhibits were pamphlets describing the algorithms, whilst others showed musical notation produced by computers. Devices made musical effects and played tapes of sounds made by computers. Peter Zinovieff lent part of his studio equipment - visitors could sing or whistle a tune into a microphone and his equipment would improvise a piece of music based on the tune. Another part described computer projects such as Gustav Metzger's self-destructive Five Screens With Computer, a design for a new hospital, a computer programmed structure, and dance choreography. The machines and installations were a very noticeable part of the exhibition. Gordon Pask produced a collection of large mobiles (Colloquy of Mobiles (1968)) with interacting parts that let the viewers join in the conversation. Many machines formed kinetic environments or displayed moving images. Bruce Lacey contributed his radio-controlled robots and a light-sensitive owl. Nam June Paik was represented by Robot K-456 and televisions with distorted images. Jean Tinguely provided two of his painting machines. Edward Ihnatowicz's biomorphic hydraulic ear (Sound Activated Mobile (SAM, 1968)) turned toward sounds and John Billingsley's Albert 1967 turned to face light. Wen-Ying Tsai presented his interactive cybernetic sculptures of vibrating stainless-steel rods, stroboscopic light, and audio feedback control. Several artists exhibited machines that drew patterns that the visitor could take away, or involved visitors in games. Cartoonist Rowland Emett designed the mechanical computer Forget-me-not, which was commissioned by Honeywell. Another section explored the computer's ability to produce text - both essays and poetry. Different programs produced Haiku, children's stories, and essays. One of the first computer-generated poems, by Alison Knowles and James Tenney, was included in the exhibition and catalogue. Computer-generated movies were represented by John Whitney's Permutations and a Bell Labs movie on their technology for producing movies. Some samples included images of tesseracts rotating in four dimensions, a satellite orbiting the Earth, and an animated data structure. Computer graphics were also represented, including pictures produced on cathode ray oscilloscopes and digital plotters. There was a variety of posters and graphics demonstrating the power of computers to do complex (and apparently random) calculations. Other graphics showed a simulated Mondrian and the iconic decreasing squares spiral that appeared on the exhibition's poster and book. The Boeing Company exhibited their use of wireframe graphics. The innovative computer-generated sculpture, Quad 1, was displayed at the Cybernetic Serendipity exhibit. Created by the American abstract expressionist sculptor, Robert Mallary, in 1968, Quad 1 is widely believed to be the world's first Computer Aided Design sculpture. Keith Albarn & Partners contributed to the design of the exhibition. Reflecting the prominence of music in the show, a ten-track album Cybernetic Serendipity Music was released by the ICA to accompany the show. Artists featured included Iannis Xenakis, John Cage, and Peter Zinovieff, a detail of whose graphic score for 'Four Sacred April Rounds’ (1968) was used as the cover artwork. == Attendance == Time magazine noted that there had been 40,000 visitors to the London exhibition. Other reports suggested visitor numbers were as high as 44,000 to 60,000. However, the ICA did not accurately count visitors. == After-effects == The exhibition provided the energy for the formation of British Computer Arts Society which continued to explore the interaction between science, technology and art, and put on exhibitions (for example Event One at the Royal College of Art). Several pieces were purchased by the Exploratorium in 1971, some of which are on display to this day. In 2014 the ICA held a retrospective exhibition Cybernetic Serendipity: A Documentation which included documents, installation photographs, press reviews and publications and a series of discussions in one of which Peter Zinovieff took part. To coincide with the exhibition, Cybernetic Serendipity Music was re-released as a limited-edition vinyl LP by The Vinyl Factory. The Victoria and Albert Museum marked the 50th anniversary with an exhibition in 2018 entitled "Chance and Control: Art in the Age of Computers". The V&A exhibition included many works by artists who featured in the original ICA show, plus related ephemera. "Chance and Control" subsequently toured to Chester Visual Arts and Firstsite, Colchester. In 2020, The Centre Pompidou exhibited the replica of Gordon Pask's 1968 Colloquy of Mobiles, reproduced by Paul Pangaro and TJ McLeish in 2018. In 2022 the Australian National University's School of Cybernetics launched the school by presenting an exhibition Australian Cybernetic: a point through time. The exhibition included works from Cybernetic Serendipity (1968), Australia ‘75: Festival of Creative Arts and Science (1975), and contemporary pieces curated by the School of Cybernetics. In describing Reichardt's Cybernetic Serendipity exhibition the school stated that it "represented points of expanding the cybernetic imagination" and was a "ground-breaking" "glimpse of a future in which computers were entangled with people and cultures, and through this she fashioned a blueprint for the future of computing that has since inspired generations".

    Read more →
  • C4.5 algorithm

    C4.5 algorithm

    C4.5 is an algorithm used to generate a decision tree developed by Ross Quinlan. C4.5 is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier. In 2011, authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date". It became quite popular after ranking #1 in the Top 10 Algorithms in Data Mining pre-eminent paper published by Springer LNCS in 2008. == Algorithm == C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set S = s 1 , s 2 , . . . {\displaystyle S={s_{1},s_{2},...}} of already classified samples. Each sample s i {\displaystyle s_{i}} consists of a p-dimensional vector ( x 1 , i , x 2 , i , . . . , x p , i ) {\displaystyle (x_{1,i},x_{2,i},...,x_{p,i})} , where the x j {\displaystyle x_{j}} represent attribute values or features of the sample, as well as the class in which s i {\displaystyle s_{i}} falls. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the partitioned sublists. This algorithm has a few base cases. All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class. None of the features provide any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class. Instance of previously unseen class encountered. Again, C4.5 creates a decision node higher up the tree using the expected value. === Pseudocode === In pseudocode, the general algorithm for building decision trees is: Check for the above base cases. For each attribute a, find the normalized information gain ratio from splitting on a. Let a_best be the attribute with the highest normalized information gain. Create a decision node that splits on a_best. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of node. == Improvements from ID3 algorithm == C4.5 made a number of improvements to ID3. Some of these are: Handling both continuous and discrete attributes: In order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it. Handling training data with missing attribute values: C4.5 allows attribute values to be marked as missing. Missing attribute values are simply not used in gain and entropy calculations. Handling attributes with differing costs. Pruning trees after creation: C4.5 goes back through the tree once it's been created and attempts to remove branches that do not help by replacing them with leaf nodes. == Improvements in C5.0/See5 algorithm == Quinlan went on to create C5.0 and See5 (C5.0 for Unix/Linux, See5 for Windows) which he markets commercially. C5.0 offers a number of improvements on C4.5. Some of these are: Speed - C5.0 is significantly faster than C4.5 (several orders of magnitude) Memory usage - C5.0 is more memory efficient than C4.5 Smaller decision trees - C5.0 gets similar results to C4.5 with considerably smaller decision trees. Support for boosting - Boosting improves the trees and gives them more accuracy. Weighting - C5.0 allows you to weight different cases and misclassification types. Winnowing - a C5.0 option automatically winnows the attributes to remove those that may be unhelpful. Source for a single-threaded Linux version of C5.0 is available under the GNU General Public License (GPL).

    Read more →
  • Waffles (machine learning)

    Waffles (machine learning)

    Waffles is a collection of command-line tools for performing machine learning operations developed at Brigham Young University. These tools are written in C++, and are available under the GNU Lesser General Public License. == Description == The Waffles machine learning toolkit contains command-line tools for performing various operations related to machine learning, data mining, and predictive modeling. The primary focus of Waffles is to provide tools that are simple to use in scripted experiments or processes. For example, the supervised learning algorithms included in Waffles are all designed to support multi-dimensional labels, classification and regression, automatically impute missing values, and automatically apply necessary filters to transform the data to a type that the algorithm can support, such that arbitrary learning algorithms can be used with arbitrary data sets. Many other machine learning toolkits provide similar functionality, but require the user to explicitly configure data filters and transformations to make it compatible with a particular learning algorithm. The algorithms provided in Waffles also have the ability to automatically tune their own parameters (with the cost of additional computational overhead). Because Waffles is designed for script-ability, it deliberately avoids presenting its tools in a graphical environment. It does, however, include a graphical "wizard" tool that guides the user to generate a command that will perform a desired task. This wizard does not actually perform the operation, but requires the user to paste the command that it generates into a command terminal or a script. The idea motivating this design is to prevent the user from becoming "locked in" to a graphical interface. All of the Waffles tools are implemented as thin wrappers around functionality in a C++ class library. This makes it possible to convert scripted processes into native applications with minimal effort. Waffles was first released as an open source project in 2005. Since that time, it has been developed at Brigham Young University, with a new version having been released approximately every 6–9 months. Waffles is not an acronym—the toolkit was named after the food for historical reasons. == Advantages == Some of the advantages of Waffles in contrast with other popular open source machine learning toolkits include: Waffles automatically takes care of many issues related to data format in order to simplify its tools. Because it is implemented in C++, many of its algorithms are particularly fast. Also, the lack of dependency on any virtual machine makes it easier to deploy in conjunction with other applications. The functionality included in Waffles is very broad, including algorithms for dimensionality reduction, collaborative filtering, visualization, clustering, supervised learning, optimization, linear algebra, data transformation, image and signal processing, policy learning, and sparse matrix operations. == Disadvantages == Although Waffles provides significant breadth, it lacks the depth of many toolkits that focus on a particular area of machine learning. The Weka (machine learning) toolkit, for example, provides many more classification algorithms than Waffles provides. Waffles only has a limited graphical interface.

    Read more →
  • Multispectral pattern recognition

    Multispectral pattern recognition

    Multispectral remote sensing is the collection and analysis of reflected, emitted, or back-scattered energy from an object or an area of interest in multiple bands of regions of the electromagnetic spectrum (Jensen, 2005). Subcategories of multispectral remote sensing include hyperspectral, in which hundreds of bands are collected and analyzed, and ultraspectral remote sensing where many hundreds of bands are used (Logicon, 1997). The main purpose of multispectral imaging is the potential to classify the image using multispectral classification. This is a much faster method of image analysis than is possible by human interpretation. == Multispectral remote sensing systems == Remote sensing systems gather data via instruments typically carried on satellites in orbit around the Earth. The remote sensing scanner detects the energy that radiates from the object or area of interest. This energy is recorded as an analog electrical signal and converted into a digital value though an A-to-D conversion. There are several multispectral remote sensing systems that can be categorized in the following way: === Multispectral imaging using discrete detectors and scanning mirrors === Landsat Multispectral Scanner (MSS) Landsat Thematic Mapper (TM) NOAA Geostationary Operational Environmental Satellite (GOES) NOAA Advanced Very High Resolution Radiometer (AVHRR) NASA and ORBIMAGE, Inc., Sea-viewing Wide field-of-view Sensor (SeaWiFS) Daedalus, Inc., Aircraft Multispectral Scanner (AMS) NASA Airborne Terrestrial Applications Sensor (ATLAS) === Multispectral imaging using linear arrays === SPOT 1, 2, and 3 High Resolution Visible (HRV) sensors and Spot 4 and 5 High Resolution Visible Infrared (HRVIR) and vegetation sensor Indian Remote Sensing System (IRS) Linear Imaging Self-scanning Sensor (LISS) Space Imaging, Inc. (IKONOS) Digital Globe, Inc. (QuickBird) ORBIMAGE, Inc. (OrbView-3) ImageSat International, Inc. (EROS A1) NASA Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) NASA Terra Multiangle Imaging Spectroradiometer (MISR) === Imaging spectrometry using linear and area arrays === NASA Jet Propulsion Laboratory Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Compact Airborne Spectrographic Imager 3 (CASI 3) NASA Terra Moderate Resolution Imaging Spectrometer (MODIS) NASA Earth Observer (EO-1) Advanced Land Imager (ALI), Hyperion, and LEISA Atmospheric Corrector (LAC) === Satellite analog and digital photographic systems === Russian SPIN-2 TK-350, and KVR-1000 NASA Space Shuttle and International Space Station Imagery == Multispectral classification methods == A variety of methods can be used for the multispectral classification of images: Algorithms based on parametric and nonparametric statistics that use ratio-and interval-scaled data and nonmetric methods that can also incorporate nominal scale data (Duda et al., 2001), Supervised or unsupervised classification logic, Hard or soft (fuzzy) set classification logic to create hard or fuzzy thematic output products, Per-pixel or object-oriented classification logic, and Hybrid approaches == Supervised classification == In this classification method, the identity and location of some of the land-cover types are obtained beforehand from a combination of fieldwork, interpretation of aerial photography, map analysis, and personal experience. The analyst would locate sites that have similar characteristics to the known land-cover types. These areas are known as training sites because the known characteristics of these sites are used to train the classification algorithm for eventual land-cover mapping of the remainder of the image. Multivariate statistical parameters (means, standard deviations, covariance matrices, correlation matrices, etc.) are calculated for each training site. All pixels inside and outside of the training sites are evaluated and allocated to the class with the more similar characteristics. === Classification scheme === The first step in the supervised classification method is to identify the land-cover and land-use classes to be used. Land-cover refers to the type of material present on the site (e.g. water, crops, forest, wet land, asphalt, and concrete). Land-use refers to the modifications made by people to the land cover (e.g. agriculture, commerce, settlement). All classes should be selected and defined carefully to properly classify remotely sensed data into the correct land-use and/or land-cover information. To achieve this purpose, it is necessary to use a classification system that contains taxonomically correct definitions of classes. If a hard classification is desired, the following classes should be used: Mutually exclusive: there is not any taxonomic overlap of any classes (i.e., rain forest and evergreen forest are distinct classes). Exhaustive: all land-covers in the area have been included. Hierarchical: sub-level classes (e.g., single-family residential, multiple-family residential) are created, allowing that these classes can be included in a higher category (e.g., residential). Some examples of hard classification schemes are: American Planning Association Land-Based Classification System United States Geological Survey Land-use/Land-cover Classification System for Use with Remote Sensor Data U.S. Department of the Interior Fish and Wildlife Service U.S. National Vegetation and Classification System International Geosphere-Biosphere Program IGBP Land Cover Classification System === Training sites === Once the classification scheme is adopted, the image analyst may select training sites in the image that are representative of the land-cover or land-use of interest. If the environment where the data was collected is relatively homogeneous, the training data can be used. If different conditions are found in the site, it would not be possible to extend the remote sensing training data to the site. To solve this problem, a geographical stratification should be done during the preliminary stages of the project. All differences should be recorded (e.g. soil type, water turbidity, crop species, etc.). These differences should be recorded on the imagery and the selection training sites made based on the geographical stratification of this data. The final classification map would be a composite of the individual stratum classifications. After the data are organized in different training sites, a measurement vector is created. This vector would contain the brightness values for each pixel in each band in each training class. The mean, standard deviation, variance-covariance matrix, and correlation matrix are calculated from the measurement vectors. Once the statistics from each training site are determined, the most effective bands for each class should be selected. The objective of this discrimination is to eliminate the bands that can provide redundant information. Graphical and statistical methods can be used to achieve this objective. Some of the graphic methods are: Bar graph spectral plots Cospectral mean vector plots Feature space plots Cospectral parallelepiped or ellipse plots === Classification algorithm === The last step in supervised classification is selecting an appropriate algorithm. The choice of a specific algorithm depends on the input data and the desired output. Parametric algorithms are based on the fact that the data is normally distributed. If the data is not normally distributed, nonparametric algorithms should be used. The more common nonparametric algorithms are: One-dimensional density slicing Parallelipiped Minimum distance Nearest-neighbor Expert system analysis Convolutional neural network == Unsupervised classification == Unsupervised classification (also known as clustering) is a method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information. Unsupervised classification require less input information from the analyst compared to supervised classification because clustering does not require training data. This process consists in a series of numerical operations to search for the spectral properties of pixels. From this process, a map with m spectral classes is obtained. Using the map, the analyst tries to assign or transform the spectral classes into thematic information of interest (i.e. forest, agriculture, urban). This process may not be easy because some spectral clusters represent mixed classes of surface materials and may not be useful. The analyst has to understand the spectral characteristics of the terrain to be able to label clusters as a specific information class. There are hundreds of clustering algorithms. Two of the most conceptually simple algorithms are the chain method and the ISODATA method. === Chain method === The algorithm used in this method operates in a two-pass mode (it passes through the multispectral dataset two times. In the first pass, the program reads through the dataset and sequentially builds clusters (groups of p

    Read more →
  • Logic form

    Logic form

    Logic forms are simple, first-order logic knowledge representations of natural language sentences formed by the conjunction of concept predicates related through shared arguments. Each noun, verb, adjective, adverb, pronoun, preposition and conjunction generates a predicate. Logic forms can be decorated with word senses to disambiguate the semantics of the word. There are two types of predicates: events are marked with e, and entities are marked with x. The shared arguments connect the subjects and objects of verbs and prepositions together. Example input/output might look like this: Input: The Earth provides the food we eat every day. Output: Earth:n_#1(x1) provide:v_#2(e1, x1, x2) food:n_#1(x2) we(x3) eat:v_#1(e2, x3, x2; x4) day:n_#1(x4) Logic forms are used in some natural language processing techniques, such as question answering, as well as in inference both for database systems and QA systems.

    Read more →
  • Spatial Analysis of Principal Components

    Spatial Analysis of Principal Components

    Spatial Principal Component Analysis (sPCA) is a multivariate statistical technique that complements the traditional Principal Component Analysis (PCA) by incorporating spatial information into the analysis of genetic variation. While traditional PCA can be used to find spatial patterns, it focuses on reducing data dimensionality by identifying uncorrelated principal components that capture maximum variance, thus often lacking power to identify non-trivial spatial genetic patterns. By accounting for spatial autocorrelation, sPCA is able to uncover spatial patterns in the data and find the spatial structure of datasets where observations are either geographically or topologically linked. This statistical power improvement allows the investigation of cryptic spatial patterns of genetic variability otherwise overlooked. sPCA has been applied in various fields, including geography, ecology and genetics. == History == sPCA was introduced in 2008 by Thibaut Jombart, Sébastien Devillard, Anne-Béatrice Dufour, and D. Pontier as a spatially explicit method to investigate the spatial pattern of genetic variation among individuals or populations. In 2017, Valeria Montano and Thibaut Jombart published an alternative non-parametric test to evaluate the significance of global and local spatial genetic patterns with improved statistical power. == Details == sPCA modifies the PCA framework by integrating spatial weights, typically in the form of connectivity matrices or spatial adjacency graphs. It identifies principal components (PCs) that maximize both genentic variance and spatial autocorreation, as measured by Moran's I. These weights represent relationships between observations based on geographic distance or other spatial criteria. The method decomposes variance into two components: Global structures, correspond to positive autocorrelation, that is, reflect broad-scale spatial patterns where similar values cluster over large regions. Local structures, correspond to negative autocorrelation, that is, capture fine-scale spatial variations or localized patterns. The core of sPCA relies on the eigenanalysis of a spatially weighted covariance or correlation matrix. The spatial weight matrix can be constructed using techniques such as Delaunay triangulation, nearest-neighbor graphs, or distance-based criteria. Applications of sPCA should be used only as an explorative tool. == Applications == sPCA has been widely used in many fields, including: Ecology: To find spatial patterns in species distributions and environmental gradients. Genetics: Population structure and gene flow analysis while allowing for spatial autocorrelation considerations. Biogeography: To identify historical dispersal routes, and barriers to gene flow, providing insights into species distribution patterns and evolutionary history. == Software/Source Code == sPCA implementations are available in R in adegenet and ntbox . These tools facilitate the application of sPCA by providing functions for constructing spatial weight matrices, performing eigenanalysis, and obtaining spatial principal components in an easy-to-read form.

    Read more →
  • Natarajan dimension

    Natarajan dimension

    In the theory of Probably Approximately Correct Machine Learning, the Natarajan dimension characterizes the complexity of learning a set of functions, generalizing from the Vapnik–Chervonenkis dimension for boolean functions to multi-class functions. Originally introduced as the Generalized Dimension by Natarajan, it was subsequently renamed the Natarajan Dimension by Haussler and Long. == Definition == Let H {\displaystyle H} be a set of functions from a set X {\displaystyle X} to a set Y {\displaystyle Y} . H {\displaystyle H} shatters a set C ⊂ X {\displaystyle C\subset X} if there exist two functions f 0 , f 1 ∈ H {\displaystyle f_{0},f_{1}\in H} such that For every x ∈ C , f 0 ( x ) ≠ f 1 ( x ) {\displaystyle x\in C,f_{0}(x)\neq f_{1}(x)} . For every B ⊂ C {\displaystyle B\subset C} , there exists a function h ∈ H {\displaystyle h\in H} such that for all x ∈ B , h ( x ) = f 0 ( x ) {\displaystyle x\in B,h(x)=f_{0}(x)} and for all x ∈ C − B , h ( x ) = f 1 ( x ) {\displaystyle x\in C-B,h(x)=f_{1}(x)} . The Natarajan dimension of H is the maximal cardinality of a set shattered by H {\displaystyle H} . It is easy to see that if | Y | = 2 {\displaystyle |Y|=2} , the Natarajan dimension collapses to the Vapnik–Chervonenkis dimension. Shalev-Shwartz and Ben-David present comprehensive material on multi-class learning and the Natarajan dimension, including uniform convergence and learnability. Recently, Cohen et al showed that the Natarajan dimension is the dominant term governing agnostic multi-class PAC learnability.

    Read more →