AI Art Or Not

AI Art Or Not — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Subvocal recognition

    Subvocal recognition

    Subvocal recognition (SVR) is the process of taking subvocalization and converting the detected results to a digital output, aural or text-based. A silent speech interface is a device that allows speech communication without using the sound made when people vocalize their speech sounds. It works by the computer identifying the phonemes that an individual pronounces from nonauditory sources of information about their speech movements. These are then used to recreate the speech using speech synthesis. == Input methods == Silent speech interface systems have been created using ultrasound and optical camera input of tongue and lip movements. Electromagnetic devices are another technique for tracking tongue and lip movements. The detection of speech movements by electromyography of speech articulator muscles and the larynx is another technique. Another source of information is the vocal tract resonance signals that get transmitted through bone conduction called non-audible murmurs. They have also been created as a brain–computer interface using brain activity in the motor cortex obtained from intracortical microelectrodes. == Uses == Such devices are created as aids to those unable to create the sound phonation needed for audible speech such as after laryngectomies. Another use is for communication when speech is masked by background noise or distorted by self-contained breathing apparatus. A further practical use is where a need exists for silent communication, such as when privacy is required in a public place, or hands-free data silent transmission is needed during a military or security operation. In 2002, the Japanese company NTT DoCoMo announced it had created a silent mobile phone using electromyography and imaging of lip movement. The company stated that "the spur to developing such a phone was ridding public places of noise," adding that, "the technology is also expected to help people who have permanently lost their voice." The feasibility of using silent speech interfaces for practical communication has since then been shown. In 2019, Arnav Kapur, a researcher from the Massachusetts Institute of Technology, conducted a study known as AlterEgo. Its implementation of the silent speech interface enables direct communication between the human brain and external devices through stimulation of the speech muscles. By leveraging neural signals associated with speech and language, the AlterEgo system deciphers the user's intended words and translates them into text or commands without the need for audible speech. == Research and patents == With a grant from the U.S. Army, research into synthetic telepathy using subvocalization is taking place at the University of California, Irvine under lead scientist Mike D'Zmura. NASA's Ames Research Laboratory in Mountain View, California, under the supervision of Charles Jorgensen is conducting subvocalization research. The Brain Computer Interface R&D program at Wadsworth Center under the New York State Department of Health has confirmed the existing ability to decipher consonants and vowels from imagined speech, which allows for brain-based communication using imagined speech, however using EEGs instead of subvocalization techniques. US Patents on silent communication technologies include: US Patent 6587729 "Apparatus for audibly communicating speech using the radio frequency hearing effect", US Patent 5159703 "Silent subliminal presentation system", US Patent 6011991 "Communication system and method including brain wave analysis and/or use of brain activity", US Patent 3951134 "Apparatus and method for remotely monitoring and altering brain waves". Latter two rely on brain wave analysis. == In fiction == The decoding of silent speech using a computer played an important role in Arthur C. Clarke's story and Stanley Kubrick's associated film A Space Odyssey. In this, HAL 9000, a computer controlling spaceship Discovery One, bound for Jupiter, discovers a plot to deactivate it by the mission astronauts Dave Bowman and Frank Poole through lip reading their conversations. In Orson Scott Card's series (including Ender's Game), the artificial intelligence can be spoken to while the protagonist wears a movement sensor in his jaw, enabling him to converse with the AI without making noise. He also wears an ear implant. In Speaker for the Dead and subsequent novels, author Orson Scott Card described an ear implant, called a "jewel", that allows subvocal communication with computer systems. Author Robert J. Sawyer made use of subvocal recognition to allow silent commands to the cybernetic 'companion implants' used by the advanced Neanderthal characters in his Neanderthal Parallax trilogy of science fiction novels. In Earth, David Brin depicts this technology and its uses as a normal gear in the near future. In Down and Out in the Magic Kingdom, Cory Doctorow has cellphone technology become silent through a cochlear implant and miking the throat to pick up subvocalization. William Gibson's Sprawl Trilogy frequently uses sub-vocalization systems in various devices. In Kage Baker's Company novels, the immortal cyborgs communicate subvocally. In the Hugo Award-winning Hyperion Cantos by Dan Simmons, the characters often use subvocalization to communicate. In the Culture novels by Iain M. Banks, more highly advanced species often communicate subvocally through their technology. In Deus Ex: Human Revolution (2011), the protagonist is augmented with a subvocalization implant for sending covert communications (and a corresponding cochlear implant for receiving covert communications). In the tabletop RPG and video game series Shadowrun, player characters can communicate via subvocal microphones in some instances. In Paranoia, all citizens can speak to the computer via their "cerebral cortech" implants. Alistair Reynolds Revelation Space trilogy frequently uses sub-vocalization systems in various devices.

    Read more →
  • Reservoir computing

    Reservoir computing

    Reservoir computing is a framework for computation derived from recurrent neural network theory that maps input signals into higher dimensional computational spaces through the dynamics of a fixed, non-linear system called a reservoir. After the input signal is fed into the reservoir, which is treated as a "black box," a simple readout mechanism is trained to read the state of the reservoir and map it to the desired output. The first key benefit of this framework is that training is performed only at the readout stage, as the reservoir dynamics are fixed. The second is that the computational power of naturally available systems, both classical and quantum mechanical, can be used to reduce the effective computational cost. == History == The first examples of reservoir neural networks demonstrated that randomly connected recurrent neural networks could be used for sensorimotor sequence learning, and simple forms of interval and speech discrimination. In these early models the memory in the network took the form of both short-term synaptic plasticity and activity mediated by recurrent connections. In other early reservoir neural network models the memory of the recent stimulus history was provided solely by the recurrent activity. Overall, the general concept of reservoir computing stems from the use of recursive connections within neural networks to create a complex dynamical system. It is a generalisation of earlier neural network architectures such as recurrent neural networks, liquid-state machines and echo-state networks. Reservoir computing also extends to physical systems that are not networks in the classical sense, but rather continuous systems in space and/or time: e.g. a literal "bucket of water" can serve as a reservoir that performs computations on inputs given as perturbations of the surface. The resultant complexity of such recurrent neural networks was found to be useful in solving a variety of problems including language processing and dynamic system modeling. However, training of recurrent neural networks is challenging and computationally expensive. Reservoir computing reduces those training-related challenges by fixing the dynamics of the reservoir and only training the linear output layer. A large variety of nonlinear dynamical systems can serve as a reservoir that performs computations. In recent years semiconductor lasers have attracted considerable interest as computation can be fast and energy efficient compared to electrical components. Recent advances in both AI and quantum information theory have given rise to the concept of quantum neural networks. These hold promise in quantum information processing, which is challenging to classical networks, but can also find application in solving classical problems. In 2018, a physical realization of a quantum reservoir computing architecture was demonstrated in the form of nuclear spins within a molecular solid. However, the nuclear spin experiments in did not demonstrate quantum reservoir computing per se as they did not involve processing of sequential data. Rather the data were vector inputs, which makes this more accurately a demonstration of quantum implementation of a random kitchen sink algorithm (also going by the name of extreme learning machines in some communities). In 2019, another possible implementation of quantum reservoir processors was proposed in the form of two-dimensional fermionic lattices. In 2020, realization of reservoir computing on gate-based quantum computers was proposed and demonstrated on cloud-based IBM superconducting near-term quantum computers. Reservoir computers have been used for time-series analysis purposes. In particular, some of their usages involve chaotic time-series prediction, separation of chaotic signals, and link inference of networks from their dynamics. == Classical reservoir computing == === Reservoir === The 'reservoir' in reservoir computing is the internal structure of the computer, and must have two properties: it must be made up of individual, non-linear units, and it must be capable of storing information. The non-linearity describes the response of each unit to input, which is what allows reservoir computers to solve complex problems. Reservoirs are able to store information by connecting the units in recurrent loops, where the previous input affects the next response. The change in reaction due to the past allows the computers to be trained to complete specific tasks. Reservoirs can be virtual or physical. Virtual reservoirs are typically randomly generated and are designed like neural networks. Virtual reservoirs can be designed to have non-linearity and recurrent loops, but, unlike neural networks, the connections between units are randomized and remain unchanged throughout computation. Physical reservoirs are possible because of the inherent non-linearity of certain natural systems. The interaction between ripples on the surface of water contains the nonlinear dynamics required in reservoir creation, and a pattern recognition RC was developed by first inputting ripples with electric motors then recording and analyzing the ripples in the readout. === Readout === The readout is a neural network layer that performs a linear transformation on the output of the reservoir. The weights of the readout layer are trained by analyzing the spatiotemporal patterns of the reservoir after excitation by known inputs, and by utilizing a training method such as a linear regression or a Ridge regression. As its implementation depends on spatiotemporal reservoir patterns, the details of readout methods are tailored to each type of reservoir. For example, the readout for a reservoir computer using a container of liquid as its reservoir might entail observing spatiotemporal patterns on the surface of the liquid. === Types === ==== Context reverberation network ==== An early example of reservoir computing was the context reverberation network. In this architecture, an input layer feeds into a high dimensional dynamical system which is read out by a trainable single-layer perceptron. Two kinds of dynamical system were described: a recurrent neural network with fixed random weights, and a continuous reaction–diffusion system inspired by Alan Turing's model of morphogenesis. At the trainable layer, the perceptron associates current inputs with the signals that reverberate in the dynamical system; the latter were said to provide a dynamic "context" for the inputs. In the language of later work, the reaction–diffusion system served as the reservoir. ==== Echo state network ==== The tree echo state network (TreeESN) model represents a generalization of the reservoir computing framework to tree structured data. ==== Liquid-state machine ==== Chaotic liquid state machine The liquid (i.e. reservoir) of a chaotic liquid state machine (CLSM), or chaotic reservoir, is made from chaotic spiking neurons but which stabilize their activity by settling to a single hypothesis that describes the trained inputs of the machine. This is in contrast to general types of reservoirs that don't stabilize. The liquid stabilization occurs via synaptic plasticity and chaos control that govern neural connections inside the liquid. CLSM showed promising results in learning sensitive time series data. ==== Nonlinear transient computation ==== This type of information processing is most relevant when time-dependent input signals depart from the mechanism's internal dynamics. These departures cause transients or temporary altercations which are represented in the device's output. ==== Deep reservoir computing ==== The extension of the reservoir computing framework towards deep learning, with the introduction of deep reservoir computing and of the deep echo state network (DeepESN) model allows to develop efficiently trained models for hierarchical processing of temporal data, at the same time enabling the investigation on the inherent role of layered composition in recurrent neural networks. == Quantum reservoir computing == Quantum reservoir computing may use the nonlinear nature of quantum mechanical interactions or processes to form the characteristic nonlinear reservoirs but may also be done with linear reservoirs when the injection of the input to the reservoir creates the nonlinearity. The marriage of machine learning and quantum devices is leading to the emergence of quantum neuromorphic computing as a new research area. === Types === ==== Gaussian states of interacting quantum harmonic oscillators ==== Gaussian states are a paradigmatic class of states of continuous variable quantum systems. Although they can nowadays be created and manipulated in, e.g, state-of-the-art optical platforms, naturally robust to decoherence, it is well-known that they are not sufficient for, e.g., universal quantum computing because transformations that preserve the Gaussian nature of a state are linear. Normally, linear dynamics would not be sufficient for nontrivial reser

    Read more →
  • Probit model

    Probit model

    In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit. The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model. A probit model is a popular specification for a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. When viewed in the generalized linear model framework, the probit model employs a probit link function. It is most often estimated using the maximum likelihood procedure, such an estimation being called a probit regression. == Conceptual framework == Suppose a response variable Y is binary, that is it can have only two possible outcomes which we will denote as 1 and 0. For example, Y may represent presence/absence of a certain condition, success/failure of some device, answer yes/no on a survey, etc. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form P ( Y = 1 ∣ X ) = Φ ( X T β ) , {\displaystyle P(Y=1\mid X)=\Phi (X^{\operatorname {T} }\beta ),} where P is the probability and Φ {\displaystyle \Phi } is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood. It is possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable Y ∗ = X T β + ε , {\displaystyle Y^{\ast }=X^{T}\beta +\varepsilon ,} where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is positive: Y = { 1 Y ∗ > 0 0 otherwise } = { 1 X T β + ε > 0 0 otherwise } {\displaystyle Y=\left.{\begin{cases}1&Y^{}>0\\0&{\text{otherwise}}\end{cases}}\right\}=\left.{\begin{cases}1&X^{\operatorname {T} }\beta +\varepsilon >0\\0&{\text{otherwise}}\end{cases}}\right\}} The use of the standard normal distribution causes no loss of generality compared with the use of a normal distribution with an arbitrary mean and standard deviation, because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount. To see that the two models are equivalent, note that P ( Y = 1 ∣ X ) = P ( Y ∗ > 0 ) = P ( X T β + ε > 0 ) = P ( ε > − X T β ) = P ( ε < X T β ) by symmetry of the normal distribution = Φ ( X T β ) {\displaystyle {\begin{aligned}P(Y=1\mid X)&=P(Y^{\ast }>0)\\&=P(X^{\operatorname {T} }\beta +\varepsilon >0)\\&=P(\varepsilon >-X^{\operatorname {T} }\beta )\\&=P(\varepsilon 0 {\displaystyle t,\lim _{n\rightarrow \infty }n_{t}/n=c_{t}>0} . Denote p ^ t = r t / n t {\displaystyle {\hat {p}}_{t}=r_{t}/n_{t}} σ ^ t 2 = 1 n t p ^ t ( 1 − p ^ t ) φ 2 ( Φ − 1 ( p ^ t ) ) {\displaystyle {\hat {\sigma }}_{t}^{2}={\frac {1}{n_{t}}}{\frac {{\hat {p}}_{t}(1-{\hat {p}}_{t})}{\varphi ^{2}{\big (}\Phi ^{-1}({\hat {p}}_{t}){\big )}}}} Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of Φ − 1 ( p ^ t ) {\displaystyle \Phi ^{-1}({\hat {p}}_{t})} on x ( t ) {\displaystyle x_{(t)}} with weights σ ^ t − 2 {\displaystyle {\hat {\sigma }}_{t}^{-2}} : β ^ = ( ∑ t = 1 T σ ^ t − 2 x ( t ) x ( t ) T ) − 1 ∑ t = 1 T σ ^ t − 2 x ( t ) Φ − 1 ( p ^ t ) {\displaystyle {\hat {\beta }}={\Bigg (}\sum _{t=1}^{T}{\hat {\sigma }}_{t}^{-2}x_{(t)}x_{(t)}^{\operatorname {T} }{\Bigg )}^{-1}\sum _{t=1}^{T}{\hat {\sigma }}_{t}^{-2}x_{(t)}\Phi ^{-1}({\hat {p}}_{t})} It can be shown that this estimator is consistent (as n→∞ and T fixed), asymptotically normal and efficient. Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts r t {\displaystyle r_{t}} , n t {\disp

    Read more →
  • Blockmodeling linked networks

    Blockmodeling linked networks

    Blockmodeling linked networks is an approach in blockmodeling in analysing the linked networks. Such approach is based on the generalized multilevel blockmodeling approach. The main objective of this approach is to achieve clustering of the nodes from all involved sets, while at the same time using all available information. At the same time, all one-mode and two-node networks, that are connected, are blockmodeled, which results in obtaining only one clustering, using nodes from each sets. Each cluster ideally contains only nodes from one set, which also allows the modeling of the links among clusters from different sets (through two-mode networks). This approach was introduced by Aleš Žiberna in 2014. Blockmodeling linked networks can be done using: separate analysis: blockmodeling each level separately; conversion approach: converting all one-mode networks to the same level and joining with two-mode networks; a true multilevel approach: one-mode and two-mode networks are blockmodeled at the same time, resulting in one clustering for nodes from each level.

    Read more →
  • Rake (software)

    Rake (software)

    Rake is a software task management and a build automation tool created by Jim Weirich. It allows the user to specify tasks and to describe dependencies as well as to group tasks into namespaces. It is similar to SCons and Make. Rake was written in Ruby and has been part of the standard library of Ruby since version 1.9. == Examples == The tasks that should be executed need to be defined in a configuration file called Rakefile. A Rakefile has no special syntax and contains executable Ruby code. === Tasks === The basic unit in Rake is the task. A task has a name and an action block, that defines its functionality. The following code defines a task called greet that will output the text "Hello, Rake!" to the console. When defining a task, you can optionally add dependencies, that is one task can depend on the successful completion of another task. Calling the "seed" task from the following example will first execute the "migrate" task and only then proceed with the execution of the "seed" task.Tasks can also be made more versatile by accepting arguments. For example, the "generate_report" task will take a date as argument. If no argument is supplied the current date is used.A special type of task is the file task, which can be used to specify file creation tasks. The following task, for example, is given two object files, i.e. "a.o" and "b.o", to create an executable program.Another useful tool is the directory convenience method, that can be used to create directories upon demand. === Rules === When a file is named as a prerequisite but it does not have a file task defined for it, Rake will attempt to synthesize a task by looking at a list of rules supplied in the Rakefile. For example, suppose we were trying to invoke task "mycode.o" with no tasks defined for it. If the Rakefile has a rule that looks like this: This rule will synthesize any task that ends in ".o". It has as a prerequisite that a source file with an extension of ".c" must exist. If Rake is able to find a file named "mycode.c", it will automatically create a task that builds "mycode.o" from "mycode.c". If the file "mycode.c" does not exist, Rake will attempt to recursively synthesize a rule for it. When a task is synthesized from a rule, the source attribute of the task is set to the matching source file. This allows users to write rules with actions that reference the source file. === Advanced rules === Any regular expression may be used as the rule pattern. Additionally, a proc may be used to calculate the name of the source file. This allows for complex patterns and sources. The following rule is equivalent to the example above: NOTE: Because of a quirk in Ruby syntax, parentheses are required around a rule when the first argument is a regular expression. The following rule might be used for Java files: === Namespaces === To better organize big Rakefiles, tasks can be grouped into namespaces. Below is an example of a simple Rake recipe:

    Read more →
  • Generalized multidimensional scaling

    Generalized multidimensional scaling

    Generalized multidimensional scaling (GMDS) is an extension of metric multidimensional scaling, in which the target space is non-Euclidean. When the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another. GMDS is an emerging research direction. Currently, main applications are recognition of deformable objects (e.g. for three-dimensional face recognition) and texture mapping.

    Read more →
  • Language identification in the limit

    Language identification in the limit

    Language identification in the limit is a formal model for inductive inference of formal languages, mainly by computers (see machine learning and induction of regular languages). It was introduced by E. Mark Gold in a technical report and a journal article with the same title. In this model, a teacher provides to a learner some presentation (i.e. a sequence of strings) of some formal language. The learning is seen as an infinite process. Each time the learner reads an element of the presentation, it should provide a representation (e.g. a formal grammar) for the language. Gold defines that a learner can identify in the limit a class of languages if, given any presentation of any language in the class, the learner will produce only a finite number of wrong representations, and then stick with the correct representation. However, the learner need not be able to announce its correctness; and the teacher might present a counterexample to any representation arbitrarily long after. Gold defined two types of presentations: Text (positive information): an enumeration of all strings the language consists of. Complete presentation (positive and negative information): an enumeration of all possible strings, each with a label indicating if the string belongs to the language or not. == Learnability == This model is an early attempt to formally capture the notion of learnability. Gold's journal article introduces for contrast the stronger models Finite identification (where the learner has to announce correctness after a finite number of steps), and Fixed-time identification (where correctness has to be reached after an apriori-specified number of steps). A weaker formal model of learnability is the Probably approximately correct learning (PAC) model, introduced by Leslie Valiant in 1984. == Examples == It is instructive to look at concrete examples (in the tables) of learning sessions the definition of identification in the limit speaks about. A fictitious session to learn a regular language L over the alphabet {a,b} from text presentation:In each step, the teacher gives a string belonging to L, and the learner answers a guess for L, encoded as a regular expression. In step 3, the learner's guess is not consistent with the strings seen so far; in step 4, the teacher gives a string repeatedly. After step 6, the learner sticks to the regular expression (ab+ba). If this happens to be a description of the language L the teacher has in mind, it is said that the learner has learned that language.If a computer program for the learner's role would exist that was able to successfully learn each regular language, that class of languages would be identifiable in the limit. Gold has shown that this is not the case. A particular learning algorithm always guessing L to be just the union of all strings seen so far:If L is a finite language, the learner will eventually guess it correctly, however, without being able to tell when. Although the guess didn't change during step 3 to 6, the learner couldn't be sure to be correct.Gold has shown that the class of finite languages is identifiable in the limit, however, this class is neither finitely nor fixed-time identifiable. Learning from complete presentation by telling:In each step, the teacher gives a string and tells whether it belongs to L (green) or not (red, struck-out). Each possible string is eventually classified in this way by the teacher. Learning from complete presentation by request:The learner gives a query string, the teacher tells whether it belongs to L (yes) or not (no); the learner then gives a guess for L, followed by the next query string. In this example, the learner happens to query in each step just the same string as given by the teacher in example 3.In general, Gold has shown that each language class identifiable in the request-presentation setting is also identifiable in the telling-presentation setting, since the learner, instead of querying a string, just needs to wait until it is eventually given by the teacher. == Gold's theorem == More formally, a language L {\displaystyle L} is a nonempty set, and its elements are called sentences. a language family is a set of languages. a language-learning environment E {\displaystyle E} for a language L {\displaystyle L} is a stream of sentences from L {\displaystyle L} , such that each sentence in L {\displaystyle L} appears at least once. a language learner is a function f {\displaystyle f} that sends a list of sentences to a language. This is interpreted as saying that, after seeing sentences a 1 , a 2 . . . , a n {\displaystyle a_{1},a_{2}...,a_{n}} in that order, the language learner guesses that the language that produces the sentences should be f ( a 1 , . . . , a n ) {\displaystyle f(a_{1},...,a_{n})} . Note that the learner is not obliged to be correct — it could very well guess a language that does not even contain a 1 , . . . , a n {\displaystyle a_{1},...,a_{n}} . a language learner f {\displaystyle f} learns a language L {\displaystyle L} in environment E = ( a 1 , a 2 , . . . ) {\displaystyle E=(a_{1},a_{2},...)} if the learner always guesses L {\displaystyle L} after seeing enough examples from the environment. a language learner f {\displaystyle f} learns a language L {\displaystyle L} if it learns L {\displaystyle L} in any environment E {\displaystyle E} for L {\displaystyle L} . a language family is learnable if there exists a language learner that can learn all languages in the family. Notes: In the context of Gold's theorem, sentences need only be distinguishable. They need not be anything in particular, such as finite strings (as usual in formal linguistics). Learnability is not a concept for individual languages. Any individual language L {\displaystyle L} could be learned by a trivial learner that always guesses L {\displaystyle L} . Learnability is not a concept for individual learners. A language family is learnable if, and only if, there exists some learner that can learn the family. It does not matter how well the learner performs for learning languages outside the family. Gold's theorem is easily bypassed if negative examples are allowed. In particular, the language family { L 1 , L 2 , . . . , L ∞ } {\displaystyle \{L_{1},L_{2},...,L_{\infty }\}} can be learned by a learner that always guesses L ∞ {\displaystyle L_{\infty }} until it receives the first negative example ¬ a n {\displaystyle \neg a_{n}} , where a n ∈ L n + 1 ∖ L n {\displaystyle a_{n}\in L_{n+1}\setminus L_{n}} , at which point it always guesses L n {\displaystyle L_{n}} . == Learnability characterization == Dana Angluin gave the characterizations of learnability from text (positive information) in a 1980 paper. If a learner is required to be effective, then an indexed class of recursive languages is learnable in the limit if there is an effective procedure that uniformly enumerates tell-tales for each language in the class (Condition 1). It is not hard to see that if an ideal learner (i.e., an arbitrary function) is allowed, then an indexed class of languages is learnable in the limit if each language in the class has a tell-tale (Condition 2). == Language classes learnable in the limit == The table shows which language classes are identifiable in the limit in which learning model. On the right-hand side, each language class is a superclass of all lower classes. Each learning model (i.e. type of presentation) can identify in the limit all classes below it. In particular, the class of finite languages is identifiable in the limit by text presentation (cf. Example 2 above), while the class of regular languages is not. Pattern Languages, introduced by Dana Angluin in another 1980 paper, are also identifiable by normal text presentation; they are omitted in the table, since they are above the singleton and below the primitive recursive language class, but incomparable to the classes in between. == Sufficient conditions for learnability == Condition 1 in Angluin's paper is not always easy to verify. Therefore, people come up with various sufficient conditions for the learnability of a language class. See also Induction of regular languages for learnable subclasses of regular languages. === Finite thickness === A class of languages has finite thickness if every non-empty set of strings is contained in at most finitely many languages of the class. This is exactly Condition 3 in Angluin's paper. Angluin showed that if a class of recursive languages has finite thickness, then it is learnable in the limit. A class with finite thickness certainly satisfies MEF-condition and MFF-condition; in other words, finite thickness implies M-finite thickness. === Finite elasticity === A class of languages is said to have finite elasticity if for every infinite sequence of strings s 0 , s 1 , . . . {\displaystyle s_{0},s_{1},...} and every infinite sequence of languages in the class L 1 , L 2 , . . . {\displaystyle L_{1},L_{2},...} , there exists a finite number n such

    Read more →
  • Calibration (statistics)

    Calibration (statistics)

    There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. Calibration can mean a reverse process to regression, where instead of a future dependent variable being predicted from known explanatory variables, a known observation of the dependent variables is used to predict a corresponding explanatory variable; procedures in statistical classification to determine class membership probabilities which assess the uncertainty of a given new observation belonging to each of the already established classes. In addition, calibration is used in statistics with the usual general meaning of calibration. For example, model calibration can be also used to refer to Bayesian inference about the value of a model's parameters, given some data set, or more generally to any type of fitting of a statistical model. As Philip Dawid puts it, "a forecaster is well calibrated if, for example, of those events to which he assigns a probability 30 percent, the long-run proportion that actually occurs turns out to be 30 percent." == In classification == Calibration in classification means transforming classifier scores into class membership probabilities. An overview of calibration methods for two-class and multi-class classification tasks is given by Gebel (2009). A classifier might separate the classes well, but be poorly calibrated, meaning that the estimated class probabilities are far from the true class probabilities. In this case, a calibration step may help improve the estimated probabilities. A variety of metrics exist that are aimed to measure the extent to which a classifier produces well-calibrated probabilities. Foundational work includes the Expected Calibration Error (ECE). Into the 2020s, variants include the Adaptive Calibration Error (ACE) and the Test-based Calibration Error (TCE), which address limitations of the ECE metric that may arise when classifier scores concentrate on narrow subset of the [0,1] range. A 2020s advancement in calibration assessment is the introduction of the Estimated Calibration Index (ECI). The ECI extends the concepts of the Expected Calibration Error (ECE) to provide a more nuanced measure of a model's calibration, particularly addressing overconfidence and underconfidence tendencies. Originally formulated for binary settings, the ECI has been adapted for multiclass settings, offering both local and global insights into model calibration. This framework aims to overcome some of the theoretical and interpretative limitations of existing calibration metrics. Through a series of experiments, Famiglini et al. demonstrate the framework's effectiveness in delivering a more accurate understanding of model calibration levels and discuss strategies for mitigating biases in calibration assessment. An online tool has been proposed to compute both ECE and ECI. The following univariate calibration methods exist for transforming classifier scores into class membership probabilities in the two-class case: Assignment value approach, see Garczarek (2002) Bayes approach, see Bennett (2002) Isotonic regression, see Zadrozny and Elkan (2002) Platt scaling (a form of logistic regression), see Lewis and Gale (1994) and Platt (1999) Bayesian Binning into Quantiles (BBQ) calibration, see Naeini, Cooper, Hauskrecht (2015) Beta calibration, see Kull, Filho, Flach (2017) === In probability prediction and forecasting === In prediction and forecasting, a Brier score is sometimes used to assess prediction accuracy of a set of predictions, specifically that the magnitude of the assigned probabilities track the relative frequency of the observed outcomes. Philip E. Tetlock employs the term "calibration" in this sense in his 2015 book Superforecasting. This differs from accuracy and precision. For example, as expressed by Daniel Kahneman, "if you give all events that happen a probability of .6 and all the events that don't happen a probability of .4, your discrimination is perfect but your calibration is miserable". In meteorology, in particular, as concerns weather forecasting, a related mode of assessment is known as forecast skill. == In regression == The calibration problem in regression is the use of known data on the observed relationship between a dependent variable and an independent variable to make estimates of other values of the independent variable from new observations of the dependent variable. This can be known as "inverse regression"; there is also sliced inverse regression. The following multivariate calibration methods exist for transforming classifier scores into class membership probabilities in the case with classes count greater than two: Reduction to binary tasks and subsequent pairwise coupling, see Hastie and Tibshirani (1998) Dirichlet calibration, see Gebel (2009) === Example === One example is that of dating objects, using observable evidence such as tree rings for dendrochronology or carbon-14 for radiometric dating. The observation is caused by the age of the object being dated, rather than the reverse, and the aim is to use the method for estimating dates based on new observations. The problem is whether the model used for relating known ages with observations should aim to minimise the error in the observation, or minimise the error in the date. The two approaches will produce different results, and the difference will increase if the model is then used for extrapolation at some distance from the known results.

    Read more →
  • Negobot

    Negobot

    Negobot also referred to as Lolita or Lolita chatbot is a chatterbot that was introduced to the public in 2013, designed by researchers from the University of Deusto and Optenet to catch online pedophiles. It is a conversational agent that utilizes natural language processing (NLP), information retrieval (IR) and Automatic Learning. Because the bot poses as a young female in order to entice and track potential predators, it became known in media as the "virtual Lolita", in reference to Vladimir Nabokov's novel. == Background == In 2013, the University of Deusto researchers published a paper on their work with Negobot and disclosed the text online. In their abstract, the researchers addressed the issue that an increasing number of children are using the internet and that these young users are more susceptible to existing internet risks. Their main objective was to create a chatterbot with the ability to trap online predators that posed a threat to children. They intended to deploy the bot into sites frequented by predators such as social networks and chatrooms. The university researchers used information provided by anti-pedophilia activist organization Perverted-Justice, including examples of online encounters and conversations with sexual predators, to supplement the program's artificial intelligence system. == Features == === Programmed persona === The chatterbot takes the guise of a naive and vulnerable 14-year-old girl. The bot's programmers used methods of artificial intelligence and natural language processing to create a conversational agent fluent in typical teenage slang, misspellings, and knowledge of pop culture. Through these linguistic features, the bot is able to mimic the conversational style of young teenagers. It also features split personalities and seven different patterns of conversation. Negobot's primary creator, Dr. Carlos Laorden, expressed the significance of the bot's distinguishable style of communication, stating that normally, "chatbots tend to be very predictable. Their behavior and interest in a conversation are flat, which is a problem when attempting to detect untrustworthy targets like paedophiles." What makes Negobot different is its game theory feature, which makes it able to "maintain a much more realistic conversation." Apart from being able to imitate a stereotypical teenager, the program is also able to translate messages into different languages. === Game theory === Negobot's designers programmed it with the ability to treat conversations with potential predators as if it were a game, the objective being to collect as much information on the suspect as possible that could provide evidence of pedophilic characteristics and motives. The use of game theory shapes the decisions the bot makes and the overall direction of the conversation. The bot initiates its undercover operations by entering a chat as a passive participant, waiting to be chatted by a user. Once a user elicits conversation, the bot will frame the conversation in such a way that keeps the target engaged, extracting personal information and discouraging it from leaving the chat. The information is then recorded to be potentially sent to the police. If the target seems to lose interest, the bot attempts to make it feel guilty by expressing sentiments of loneliness and emotional need through strategic, formulated responses, ultimately prolonging interaction. In addition, the bot may provide fake information about itself in attempt to lure the target into physical meetings. === Limitations === Despite being able to carry out a realistic conversation, Negobot is still unable to detect linguistic subtleties in the messages of others, including sarcasm. == Controversy == John Carr, a specialist in online child safety, expressed his concern to BBC over the legality of this undercover investigation. He claimed that using the bot on unsuspecting internet users could be considered a form of entrapment or harassment. The type of information that Negobot collects from potential online predators, he said, is unlikely to be upheld in court. Furthermore, he warned that relying on only software without any real-world policing risks enticing individuals to do or say things that they would not have if real-world policing were a factor.

    Read more →
  • Spiking neural network

    Spiking neural network

    Spiking neural networks (SNNs) are artificial neural networks (ANN) that mimic natural neural networks. These models leverage timing of discrete spikes as the main information carrier. In addition to neuronal and synaptic state, SNNs incorporate the concept of time into their operating model. The idea is that neurons in the SNN do not transmit information at each propagation cycle (as it happens with typical multi-layer perceptron networks), but rather transmit information only when a membrane potential—an intrinsic quality of the neuron related to its membrane electrical charge—reaches a specific value, called the threshold. When the membrane potential reaches the threshold, the neuron fires, and generates a signal that travels to other neurons which, in turn, increase or decrease their potentials in response to this signal. A neuron model that fires at the moment of threshold crossing is also called a spiking neuron model. While spike rates can be considered the analogue of the variable output of a traditional ANN, neurobiology research indicated that high speed processing cannot be performed solely through a rate-based scheme. For example humans can perform an image recognition task requiring no more than 10ms of processing time per neuron through the successive layers (going from the retina to the temporal lobe). This time window is too short for rate-based encoding. The precise spike timings in a small set of spiking neurons also has a higher information coding capacity compared with a rate-based approach. The most prominent spiking neuron model is the leaky integrate-and-fire model. In that model, the momentary activation level (modeled as a differential equation) is normally considered to be the neuron's state, with incoming spikes pushing this value higher or lower, until the state eventually either decays or—if the firing threshold is reached—the neuron fires. After firing, the state variable is reset to a lower value. Various decoding methods exist for interpreting the outgoing spike train as a real-value number, relying on either the frequency of spikes (rate-code), the time-to-first-spike after stimulation, or the interval between spikes. == History == Many multi-layer artificial neural networks are fully connected, receiving input from every neuron in the previous layer and signalling every neuron in the subsequent layer. Although these networks have achieved breakthroughs, they do not match biological networks and do not mimic neurons. The biology-inspired Hodgkin–Huxley model of a spiking neuron was proposed in 1952. This model described how action potentials are initiated and propagated. Communication between neurons, which requires the exchange of chemical neurotransmitters in the synaptic gap, is described in models such as the integrate-and-fire model, FitzHugh–Nagumo model (1961–1962), and Hindmarsh–Rose model (1984). The leaky integrate-and-fire model (or a derivative) is commonly used as it is easier to compute than Hodgkin–Huxley. While the notion of an artificial spiking neural network became popular only in the twenty-first century, studies between 1980 and 1995 supported the concept. The first models of this type of ANN appeared to simulate non-algorithmic intelligent information processing systems. However, the notion of the spiking neural network as a mathematical model was first worked on in the early 1970s. As of 2019 SNNs lagged behind ANNs in accuracy, but the gap is decreasing, and has vanished on some tasks. == Underpinnings == Information in the brain is represented as action potentials (neuron spikes), which may group into spike trains or coordinated waves. A fundamental question of neuroscience is to determine whether neurons communicate by a rate or temporal code. Temporal coding implies that a single spiking neuron can replace hundreds of hidden units on a conventional neural net. SNNs define a neuron's current state as its potential (possibly modeled as a differential equation). An input pulse causes the potential to rise and then gradually decline. Encoding schemes can interpret these pulse sequences as a number, considering pulse frequency and pulse interval. Using the precise time of pulse occurrence, a neural network can consider more information and offer better computing properties. SNNs compute in the continuous domain. Such neurons test for activation only when their potentials reach a certain value. When a neuron is activated, it produces a signal that is passed to connected neurons, accordingly raising or lowering their potentials. The SNN approach produces a continuous output instead of the binary output of traditional ANNs. Pulse trains are not easily interpretable, hence the need for encoding schemes. However, a pulse train representation may be more suited for processing spatiotemporal data (or real-world sensory data classification). SNNs connect neurons only to nearby neurons so that they process input blocks separately (similar to CNN using filters). They consider time by encoding information as pulse trains so as not to lose information. This avoids the complexity of a recurrent neural network (RNN). Impulse neurons are more powerful computational units than traditional artificial neurons. SNNs are theoretically more powerful than so called "second-generation networks" defined as ANNs "based on computational units that apply activation function with a continuous set of possible output values to a weighted sum (or polynomial) of the inputs"; however, SNN training issues and hardware requirements limit their use. Although unsupervised biologically inspired learning methods are available such as Hebbian learning and STDP, no effective supervised training method is suitable for SNNs that can provide better performance than second-generation networks. Spike-based activation of SNNs is not differentiable, thus gradient descent-based backpropagation (BP) is not available. SNNs have much larger computational costs for simulating realistic neural models than traditional ANNs. Pulse-coupled neural networks (PCNN) are often confused with SNNs. A PCNN can be seen as a kind of SNN. Researchers are actively working on various topics. The first concerns differentiability. The expressions for both the forward- and backward-learning methods contain the derivative of the neural activation function which is not differentiable because a neuron's output is either 1 when it spikes, and 0 otherwise. This all-or-nothing behavior disrupts gradients and makes these neurons unsuitable for gradient-based optimization. Approaches to resolving it include: resorting to entirely biologically inspired local learning rules for the hidden units translating conventionally trained "rate-based" NNs to SNNs smoothing the network model to be continuously differentiable defining an SG (Surrogate Gradient) as a continuous relaxation of the real gradients The second concerns the optimization algorithm. Standard BP can be expensive in terms of computation, memory, and communication and may be poorly suited to the hardware that implements it (e.g., a computer, brain, or neuromorphic device). Incorporating additional neuron dynamics such as Spike Frequency Adaptation (SFA) is a notable advance, enhancing efficiency and computational power. These neurons sit between biological complexity and computational complexity. Originating from biological insights, SFA offers significant computational benefits by reducing power usage, especially in cases of repetitive or intense stimuli. This adaptation improves signal/noise clarity and introduces an elementary short-term memory at the neuron level, which in turn, improves accuracy and efficiency. This was mostly achieved using compartmental neuron models. The simpler versions are of neuron models with adaptive thresholds, are an indirect way of achieving SFA. It equips SNNs with improved learning capabilities, even with constrained synaptic plasticity, and elevates computational efficiency. This feature lessens the demand on network layers by decreasing the need for spike processing, thus lowering computational load and memory access time—essential aspects of neural computation. Moreover, SNNs utilizing neurons capable of SFA achieve levels of accuracy that rival those of conventional ANNs, while also requiring fewer neurons for comparable tasks. This efficiency streamlines the computational workflow and conserves space and energy, while maintaining technical integrity. High-performance deep spiking neural networks can operate with 0.3 spikes per neuron. == Applications == SNNs can in principle be applied to the same applications as traditional ANNs. In addition, SNNs can model the central nervous system of biological organisms, such as an insect seeking food without prior knowledge of the environment. Due to their relative realism, they can be used to study biological neural circuits. Starting with a hypothesis about the topology of a biological neuronal circuit and its functi

    Read more →
  • IBM Watsonx

    IBM Watsonx

    Watsonx is a platform by IBM for building and managing artificial intelligence (AI) applications for business use. Released on May 9, 2023, the platform provides software tools and infrastructure for companies to work with both IBM's own AI models and models from third-party sources. The platform consists of three main components: watsonx.ai, a studio for training, validating, and deploying AI models; watsonx.data, a system for storing and managing data used by the models; and watsonx.governance, a toolkit to ensure AI applications are compliant with company policies and regulations. A key feature of the platform is that it can be trained on a company's private data to perform specialized tasks, a process known as fine-tuning. IBM states that this client-specific data is not used to train its own models. == History == Watsonx was introduced on May 9, 2023, at the annual IBM Think conference, as a platform that includes multiple services. Just like Watson AI computer with the similar name, Watsonx was named after Thomas J. Watson, IBM's founder and first CEO. On February 13, 2024, Anaconda partnered with IBM to embed its open-source Python packages into Watsonx. Watsonx is used at ESPN's Fantasy Football App for managing players' performance, and by Italian telecommunications company Wind Tre. It was employed to generate editorial content around nominees during the 66th Annual Grammy Awards. In 2025, Wimbledon integrated IBM watsonx generative AI into its app and website. Integrated with IBM Safer Payments, IBM watsonx has been used in banking sector fraud detection and anti-money laundering (AML) systems. == Services == === watsonx.ai === Watsonx.ai is a platform that allows AI developers to leverage a wide range of LLMs under IBM's own Granite series and others such as Facebook's LLaMA-2, free and open-source model Mistral, and many others present in the Hugging Face community. These models come pre-trained and optimized for various natural language processing (NLP) applications.The platform also allows fine-tuning with its Tuning Studio. === watsonx.data === Watsonx.data is a platform designed to assist clients in addressing issues related to data volume, complexity, cost, and governance.. The platform facilitates seamless data access, whether stored in the cloud or on-premises, through a single entry point. === watsonx.governance === Watsonx.governance is a platform that utilizes IBM's AI capabilities to implement AI lifecycle governance. This helps them manage risks and maintain compliance with evolving AI and industry regulations, while reducing AI bias through automated oversight.

    Read more →
  • Latent class model

    Latent class model

    In statistics, a latent class model (LCM) is a model for clustering multivariate discrete data. It assumes that the data arise from a mixture of discrete distributions, within each of which the variables are independent. It is called a latent class model because the class to which each data point belongs is unobserved (or latent). Latent class analysis (LCA) is a subset of structural equation modeling used to find groups or subtypes of cases in multivariate categorical data. These groups or subtypes of cases are called "latent classes". When faced with the following situation, a researcher might opt to use LCA to better understand the data: Symptoms a, b, c, and d have been recorded in a variety of patients diagnosed with diseases X, Y, and Z. Disease X is associated with symptoms a, b, and c; disease Y is linked to symptoms b, c, and d; and disease Z is connected to symptoms a, c, and d. In this context, the LCA would attempt to detect the presence of latent classes (i.e., the disease entities), thus creating patterns of association in the symptoms. As in factor analysis, LCA can also be used to classify cases according to their maximum likelihood class membership probability. The key criterion for resolving the LCA is identifying latent classes in which the observed symptom associations are effectively rendered null. This is because within each class, the diseases responsible for the symptoms create a structure of dependencies. As a result, the symptoms become conditionally independent, meaning that, given the class a case belongs to, the symptoms are no longer related to one another. == Model == Within each latent class, the observed variables are statistically independent—an essential aspect of latent class modeling. Usually, the observed variables are statistically dependent. By introducing the latent variable, independence is restored in the sense that within classes, variables are independent (local independence). Therefore, the association between the observed variables is explained by the classes of the latent variable (McCutcheon, 1987). In one form, the LCM is written as p i 1 , i 2 , … , i N ≈ ∑ t T p t ∏ n N p i n , t n , {\displaystyle p_{i_{1},i_{2},\ldots ,i_{N}}\approx \sum _{t}^{T}p_{t}\,\prod _{n}^{N}p_{i_{n},t}^{n},} where T {\displaystyle T} is the number of latent classes and p t {\displaystyle p_{t}} are the so-called recruitment or unconditional probabilities that should sum to one. p i n , t n {\displaystyle p_{i_{n},t}^{n}} are the marginal or conditional probabilities. For a two-way latent class model, the form is p i j ≈ ∑ t T p t p i t p j t . {\displaystyle p_{ij}\approx \sum _{t}^{T}p_{t}\,p_{it}\,p_{jt}.} This two-way model is related to probabilistic latent semantic analysis and non-negative matrix factorization. The probability model used in LCA is closely related to the Naive Bayes classifier. The main difference is that in LCA, the class membership of an individual is a latent variable, whereas in Naive Bayes classifiers, the class membership is an observed label. == Related methods == There are a number of methods with distinct names and uses that share a common relationship. Cluster analysis is, like LCA, used to discover taxon-like groups of cases in data. Multivariate mixture estimation (MME) is applicable to continuous data and assumes that such data arise from a mixture of distributions, such as a set of heights arising from a mixture of men and women. If a multivariate mixture estimation is constrained so that measures must be uncorrelated within each distribution, it is termed latent profile analysis. Modified to handle discrete data, this constrained analysis is known as LCA. Discrete latent trait models further constrain the classes to form from segments of a single dimension, allocating members to classes based on that dimension. An example would be assigning cases to social classes based on ability or merit. In a practical instance, the variables could be multiple choice items of a political questionnaire. In this case, the data consists of an N-way contingency table with answers to the items for a number of respondents. In this example, the latent variable refers to political opinion, and the latent classes to political groups. Given group membership, the conditional probabilities specify the chance that certain answers are chosen. == Application == LCA may be used in many fields, such as: collaborative filtering, Behavior Genetics and Evaluation of diagnostic tests.

    Read more →
  • InciWeb

    InciWeb

    InciWeb is an interagency all-risk incident web information management system provided by the United States Forest Service released in 2004. It was originally developed for wildland fire emergencies, but can be also used for other emergency incidents (natural disasters, such as earthquakes, floods, hurricanes, and tornadoes). == Introduction == It was developed with two primary missions: 1. Provide the public a single source of incident related information 2. Provide a standardized reporting tool for the Public Affairs community Official announcements include evacuations, road closures, news releases, maps, photographs, and basic info and current situation about the incident. Incident information can be accessed by: web browser at https://inciweb.wildfire.gov/ Twitter RSS web feed == Technical == The original application was hosted at the United States Forest Service - Wildland Fire Training and Conference Center, at McClellan Airfield, California, comprising three servers: Database server Administrative server Load balancer for the public content which routes traffic to a pool of eight servers. Web traffic averages 2 million plus hits daily during the fire season with the ability to handle 3.5 million hits. The servers were moved to the National information Technology Center (NITC), Kansas City, Missouri on July 16, 2008, along with the release of version 2.0; the current version is 2.2. == Availability issues == InciWeb was having technical difficulties due to the high volume of Internet users trying to access the site during the September–October 2006 Day Fire and the Summer 2008 California wildfires. == Participating agencies == United States Forest Service Bureau of Land Management Bureau of Indian Affairs Fish and Wildlife Service National Park Service National Oceanic & Atmospheric Administration Department of the Interior Office of Aircraft Services National Association of State Foresters United States Fire Administration These same agencies are also in the National Interagency Fire Center.

    Read more →
  • Multinomial logistic regression

    Multinomial logistic regression

    In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes. That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.). Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit (mlogit), the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model. == Background == Multinomial logistic regression is used when the dependent variable in question is nominal (equivalently categorical, meaning that it falls into any one of a set of categories that cannot be ordered in any meaningful way) and for which there are more than two categories. Some examples would be: Which major will a college student choose, given their grades, stated likes and dislikes, etc.? Which blood type does a person have, given the results of various diagnostic tests? In a hands-free mobile phone dialing application, which person's name was spoken, given various properties of the speech signal? Which candidate will a person vote for, given particular demographic characteristics? Which country will a firm locate an office in, given the characteristics of the firm and of the various candidate countries? These are all statistical classification problems. They all have in common a dependent variable to be predicted that comes from one of a limited set of items that cannot be meaningfully ordered, as well as a set of independent variables (also known as features, explanators, etc.), which are used to predict the dependent variable. Multinomial logistic regression is a particular solution to classification problems that use a linear combination of the observed features and some problem-specific parameters to estimate the probability of each particular value of the dependent variable. The best values of the parameters for a given problem are usually determined from some training data (e.g. some people for whom both the diagnostic test results and blood types are known, or some examples of known words being spoken). == Assumptions == The multinomial logistic model assumes that data are case-specific; that is, each independent variable has a single value for each case. As with other types of regression, there is no need for the independent variables to be statistically independent from each other (unlike, for example, in a naive Bayes classifier); however, collinearity is assumed to be relatively low, as it becomes difficult to differentiate between the impact of several variables if this is not the case. If the multinomial logit is used to model choices, it relies on the assumption of independence of irrelevant alternatives (IIA), which is not always desirable. This assumption states that the odds of preferring one class over another do not depend on the presence or absence of other "irrelevant" alternatives. For example, the relative probabilities of taking a car or bus to work do not change if a bicycle is added as an additional possibility. This allows the choice of K alternatives to be modeled as a set of K − 1 independent binary choices, in which one alternative is chosen as a "pivot" and the other K − 1 compared against it, one at a time. The IIA hypothesis is a core hypothesis in rational choice theory; however numerous studies in psychology show that individuals often violate this assumption when making choices. An example of a problem case arises if choices include a car and a blue bus. Suppose the odds ratio between the two is 1 : 1. Now if the option of a red bus is introduced, a person may be indifferent between a red and a blue bus, and hence may exhibit a car : blue bus : red bus odds ratio of 1 : 0.5 : 0.5, thus maintaining a 1 : 1 ratio of car : any bus while adopting a changed car : blue bus ratio of 1 : 0.5. Here the red bus option was not in fact irrelevant, because a red bus was a perfect substitute for a blue bus. If the multinomial logit is used to model choices, it may in some situations impose too much constraint on the relative preferences between the different alternatives. It is especially important to take into account if the analysis aims to predict how choices would change if one alternative were to disappear (for instance if one political candidate withdraws from a three candidate race). Other models like the nested logit or the multinomial probit may be used in such cases as they allow for violation of the IIA. == Model == === Introduction === There are multiple equivalent ways to describe the mathematical model underlying multinomial logistic regression. This can make it difficult to compare different treatments of the subject in different texts. The article on logistic regression presents a number of equivalent formulations of simple logistic regression, and many of these have analogues in the multinomial logit model. The idea behind all of them, as in many other statistical classification techniques, is to construct a linear predictor function that constructs a score from a set of weights that are linearly combined with the explanatory variables (features) of a given observation using a dot product: score ⁡ ( X i , k ) = β k ⋅ X i , {\displaystyle \operatorname {score} (\mathbf {X} _{i},k)={\boldsymbol {\beta }}_{k}\cdot \mathbf {X} _{i},} where Xi is the vector of explanatory variables describing observation i, βk is a vector of weights (or regression coefficients) corresponding to outcome k, and score(Xi, k) is the score associated with assigning observation i to category k. In discrete choice theory, where observations represent people and outcomes represent choices, the score is considered the utility associated with person i choosing outcome k. The predicted outcome is the one with the highest score. The difference between the multinomial logit model and numerous other methods, models, algorithms, etc. with the same basic setup (the perceptron algorithm, support vector machines, linear discriminant analysis, etc.) is the procedure for determining (training) the optimal weights/coefficients and the way that the score is interpreted. In particular, in the multinomial logit model, the score can directly be converted to a probability value, indicating the probability of observation i choosing outcome k given the measured characteristics of the observation. This provides a principled way of incorporating the prediction of a particular multinomial logit model into a larger procedure that may involve multiple such predictions, each with a possibility of error. Without such means of combining predictions, errors tend to multiply. For example, imagine a large predictive model that is broken down into a series of submodels where the prediction of a given submodel is used as the input of another submodel, and that prediction is in turn used as the input into a third submodel, etc. If each submodel has 90% accuracy in its predictions, and there are five submodels in series, then the overall model has only 0.95 = 59% accuracy. If each submodel has 80% accuracy, then overall accuracy drops to 0.85 = 33% accuracy. This issue is known as error propagation and is a serious problem in real-world predictive models, which are usually composed of numerous parts. Predicting probabilities of each possible outcome, rather than simply making a single optimal prediction, is one means of alleviating this issue. === Setup === The basic setup is the same as in logistic regression, the only difference being that the dependent variables are categorical rather than binary, i.e. there are K possible outcomes rather than just two. The following description is somewhat shortened; for more details, consult the logistic regression article. ==== Data points ==== Specifically, it is assumed that we have a series of N observed data points. Each data point i (ranging from 1 to N) consists of a set of M explanatory variables x1,i ... xM,i (also known as independent variables, predictor variables, features, etc.), and an associated categorical outcome Yi (also known as dependent variable, response variable), which can take on one of K possible values. These possible values represent logically separate categories (e.g. different political parties, blood types, etc.), and are often described mathematically by arbitrarily assigning each a number from 1 to K. The explanatory variables and outcome represent observed properties of the data points, and are often thought of as originating in the observations of N "experiments" — although an "experiment" may consist of nothing more than gathering data. The goal of multinomial logistic regression is to construct a model that explains the relationship between the explanatory variables and the outcome, so tha

    Read more →
  • Information Harvesting

    Information Harvesting

    Information Harvesting (IH) was an early data mining product from the 1990s. It was invented by Ralphe Wiggins and produced by the Ryan Corp, later Information Harvesting Inc., of Cambridge, Massachusetts. Wiggins had a background in genetic algorithms and fuzzy logic. IH sought to infer rules from sets of data. It did this first by classifying various input variables into one of a number of bins, thereby putting some structure on the continuous variables in the input. IH then proceeds to generate rules, trading off generalization against memorization, that will infer the value of the prediction variable, possibly creating many levels of rules in the process. It included strategies for checking if overfitting took place and, if so, correcting for it. Because of its strategies for correcting for overfitting by considering more data, and refining the rules based on that data, IH might also be considered to be a form of machine learning. The advantage of IH, as compared with other data mining products of its time and even later, was that it provided a mechanism for finding multiple rules that would classify the data and determining, according to set criteria, the best rules to use.

    Read more →