AI Generator Video Free Online

AI Generator Video Free Online — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Pattern theory

    Pattern theory

    Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language. Broad in its mathematical coverage, Pattern Theory spans algebra and statistics, as well as local topological and global entropic properties. In addition to the new algebraic vocabulary, its statistical approach is novel in its aim to: Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was previously commonplace. Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph. Study the randomness and variability of these graphs. Create the basic classes of stochastic models applied by listing the deformations of the patterns. Synthesize (sample) from the models, not just analyze signals with them. The Brown University Pattern Theory Group was formed in 1972 by Ulf Grenander. Many mathematicians are currently working in this group, noteworthy among them being the Fields Medalist David Mumford. Mumford regards Grenander as his "guru" in Pattern Theory.

    Read more →
  • Markov model

    Markov model

    In probability theory, a Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it (that is, it assumes the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property. == Introduction == Andrey Andreyevich Markov (14 June 1856 – 20 July 1922) was a Russian mathematician best known for his work on stochastic processes. A primary subject of his research later became known as the Markov chain. There are four common Markov models used in different situations, depending on whether every sequential state is observable or not, and whether the system is to be adjusted on the basis of observations made: == Markov chain == The simplest Markov model is the Markov chain. It models the state of a system with a random variable that changes through time. In this context, the Markov property indicates that the distribution for this variable depends only on the distribution of a previous state. An example use of a Markov chain is Markov chain Monte Carlo, which uses the Markov property to prove that a particular method for performing a random walk will sample from the joint distribution. == Hidden Markov model == A hidden Markov model is a Markov chain for which the state is only partially observable or noisily observable. In other words, observations are related to the state of the system, but they are typically insufficient to precisely determine the state. Several well-known algorithms for hidden Markov models exist. For example, given a sequence of observations, the Viterbi algorithm will compute the most-likely corresponding sequence of states, the forward algorithm will compute the probability of the sequence of observations, and the Baum–Welch algorithm will estimate the starting probabilities, the transition function, and the observation function of a hidden Markov model. One common use is for speech recognition, where the observed data is the speech audio waveform and the hidden state is the spoken text. In this example, the Viterbi algorithm finds the most likely sequence of spoken words given the speech audio. == Markov decision process == A Markov decision process is a Markov chain in which state transitions depend on the current state and an action vector that is applied to the system. Typically, a Markov decision process is used to compute a policy of actions that will maximize some utility with respect to expected rewards. == Partially observable Markov decision process == A partially observable Markov decision process (POMDP) is a Markov decision process in which the state of the system is only partially observed. POMDPs are known to be NP complete, but recent approximation techniques have made them useful for a variety of applications, such as controlling simple agents or robots. == Markov random field == A Markov random field, or Markov network, may be considered to be a generalization of a Markov chain in multiple dimensions. In a Markov chain, state depends only on the previous state in time, whereas in a Markov random field, each state depends on its neighbors in any of multiple directions. A Markov random field may be visualized as a field or graph of random variables, where the distribution of each random variable depends on the neighboring variables with which it is connected. More specifically, the joint distribution for any random variable in the graph can be computed as the product of the "clique potentials" of all the cliques in the graph that contain that random variable. Modeling a problem as a Markov random field is useful because it implies that the joint distributions at each vertex in the graph may be computed in this manner. == Hierarchical Markov models == Hierarchical Markov models can be applied to categorize human behavior at various levels of abstraction. For example, a series of simple observations, such as a person's location in a room, can be interpreted to determine more complex information, such as in what task or activity the person is performing. Two kinds of Hierarchical Markov Models are the Hierarchical hidden Markov model and the Abstract Hidden Markov Model. Both have been used for behavior recognition and certain conditional independence properties between different levels of abstraction in the model allow for faster learning and inference. == Tolerant Markov model == A Tolerant Markov model (TMM) is a probabilistic-algorithmic Markov chain model. It assigns the probabilities according to a conditioning context that considers the last symbol, from the sequence to occur, as the most probable instead of the true occurring symbol. A TMM can model three different natures: substitutions, additions or deletions. Successful applications have been efficiently implemented in DNA sequences compression. == Markov-chain forecasting models == Markov-chains have been used as a forecasting methods for several topics, for example price trends, wind power and solar irradiance. The Markov-chain forecasting models utilize a variety of different settings, from discretizing the time-series to hidden Markov-models combined with wavelets and the Markov-chain mixture distribution model (MCM).

    Read more →
  • Neural gas

    Neural gas

    Neural gas is an artificial neural network, inspired by the self-organizing map and introduced in 1991 by Thomas Martinetz and Klaus Schulten. The neural gas is a simple algorithm for finding optimal data representations based on feature vectors. The algorithm was coined "neural gas" because of the dynamics of the feature vectors during the adaptation process, which distribute themselves like a gas within the data space. It is applied where data compression or vector quantization is an issue, for example speech recognition, image processing or pattern recognition. As a robustly converging alternative to the k-means clustering it is also used for cluster analysis. == Algorithm == Suppose we want to model a probability distribution P ( x ) {\displaystyle P(x)} of data vectors x {\displaystyle x} using a finite number of feature vectors w i {\displaystyle w_{i}} , where i = 1 , ⋯ , N {\displaystyle i=1,\cdots ,N} . For each time step t {\displaystyle t} Sample data vector x {\displaystyle x} from P ( x ) {\displaystyle P(x)} Compute the distance between x {\displaystyle x} and each feature vector. Rank the distances. Let i 0 {\displaystyle i_{0}} be the index of the closest feature vector, i 1 {\displaystyle i_{1}} the index of the second closest feature vector, and so on. Update each feature vector by: w i k t + 1 = w i k t + ε ⋅ e − k / λ ⋅ ( x − w i k t ) , k = 0 , ⋯ , N − 1 {\displaystyle w_{i_{k}}^{t+1}=w_{i_{k}}^{t}+\varepsilon \cdot e^{-k/\lambda }\cdot (x-w_{i_{k}}^{t}),k=0,\cdots ,N-1} In the algorithm, ε {\displaystyle \varepsilon } can be understood as the learning rate, and λ {\displaystyle \lambda } as the neighborhood range. ε {\displaystyle \varepsilon } and λ {\displaystyle \lambda } are reduced with increasing t {\displaystyle t} so that the algorithm converges after many adaptation steps. The adaptation step of the neural gas can be interpreted as gradient descent on a cost function. By adapting not only the closest feature vector but all of them with a step size decreasing with increasing distance order, compared to (online) k-means clustering a much more robust convergence of the algorithm can be achieved. The neural gas model does not delete a node and also does not create new nodes. === Comparison with SOM === Compared to self-organized map, the neural gas model does not assume that some vectors are neighbors. If two vectors happen to be close together, they would tend to move together, and if two vectors happen to be apart, they would tend to not move together. In contrast, in an SOM, if two vectors are neighbors in the underlying graph, then they will always tend to move together, no matter whether the two vectors happen to be neighbors in the Euclidean space. The name "neural gas" is because one can imagine it to be what an SOM would be like if there is no underlying graph, and all points are free to move without the bonds that bind them together. == Variants == A number of variants of the neural gas algorithm exists in the literature so as to mitigate some of its shortcomings. More notable is perhaps Bernd Fritzke's growing neural gas, but also one should mention further elaborations such as the Growing When Required network and also the incremental growing neural gas. A performance-oriented approach that avoids the risk of overfitting is the Plastic Neural gas model. === Growing neural gas === Fritzke describes the growing neural gas (GNG) as an incremental network model that learns topological relations by using a "Hebb-like learning rule", only, unlike the neural gas, it has no parameters that change over time and it is capable of continuous learning, i.e. learning on data streams. GNG has been widely used in several domains, demonstrating its capabilities for clustering data incrementally. The GNG is initialized with two randomly positioned nodes which are initially connected with a zero age edge and whose errors are set to 0. Since in the GNG input data is presented sequentially one by one, the following steps are followed at each iteration: It is calculating the errors (distances) between the two closest nodes to the current input data. The error of the winner node (only the closest one) is respectively accumulated. The winner node and its topological neighbors (connected by an edge) are moving towards the current input by different fractions of their respective errors. The age of all edges connected to the winner node are incremented. If the winner node and the second-winner are connected by an edge, such an edge is set to 0. Else, an edge is created between them. If there are edges with an age larger than a threshold, they are removed. Nodes without connections are eliminated. If the current iteration is an integer multiple of a predefined frequency-creation threshold, a new node is inserted between the node with the largest error (among all) and its topological neighbor presenting the highest error. The link between the former and the latter nodes is eliminated (their errors are decreased by a given factor) and the new node is connected to both of them. The error of the new node is initialized as the updated error of the node which had the largest error (among all). The accumulated error of all nodes is decreased by a given factor. If the stopping criterion is not met, the algorithm takes a following input. The criterion might be a given number of epochs, i.e., a pre-set number of times where all data is presented, or the reach of a maximum number of nodes. === Incremental growing neural gas === Another neural gas variant inspired by the GNG algorithm is the incremental growing neural gas (IGNG). The authors propose the main advantage of this algorithm to be "learning new data (plasticity) without degrading the previously trained network and forgetting the old input data (stability)." === Growing when required === Having a network with a growing set of nodes, like the one implemented by the GNG algorithm was seen as a great advantage, however some limitation on the learning was seen by the introduction of the parameter λ, in which the network would only be able to grow when iterations were a multiple of this parameter. The proposal to mitigate this problem was a new algorithm, the Growing When Required network (GWR), which would have the network grow more quickly, by adding nodes as quickly as possible whenever the network identified that the existing nodes would not describe the input well enough. === Plastic neural gas === The ability to only grow a network may quickly introduce overfitting; on the other hand, removing nodes on the basis of age only, as in the GNG model, does not ensure that the removed nodes are actually useless, because removal depends on a model parameter that should be carefully tuned to the "memory length" of the stream of input data. The "Plastic Neural Gas" model solves this problem by making decisions to add or remove nodes using an unsupervised version of cross-validation, which controls an equivalent notion of "generalization ability" for the unsupervised setting. While growing-only methods only cater for the incremental learning scenario, the ability to grow and shrink is suited to the more general streaming data problem. == Implementations == To find the ranking i 0 , i 1 , … , i N − 1 {\displaystyle i_{0},i_{1},\ldots ,i_{N-1}} of the feature vectors, the neural gas algorithm involves sorting, which is a procedure that does not lend itself easily to parallelization or implementation in analog hardware. However, implementations in both parallel software and analog hardware were actually designed.

    Read more →
  • Genetic operator

    Genetic operator

    A genetic operator is an operator used in evolutionary algorithms (EA) to guide the algorithm towards a solution to a given problem. There are three main types of operators (mutation, crossover and selection), which must work in conjunction with one another in order for the algorithm to be successful. Genetic operators are used to create and maintain genetic diversity (mutation operator), combine existing solutions (also known as chromosomes) into new solutions (crossover) and select between solutions (selection). The classic representatives of evolutionary algorithms include genetic algorithms, evolution strategies, genetic programming and evolutionary programming. In his book discussing the use of genetic programming for the optimization of complex problems, computer scientist John Koza has also identified an 'inversion' or 'permutation' operator; however, the effectiveness of this operator has never been conclusively demonstrated and this operator is rarely discussed in the field of genetic programming. For combinatorial problems, however, these and other operators tailored to permutations are frequently used by other EAs. Mutation (or mutation-like) operators are said to be unary operators, as they only operate on one chromosome at a time. In contrast, crossover operators are said to be binary operators, as they operate on two chromosomes at a time, combining two existing chromosomes into one new chromosome. == Operators == Genetic variation is a necessity for the process of evolution. Genetic operators used in evolutionary algorithms are analogous to those in the natural world: survival of the fittest, or selection; reproduction (crossover, also called recombination); and mutation. === Selection === Selection operators give preference to better candidate solutions (chromosomes), allowing them to pass on their 'genes' to the next generation (iteration) of the algorithm. The best solutions are determined using some form of objective function (also known as a 'fitness function' in evolutionary algorithms), before being passed to the crossover operator. Different methods for choosing the best solutions exist, for example, fitness proportionate selection and tournament selection. A further or the same selection operator is used to determine the individuals for being selected to form the next parental generation. The selection operator may also ensure that the best solution(s) from the current generation always become(s) a member of the next generation without being altered; this is known as elitism or elitist selection. === Crossover === Crossover is the process of taking more than one parent solutions (chromosomes) and producing a child solution from them. By recombining portions of good solutions, the evolutionary algorithm is more likely to create a better solution. As with selection, there are a number of different methods for combining the parent solutions, including the edge recombination operator (ERO) and the 'cut and splice crossover' and 'uniform crossover' methods. The crossover method is often chosen to closely match the chromosome's representation of the solution; this may become particularly important when variables are grouped together as building blocks, which might be disrupted by a non-respectful crossover operator. Similarly, crossover methods may be particularly suited to certain problems; the ERO is considered a good option for solving the travelling salesman problem. === Mutation === The mutation operator encourages genetic diversity amongst solutions and attempts to prevent the evolutionary algorithm converging to a local minimum by stopping the solutions becoming too close to one another. In mutating the current pool of solutions, a given solution may change between slightly and entirely from the previous solution. By mutating the solutions, an evolutionary algorithm can reach an improved solution solely through the mutation operator. Again, different methods of mutation may be used; these range from a simple bit mutation (flipping random bits in a binary string chromosome with some low probability) to more complex mutation methods in which genes in the solution are changed, for example by adding a random value from the Gaussian distribution to the current gene value. As with the crossover operator, the mutation method is usually chosen to match the representation of the solution within the chromosome. == Combining operators == While each operator acts to improve the solutions produced by the evolutionary algorithm working individually, the operators must work in conjunction with each other for the algorithm to be successful in finding a good solution. Using the selection operator on its own will tend to fill the solution population with copies of the best solution from the population. If the selection and crossover operators are used without the mutation operator, the algorithm will tend to converge to a local minimum, that is, a good but sub-optimal solution to the problem. Using the mutation operator on its own leads to a random walk through the search space. Only by using all three operators together can the evolutionary algorithm become a noise-tolerant global search algorithm, yielding good solutions to the problem at hand.

    Read more →
  • Quantum machine learning

    Quantum machine learning

    Quantum machine learning (QML) is the study of quantum algorithms for machine learning. It often refers to quantum algorithms for machine learning tasks which analyze classical data, sometimes called quantum-enhanced machine learning. QML algorithms use qubits and quantum operations to try to improve the space and time complexity of classical machine learning algorithms. Hybrid QML methods involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device. These routines can be more complex in nature and executed faster on a quantum computer. Furthermore, quantum algorithms can be used to analyze quantum states instead of classical data. The term "quantum machine learning" is sometimes used to refer classical machine learning methods applied to data generated from quantum experiments (i.e. machine learning of quantum systems), such as learning the phase transitions of a quantum system or creating new quantum experiments. QML also extends to a branch of research that explores methodological and structural similarities between certain physical systems and learning systems, in particular neural networks. For example, some mathematical and numerical techniques from quantum physics are applicable to classical deep learning and vice versa. Furthermore, researchers investigate more abstract notions of learning theory with respect to quantum information, sometimes referred to as "quantum learning theory". == Machine learning with quantum computers == Quantum-enhanced machine learning refers to quantum algorithms that solve tasks in machine learning, thereby improving and often expediting classical machine learning techniques. Such algorithms typically require one to encode the given classical data set into a quantum computer to make it accessible for quantum information processing. Subsequently, quantum information processing routines are applied and the result of the quantum computation is read out by measuring the quantum system. For example, the outcome of the measurement of a qubit reveals the result of a binary classification task. While many proposals of QML algorithms are still purely theoretical and require a full-scale universal quantum computer to be tested, others have been implemented on small-scale or special purpose quantum devices. === Quantum associative memories and quantum pattern recognition === Early work on quantum associative memories has been done by Dan Ventura and Tony Martinez and by Carlo A. Trugenberger in the late 1990s and early 2000s. Associative (or content-addressable) memories are able to recognize stored content on the basis of a similarity measure, while random access memories are accessed by the address of stored information and not its content. As such they must be able to retrieve both incomplete and corrupted patterns, the essential machine learning task of pattern recognition. Typical classical associative memories store p patterns in the O ( n 2 ) {\displaystyle O(n^{2})} interactions (synapses) of a real, symmetric energy matrix over a network of n artificial neurons. The encoding is such that the desired patterns are local minima of the energy functional and retrieval is done by minimizing the total energy, starting from an initial configuration. Unfortunately, classical associative memories are severely limited by the phenomenon of cross-talk. When too many patterns are stored, spurious memories appear which quickly proliferate, so that the energy landscape becomes disordered and no retrieval is anymore possible. The number of storable patterns is typically limited by a linear function of the number of neurons, p ≤ O ( n ) {\displaystyle p\leq O(n)} . Quantum associative memories (in their simplest realization) store patterns in a unitary matrix U acting on the Hilbert space of n qubits. Retrieval is realized by the unitary evolution of a fixed initial state to a quantum superposition of the desired patterns with probability distribution peaked on the most similar pattern to an input. By its very quantum nature, the retrieval process is thus probabilistic. Because quantum associative memories are free from cross-talk, however, spurious memories are never generated. Correspondingly, they have a superior capacity than classical ones. The number of parameters in the unitary matrix U is O ( p n ) {\displaystyle O(pn)} . One can thus have efficient, spurious-memory-free quantum associative memories for any polynomial number of patterns. If the matrix U is encoded as a unique operator (as opposed as to a sequence of gates as in the circuit model), e.g. by an optical interferometer, the retrieval becomes efficient even for an exponential number of patterns. === Linear algebra simulation with quantum amplitudes === A number of quantum algorithms for machine learning are based on the idea of amplitude encoding, that is, to associate the amplitudes of a quantum state with the inputs and outputs of computations. Since a state of n {\displaystyle n} qubits is described by 2 n {\displaystyle 2^{n}} complex amplitudes, this information encoding can allow for an exponentially compact representation. Intuitively, this corresponds to associating a discrete probability distribution over binary random variables with a classical vector. The goal of algorithms based on amplitude encoding is to formulate quantum algorithms whose resources grow polynomially in the number of qubits n {\displaystyle n} , which amounts to a logarithmic time complexity in the number of amplitudes and thereby the dimension of the input. Many QML algorithms in this category are based on variations of the quantum algorithm for linear systems of equations (colloquially called HHL, after the paper's authors) which, under specific conditions, performs a matrix inversion using an amount of physical resources growing only logarithmically in the dimensions of the matrix. One of these conditions is that a Hamiltonian which entry-wise corresponds to the matrix can be simulated efficiently, which is known to be possible if the matrix is sparse or low rank. For reference, any known classical algorithm for matrix inversion requires a number of operations that grows more than quadratically in the dimension of the matrix (e.g. O ( n 2.373 ) {\displaystyle O{\mathord {\left(n^{2.373}\right)}}} ), but they are not restricted to sparse matrices. Quantum matrix inversion can be applied to machine learning methods in which the training reduces to solving a linear system of equations, for example in least-squares linear regression, the least-squares version of support vector machines, and Gaussian processes. A crucial bottleneck of methods that simulate linear algebra computations with the amplitudes of quantum states is state preparation, which often requires one to initialise a quantum system in a state whose amplitudes reflect the features of the entire dataset. Although efficient methods for state preparation are known for specific cases, this step easily hides the complexity of the task. === Variational quantum algorithms (VQAs) === In a variational quantum algorithm, a classical computer optimizes the parameters used to prepare a quantum state, while a quantum computer is used to do the actual state preparation and measurement. VQAs are considered promising candidates for noisy intermediate-scale quantum computers. Variational quantum circuits (or parameterized quantum circuits) are a popular class of VQAs where the parameters are those used in a fixed quantum circuit. Researchers have studied VQCs to solve optimization problems and find the ground state energy of complex quantum systems, which were difficult to solve using a classical computer. === Quantum binary classifier === Pattern reorganization is one of the important tasks of machine learning, binary classification is one of the tools or algorithms to find patterns. Binary classification is used in supervised learning and in unsupervised learning. In QML, classical bits are converted to qubits and they are mapped to Hilbert space; complex value data are used in a quantum binary classifier to use the advantage of Hilbert space. By exploiting the quantum mechanic properties such as superposition, entanglement, interference the quantum binary classifier produces the accurate result in short period of time. === Quantum machine learning algorithms based on Grover search === Another approach to improving classical machine learning with quantum information processing uses amplitude amplification methods based on Grover's search algorithm, which has been shown to solve unstructured search problems with a quadratic speedup compared to classical algorithms. These quantum routines can be employed for learning algorithms that translate into an unstructured search task, as can be done, for instance, in the case of the k-medians and the k-nearest neighbors algorithms. Other applications include quadratic speedups in the training of perceptrons. An e

    Read more →
  • Prescription monitoring program

    Prescription monitoring program

    In the United States, prescription monitoring programs (PMPs) or prescription drug monitoring programs (PDMPs) are state-run programs which collect and distribute data about the prescription and dispensation of federally controlled substances and, depending on state requirements, other potentially abusable prescription drugs. PMPs are meant to help prevent adverse drug-related events such as opioid overdoses, drug diversion, and substance abuse by decreasing the amount and/or frequency of opioid prescribing, and by identifying those patients who are obtaining prescriptions from multiple providers (i.e., "doctor shopping") or those physicians overprescribing opioids. Most US health care workers support the idea of PMPs, which intend to assist physicians, physician assistants, nurse practitioners, dentists and other prescribers, the pharmacists, chemists and support staff of dispensing establishments. The database, whose use is required by State law, typically requires prescribers and pharmacies dispensing controlled substances to register with their respective state PMPs and (for pharmacies and providers who dispense from their offices) to report the dispensation of such prescriptions to an electronic online database. The majority of PMPs are authorized to notify law enforcement agencies or licensing boards or physicians when a prescriber, or patients receiving prescriptions, exceed thresholds established by the state or prescription recipient exceeds thresholds established by the State. All states have implemented PDMPs, although evidence for the effectiveness of these programs is mixed. While prescription of opioids has decreased with PMP use, overdose deaths in many states have actually increased, with those states sharing data with neighboring jurisdictions or requiring reporting of more drugs experiencing highest increases in deaths. This may be because those declined opioid prescriptions turn to street drugs, whose potency and contaminants carry greater overdose risk. == History == Prescription drug monitoring programs, or PDMPs, are an example of one initiative proposed to alleviate effects of the opioid crisis. The programs are designed to restrict prescription drug abuse by limiting a patient's ability to obtain similar prescriptions from multiple providers (i.e. “doctor shopping”) and reducing diversion of controlled substances. This is meant to reduce risk of fatal overdose caused by high doses of opioids or interactions between opioids and benzodiazepenes, and to enable better decision making on the part of healthcare providers who may be unaware of a patient's prescription drug use, history or other prescriptions. PDMPs have been implemented in state legislations since 1939 in California, a time before electronic medical records, though implementation rose alongside increased awareness of overprescribing of opioids and overdose. A later New York state program was struck down by the U.S. Supreme Court in Whalen v. Roe. But, by 2019, 49 states, the District of Columbia, and Guam had enacted PDMP legislation. In 2021 Missouri, the last State to not use a PMP, adopted legislation to create one. PMPs are constantly being updated to increase speed of data collection, sharing of data across States, and ease of interpretation. This is being done by integrating PDMP reports with other health information technologies such as health information exchanges (HIE), electronic health record (EHR) systems, and/ or pharmacy dispensing software systems. One program that has been implemented in nine states is called the PDMP Electronic Health Records Integration and Interoperability Expansion, also known as PEHRIIE. Another software, marketed by Bamboo Health and integrated with PMPs in 43 states, uses an algorithm to track factors thought to increase risk of diversion, abuse or overdose, and assigns patients a three digit score based on presumed indicators of risk. While some studies have suggested that PDMP-HIT integration and sharing of interstate data brings benefits such as reduced opioid-related inpatient morbidity, others have found no or negative impact on mortality compared to states without PMP data sharing. Patient and media reports suggest need for testing and evaluation of algorithmic software used to score risk, with some patients reporting denial of prescriptions without c explanation or clarity of data. == Goals == Most health care workers support PMPs which intend to assist physicians, physician assistants, nurse practitioners, dentists and other prescribers, the pharmacists, chemists and support staff of dispensing establishments, as well as law-enforcement agencies. The collaboration supports the legitimate medical use of controlled substances while limiting their abuse and diversion. Pharmacies dispensing controlled substances and prescribers typically must register with their respective state PMPs and (for pharmacies and providers who dispense controlled substances from their offices) report the dispensation to an electronic online database. Some pharmacy software can submit these reports automatically to multiple states. == Usage == === List of programs by state === === Software systems === NarxCare is a prescription drug monitoring program (PDMP) run by Bamboo Health. Bamboo Health was formerly known as Appriss. It is widely used across the United States by pharmacies including Rite Aid as well as those at Walmart and Sam’s Club. The NarxCare software allows doctors to view data about a patient, combining data from the prescription registries of various U.S. states to make the registries interoperable nationally. It also uses machine learning to generate an "Overdose Risk Score" that potentially includes EMS and criminal justice data; these scores have been criticized by researchers and patient advocates for the lack of transparency in the process as well as the potential for disparate treatment of women and minority groups. Advertised as an "analytics tool and care management platform", the NarxCare software allows doctors to view data about a patient including how many pharmacies they have visited and the combinations of medication they are prescribed. It combines data from the prescription registries of various U.S. states, making the registries interoperable nationally. It additionally uses machine learning to generate various three-digit "risk scores" and an overall "Overdose Risk Score", collectively referred to as Narx Scores, in a process that potentially includes EMS and criminal justice data as well as court records. == Controversy == Many doctors and researchers support the idea of PDMPs as a tool in combatting the opioid epidemic. Opioid prescribing, opioid diversion and supply, opioid misuse, and opioid-related morbidity and mortality are common elements in data entered into PDMPs. Prescription Monitoring Programs are purported to offer economic benefits for the states who implement them by decreasing overall health care costs, lost productivity, and investigation times. However, there are many studies that conclude the impact of PDMPs is unclear. While use of PMPs has been accompanied by decrease in opioid prescribing, few analyses consider corresponding use of street opioids, extramedical use, or diversion, which might provide a more holistic method for evaluation of PMP intent and efficacy. Evidence for PDMP impact on fatal overdoses is decidedly mixed, with multiple studies finding increased overdose rates in some states, decreases in others, or no clear impact. Interestingly, an increase in heroin overdoses after PDMP implementation has been commonly reported, presumably as denial of prescription opioids sends patients in search of street drugs. Narx Scores have been criticized by researchers and patient advocates for the lack of transparency in the generation process as well as the potential for disparate treatment of women and minority groups. Writing in Duke Law Journal, Jennifer Oliva stated that "black-box algorithms" are used to generate the scores.

    Read more →
  • Apache Mahout

    Apache Mahout

    Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra. In the past, many of the implementations use the Apache Hadoop platform, however today it is primarily focused on Apache Spark. Mahout also provides Java/Scala libraries for common math operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented. == Features == === Samsara === Apache Mahout-Samsara refers to a Scala domain-specific language (DSL) that allows users to use R-like syntax as opposed to traditional Scala-like syntax. This allows user to express algorithms concisely and clearly. === Backend agnostic === Apache Mahout's code abstracts the domain-specific language from the engine where the code is run. While active development is done with the Apache Spark engine, users are free to implement any engine they choose- H2O and Apache Flink have been implemented in the past and examples exist in the code base. === GPU/CPU accelerators === The JVM has notoriously slow computation. To improve speed, "native solvers" were added which move in-core, and by extension, distributed BLAS operations out of the JVM, offloading to off-heap or GPU memory for processing via multiple CPUs and/or CPU cores, or GPUs when built against the ViennaCL library. ViennaCL is a highly optimized C++ library with BLAS operations implemented in OpenMP, and OpenCL. As of release 14.1, the OpenMP build considered to be stable, leaving the OpenCL build is still in its experimental proof-of-concept phase. === Recommenders === Apache Mahout features implementations of Alternating Least Squares, Co-Occurrence, and Correlated Co-Occurrence, a unique-to-Mahout recommender algorithm that extends co-occurrence to be used on multiple dimensions of data. == History == === Transition from Map Reduce to Apache Spark === While Mahout's core algorithms for clustering, classification and batch based collaborative filtering were implemented on top of Apache Hadoop using the map/reduce paradigm, it did not restrict contributions to Hadoop-based implementations. Contributions that run on a single node or on a non-Hadoop cluster were also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project and can run stand-alone without Hadoop. Starting with the release 0.10.0, the project shifted its focus to building a backend-independent programming environment, code named "Samsara". The environment consists of an algebraic backend-independent optimizer and an algebraic Scala DSL unifying in-memory and distributed algebraic operators. Supported algebraic platforms are Apache Spark, H2O, and Apache Flink. Support for MapReduce algorithms started being gradually phased out in 2014. === Release history === === Developers === Apache Mahout is developed by a community. The project is managed by a group called the "Project Management Committee" (PMC). The current PMC is Andrew Musselman, Andrew Palumbo, Drew Farris, Isabel Drost-Fromm, Jake Mannix, Pat Ferrel, Paritosh Ranjan, Trevor Grant, Robin Anil, Sebastian Schelter, Stevo Slavić.

    Read more →
  • Diffusion map

    Diffusion map

    Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by Coifman and Lafon which computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data. The Euclidean distance between points in the embedded space is equal to the "diffusion distance" between probability distributions centered at those points. Different from linear dimensionality reduction methods such as principal component analysis (PCA), diffusion maps are part of the family of nonlinear dimensionality reduction methods which focus on discovering the underlying manifold that the data has been sampled from. By integrating local similarities at different scales, diffusion maps give a global description of the data-set. Compared with other methods, the diffusion map algorithm is robust to noise perturbation and computationally inexpensive. == Definition of diffusion maps == Following and , diffusion maps can be defined in four steps. === Connectivity === Diffusion maps exploit the relationship between heat diffusion and random walk Markov chain. The basic observation is that if we take a random walk on the data, walking to a nearby data-point is more likely than walking to another that is far away. Let ( X , A , μ ) {\displaystyle (X,{\mathcal {A}},\mu )} be a measure space, where X {\displaystyle X} is the data set and μ {\displaystyle \mu } represents the distribution of the points on X {\displaystyle X} . Based on this, the connectivity k {\displaystyle k} between two data points, x {\displaystyle x} and y {\displaystyle y} , can be defined as the probability of walking from x {\displaystyle x} to y {\displaystyle y} in one step of the random walk. Usually, this probability is specified in terms of a kernel function of the two points: k : X × X → R {\displaystyle k:X\times X\rightarrow \mathbb {R} } . For example, the popular Gaussian kernel: k ( x , y ) = exp ⁡ ( − | | x − y | | 2 ϵ ) {\displaystyle k(x,y)=\exp \left(-{\frac {||x-y||^{2}}{\epsilon }}\right)} More generally, the kernel function has the following properties k ( x , y ) = k ( y , x ) {\displaystyle k(x,y)=k(y,x)} ( k {\displaystyle k} is symmetric) k ( x , y ) ≥ 0 ∀ x , y {\displaystyle k(x,y)\geq 0\,\,\forall x,y} ( k {\displaystyle k} is positivity preserving). The kernel constitutes the prior definition of the local geometry of the data-set. Since a given kernel will capture a specific feature of the data set, its choice should be guided by the application that one has in mind. This is a major difference with methods such as principal component analysis, where correlations between all data points are taken into account at once. Given ( X , k ) {\displaystyle (X,k)} , we can then construct a reversible discrete-time Markov chain on X {\displaystyle X} (a process known as the normalized graph Laplacian construction): d ( x ) = ∫ X k ( x , y ) d μ ( y ) {\displaystyle d(x)=\int _{X}k(x,y)d\mu (y)} and define: p ( x , y ) = k ( x , y ) d ( x ) {\displaystyle p(x,y)={\frac {k(x,y)}{d(x)}}} Although the new normalized kernel does not inherit the symmetric property, it does inherit the positivity-preserving property and gains a conservation property: ∫ X p ( x , y ) d μ ( y ) = 1 {\displaystyle \int _{X}p(x,y)d\mu (y)=1} === Diffusion process === From p ( x , y ) {\displaystyle p(x,y)} we can construct a transition matrix of a Markov chain ( M {\displaystyle M} ) on X {\displaystyle X} . In other words, p ( x , y ) {\displaystyle p(x,y)} represents the one-step transition probability from x {\displaystyle x} to y {\displaystyle y} , and M t {\displaystyle M^{t}} gives the t-step transition matrix. We define the diffusion matrix L {\displaystyle L} (it is also a version of graph Laplacian matrix) L i , j = k ( x i , x j ) {\displaystyle L_{i,j}=k(x_{i},x_{j})\,} We then define the new kernel L i , j ( α ) = k ( α ) ( x i , x j ) = L i , j ( d ( x i ) d ( x j ) ) α {\displaystyle L_{i,j}^{(\alpha )}=k^{(\alpha )}(x_{i},x_{j})={\frac {L_{i,j}}{(d(x_{i})d(x_{j}))^{\alpha }}}\,} or equivalently, L ( α ) = D − α L D − α {\displaystyle L^{(\alpha )}=D^{-\alpha }LD^{-\alpha }\,} where D is a diagonal matrix and D i , i = ∑ j L i , j . {\displaystyle D_{i,i}=\sum _{j}L_{i,j}.} We apply the graph Laplacian normalization to this new kernel: M = ( D ( α ) ) − 1 L ( α ) , {\displaystyle M=({D}^{(\alpha )})^{-1}L^{(\alpha )},\,} where D ( α ) {\displaystyle D^{(\alpha )}} is a diagonal matrix and D i , i ( α ) = ∑ j L i , j ( α ) . {\displaystyle {D}_{i,i}^{(\alpha )}=\sum _{j}L_{i,j}^{(\alpha )}.} p ( x j , t | x i ) = M i , j t {\displaystyle p(x_{j},t|x_{i})=M_{i,j}^{t}\,} One of the main ideas of the diffusion framework is that running the chain forward in time (taking larger and larger powers of M {\displaystyle M} ) reveals the geometric structure of X {\displaystyle X} at larger and larger scales (the diffusion process). Specifically, the notion of a cluster in the data set is quantified as a region in which the probability of escaping this region is low (within a certain time t). Therefore, t not only serves as a time parameter, but it also has the dual role of scale parameter. The eigendecomposition of the matrix M t {\displaystyle M^{t}} yields M i , j t = ∑ l λ l t ψ l ( x i ) ϕ l ( x j ) {\displaystyle M_{i,j}^{t}=\sum _{l}\lambda _{l}^{t}\psi _{l}(x_{i})\phi _{l}(x_{j})\,} where { λ l } {\displaystyle \{\lambda _{l}\}} is the sequence of eigenvalues of M {\displaystyle M} and { ψ l } {\displaystyle \{\psi _{l}\}} and { ϕ l } {\displaystyle \{\phi _{l}\}} are the biorthogonal left and right eigenvectors respectively. Due to the spectrum decay of the eigenvalues, only a few terms are necessary to achieve a given relative accuracy in this sum. ==== Parameter α and the diffusion operator ==== The reason to introduce the normalization step involving α {\displaystyle \alpha } is to tune the influence of the data point density on the infinitesimal transition of the diffusion. In some applications, the sampling of the data is generally not related to the geometry of the manifold we are interested in describing. In this case, we can set α = 1 {\displaystyle \alpha =1} and the diffusion operator approximates the Laplace–Beltrami operator. We then recover the Riemannian geometry of the data set regardless of the distribution of the points. To describe the long-term behavior of the point distribution of a system of stochastic differential equations, we can use α = 0.5 {\displaystyle \alpha =0.5} and the resulting Markov chain approximates the Fokker–Planck diffusion. With α = 0 {\displaystyle \alpha =0} , it reduces to the classical graph Laplacian normalization. === Diffusion distance === The diffusion distance at time t {\displaystyle t} between two points can be measured as the similarity of two points in the observation space with the connectivity between them. It is given by D t ( x i , x j ) 2 = ∑ y ( p ( y , t | x i ) − p ( y , t | x j ) ) 2 ϕ 0 ( y ) {\displaystyle D_{t}(x_{i},x_{j})^{2}=\sum _{y}{\frac {(p(y,t|x_{i})-p(y,t|x_{j}))^{2}}{\phi _{0}(y)}}} where ϕ 0 ( y ) {\displaystyle \phi _{0}(y)} is the stationary distribution of the Markov chain, given by the first left eigenvector of M {\displaystyle M} . Explicitly: ϕ 0 ( y ) = d ( y ) ∑ z ∈ X d ( z ) {\displaystyle \phi _{0}(y)={\frac {d(y)}{\sum _{z\in X}d(z)}}} Intuitively, D t ( x i , x j ) {\displaystyle D_{t}(x_{i},x_{j})} is small if there is a large number of short paths connecting x i {\displaystyle x_{i}} and x j {\displaystyle x_{j}} . There are several interesting features associated with the diffusion distance, based on our previous discussion that t {\displaystyle t} also serves as a scale parameter: Points are closer at a given scale (as specified by D t ( x i , x j ) {\displaystyle D_{t}(x_{i},x_{j})} ) if they are highly connected in the graph, therefore emphasizing the concept of a cluster. This distance is robust to noise, since the distance between two points depends on all possible paths of length t {\displaystyle t} between the points. From a machine learning point of view, the distance takes into account all evidences linking x i {\displaystyle x_{i}} to x j {\displaystyle x_{j}} , allowing us to conclude that this distance is appropriate for the design of inference algorithms based on the majority of preponderance. === Diffusion process and low-dimensional embedding === The diffusion distance can be calculated using the eigenvectors by D t ( x i , x j ) 2 = ∑ l λ l 2 t ( ψ l ( x i ) − ψ l ( x j ) ) 2 {\displaystyle D_{t}(x_{i},x_{j})^{2}=\sum _{l}\lambda _{l}^{2t}(\psi _{l}(x_{i})-\psi _{l}(x_{j}))^{2}\,} So the eigenvectors can be used as a new set of coordinates for the data. The diffusion map is defined as: Ψ t ( x ) = ( λ 1 t ψ 1 ( x ) , λ 2 t ψ 2 ( x ) , … , λ k t ψ k ( x ) ) {\displaystyle \Psi _{t}(x)=(\lambda _{1}^{t}\psi _{1}(x),\lambda _{2}^{t}\psi _{2}(x),\ld

    Read more →
  • GasBuddy

    GasBuddy

    GasBuddy is a technology company headquartered in Dallas, United States, that offers mobile applications and websites for tracking crowd-sourced locations and prices of gas stations and convenience stores in the United States and Canada. Their platforms offer information sourced from users, gas station operators, and partner companies. They also provide business-to-business services to gas stations and convenience store owners. == History == GasBuddy was founded in Minneapolis in 2000 by Dustin Coupal, Jason Toews as a community website for sharing gas prices. In 2004, they filed as a for-profit corporation in Minnesota under the name GasBuddy Organization Inc. In 2009, GasBuddy launched OpenStore, a platform that allows convenience stores to build and manage their own mobile apps. In 2010, the company launched its own mobile apps that allowed users to input gas prices from their smartphones. In 2013, Oil Price Information Service (OPIS), a subsidiary of UCG, acquired GasBuddy. OPIS is a provider of petroleum pricing and news for businesses. In 2016, IHS acquired OPIS, separating from GasBuddy, which remained with UCG as a subsidiary company. Initially only available in the United States and Canada, GasBuddy launched in Australia in March 2016. Also in that year, GasBuddy released a completely redesigned app, its first major redesign since its release in 2010. GasBuddy also unveiled a new logo and launched GasBuddy Business Pages. GasBuddy shut down the Australian version of their app in 2022. In 2017, GasBuddy launched a gas savings program titled "Pay with GasBuddy" intended to let consumers save at gas stations in the United States. In the same year, GasBuddy was involved in a lawsuit with Reveal Mobile, a location-based marketing company, over the sale of user location data. It was revealed that GasBuddy sold information on more than 4.5 million users to Reveal each month for $9.50 per 1000 users. According to CNET, that information included "users' latitude, longitude, IP address, and time stamps on the data collected," which sparked concern in the media and between its users. In 2021, the GasBuddy app rose to the most popular app on both Android and iPhone platforms in the wake of the Colonial Pipeline ransomware attack PDI acquired GasBuddy in 2021.

    Read more →
  • Markov model

    Markov model

    In probability theory, a Markov model is a stochastic model used to model pseudo-randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it (that is, it assumes the Markov property). Generally, this assumption enables reasoning and computation with the model that would otherwise be intractable. For this reason, in the fields of predictive modelling and probabilistic forecasting, it is desirable for a given model to exhibit the Markov property. == Introduction == Andrey Andreyevich Markov (14 June 1856 – 20 July 1922) was a Russian mathematician best known for his work on stochastic processes. A primary subject of his research later became known as the Markov chain. There are four common Markov models used in different situations, depending on whether every sequential state is observable or not, and whether the system is to be adjusted on the basis of observations made: == Markov chain == The simplest Markov model is the Markov chain. It models the state of a system with a random variable that changes through time. In this context, the Markov property indicates that the distribution for this variable depends only on the distribution of a previous state. An example use of a Markov chain is Markov chain Monte Carlo, which uses the Markov property to prove that a particular method for performing a random walk will sample from the joint distribution. == Hidden Markov model == A hidden Markov model is a Markov chain for which the state is only partially observable or noisily observable. In other words, observations are related to the state of the system, but they are typically insufficient to precisely determine the state. Several well-known algorithms for hidden Markov models exist. For example, given a sequence of observations, the Viterbi algorithm will compute the most-likely corresponding sequence of states, the forward algorithm will compute the probability of the sequence of observations, and the Baum–Welch algorithm will estimate the starting probabilities, the transition function, and the observation function of a hidden Markov model. One common use is for speech recognition, where the observed data is the speech audio waveform and the hidden state is the spoken text. In this example, the Viterbi algorithm finds the most likely sequence of spoken words given the speech audio. == Markov decision process == A Markov decision process is a Markov chain in which state transitions depend on the current state and an action vector that is applied to the system. Typically, a Markov decision process is used to compute a policy of actions that will maximize some utility with respect to expected rewards. == Partially observable Markov decision process == A partially observable Markov decision process (POMDP) is a Markov decision process in which the state of the system is only partially observed. POMDPs are known to be NP complete, but recent approximation techniques have made them useful for a variety of applications, such as controlling simple agents or robots. == Markov random field == A Markov random field, or Markov network, may be considered to be a generalization of a Markov chain in multiple dimensions. In a Markov chain, state depends only on the previous state in time, whereas in a Markov random field, each state depends on its neighbors in any of multiple directions. A Markov random field may be visualized as a field or graph of random variables, where the distribution of each random variable depends on the neighboring variables with which it is connected. More specifically, the joint distribution for any random variable in the graph can be computed as the product of the "clique potentials" of all the cliques in the graph that contain that random variable. Modeling a problem as a Markov random field is useful because it implies that the joint distributions at each vertex in the graph may be computed in this manner. == Hierarchical Markov models == Hierarchical Markov models can be applied to categorize human behavior at various levels of abstraction. For example, a series of simple observations, such as a person's location in a room, can be interpreted to determine more complex information, such as in what task or activity the person is performing. Two kinds of Hierarchical Markov Models are the Hierarchical hidden Markov model and the Abstract Hidden Markov Model. Both have been used for behavior recognition and certain conditional independence properties between different levels of abstraction in the model allow for faster learning and inference. == Tolerant Markov model == A Tolerant Markov model (TMM) is a probabilistic-algorithmic Markov chain model. It assigns the probabilities according to a conditioning context that considers the last symbol, from the sequence to occur, as the most probable instead of the true occurring symbol. A TMM can model three different natures: substitutions, additions or deletions. Successful applications have been efficiently implemented in DNA sequences compression. == Markov-chain forecasting models == Markov-chains have been used as a forecasting methods for several topics, for example price trends, wind power and solar irradiance. The Markov-chain forecasting models utilize a variety of different settings, from discretizing the time-series to hidden Markov-models combined with wavelets and the Markov-chain mixture distribution model (MCM).

    Read more →
  • Sparse PCA

    Sparse PCA

    Sparse principal component analysis (SPCA or sparse PCA) is a technique used in statistical analysis and, in particular, in the analysis of multivariate data sets. It extends the classic method of principal component analysis (PCA) for the reduction of dimensionality of data by introducing sparsity structures to the input variables. A particular disadvantage of ordinary PCA is that the principal components are usually linear combinations of all input variables. SPCA overcomes this disadvantage by finding components that are linear combinations of just a few input variables (SPCs). This means that some of the coefficients of the linear combinations defining the SPCs, called loadings, are equal to zero. The number of nonzero loadings is called the cardinality of the SPC. == Mathematical formulation == Consider a data matrix, X {\displaystyle X} , where each of the p {\displaystyle p} columns represent an input variable, and each of the n {\displaystyle n} rows represents an independent sample from data population. One assumes each column of X {\displaystyle X} has mean zero, otherwise one can subtract column-wise mean from each element of X {\displaystyle X} . Let Σ = 1 n − 1 X ⊤ X {\displaystyle \Sigma ={\frac {1}{n-1}}X^{\top }X} be the empirical covariance matrix of X {\displaystyle X} , which has dimension p × p {\displaystyle p\times p} . Given an integer k {\displaystyle k} with 1 ≤ k ≤ p {\displaystyle 1\leq k\leq p} , the sparse PCA problem can be formulated as maximizing the variance along a direction represented by vector v ∈ R p {\displaystyle v\in \mathbb {R} ^{p}} while constraining its cardinality: max v T Σ v subject to ‖ v ‖ 2 = 1 ‖ v ‖ 0 ≤ k . {\displaystyle {\begin{aligned}\max \quad &v^{T}\Sigma v\\{\text{subject to}}\quad &\left\Vert v\right\Vert _{2}=1\\&\left\Vert v\right\Vert _{0}\leq k.\end{aligned}}} Eq. 1 The first constraint specifies that v is a unit vector. In the second constraint, ‖ v ‖ 0 {\displaystyle \left\Vert v\right\Vert _{0}} represents the ℓ 0 {\displaystyle \ell _{0}} pseudo-norm of v, which is defined as the number of its non-zero components. So the second constraint specifies that the number of non-zero components in v is less than or equal to k, which is typically an integer that is much smaller than dimension p. The optimal value of Eq. 1 is known as the k-sparse largest eigenvalue. If one takes k=p, the problem reduces to the ordinary PCA, and the optimal value becomes the largest eigenvalue of covariance matrix Σ. After finding the optimal solution v, one deflates Σ to obtain a new matrix Σ 1 = Σ − ( v T Σ v ) v v T , {\displaystyle \Sigma _{1}=\Sigma -(v^{T}\Sigma v)vv^{T},} and iterate this process to obtain further principal components. However, unlike PCA, sparse PCA cannot guarantee that different principal components are orthogonal. In order to achieve orthogonality, additional constraints must be enforced. The following equivalent definition is in matrix form. Let V {\displaystyle V} be a p×p symmetric matrix, one can rewrite the sparse PCA problem as max T r ( Σ V ) subject to T r ( V ) = 1 ‖ V ‖ 0 ≤ k 2 R a n k ( V ) = 1 , V ⪰ 0. {\displaystyle {\begin{aligned}\max \quad &Tr(\Sigma V)\\{\text{subject to}}\quad &Tr(V)=1\\&\Vert V\Vert _{0}\leq k^{2}\\&Rank(V)=1,V\succeq 0.\end{aligned}}} Eq. 2 Tr is the matrix trace, and ‖ V ‖ 0 {\displaystyle \Vert V\Vert _{0}} represents the non-zero elements in matrix V. The last line specifies that V has matrix rank one and is positive semidefinite. The last line means that one has V = v v T {\displaystyle V=vv^{T}} , so Eq. 2 is equivalent to Eq. 1. Moreover, the rank constraint in this formulation is actually redundant, and therefore sparse PCA can be cast as the following mixed-integer semidefinite program max T r ( Σ V ) subject to T r ( V ) = 1 | V i , i | ≤ z i , ∀ i ∈ { 1 , . . . , p } , | V i , j | ≤ 1 2 z i , ∀ i , j ∈ { 1 , . . . , p } : i ≠ j , V ⪰ 0 , z ∈ { 0 , 1 } p , ∑ i z i ≤ k {\displaystyle {\begin{aligned}\max \quad &Tr(\Sigma V)\\{\text{subject to}}\quad &Tr(V)=1\\&\vert V_{i,i}\vert \leq z_{i},\forall i\in \{1,...,p\},\vert V_{i,j}\vert \leq {\frac {1}{2}}z_{i},\forall i,j\in \{1,...,p\}:i\neq j,\\&V\succeq 0,z\in \{0,1\}^{p},\sum _{i}z_{i}\leq k\end{aligned}}} Eq. 3 Because of the cardinality constraint, the maximization problem is hard to solve exactly, especially when dimension p is high. In fact, the sparse PCA problem in Eq. 1 is NP-hard in the strong sense. == Computational considerations == As most sparse problems, variable selection in SPCA is a computationally intractable non-convex NP-hard problem, therefore greedy sub-optimal algorithms are often employed to find solutions. Note also that SPCA introduces hyperparameters quantifying in what capacity large parameter values are penalized. These might need tuning to achieve satisfactory performance, thereby adding to the total computational cost. == Algorithms for SPCA == Several alternative approaches (of Eq. 1) have been proposed, including a regression framework, a penalized matrix decomposition framework, a convex relaxation/semidefinite programming framework, a generalized power method framework an alternating maximization framework forward-backward greedy search and exact methods using branch-and-bound techniques, a certifiably optimal branch-and-bound approach Bayesian formulation framework. A certifiably optimal mixed-integer semidefinite branch-and-cut approach The methodological and theoretical developments of Sparse PCA as well as its applications in scientific studies are recently reviewed in a survey paper. === Notes on Semidefinite Programming Relaxation === It has been proposed that sparse PCA can be approximated by semidefinite programming (SDP). If one drops the rank constraint and relaxes the cardinality constraint by a 1-norm convex constraint, one gets a semidefinite programming relaxation, which can be solved efficiently in polynomial time: max T r ( Σ V ) subject to T r ( V ) = 1 1 T | V | 1 ≤ k V ⪰ 0. {\displaystyle {\begin{aligned}\max \quad &Tr(\Sigma V)\\{\text{subject to}}\quad &Tr(V)=1\\&\mathbf {1} ^{T}|V|\mathbf {1} \leq k\\&V\succeq 0.\end{aligned}}} Eq. 3 In the second constraint, 1 {\displaystyle \mathbf {1} } is a p×1 vector of ones, and |V| is the matrix whose elements are the absolute values of the elements of V. The optimal solution V {\displaystyle V} to the relaxed problem Eq. 3 is not guaranteed to have rank one. In that case, V {\displaystyle V} can be truncated to retain only the dominant eigenvector. While the semidefinite program does not scale beyond n=300 covariates, it has been shown that a second-order cone relaxation of the semidefinite relaxation is almost as tight and successfully solves problems with n=1000s of covariates == Applications == === Financial Data Analysis === Suppose ordinary PCA is applied to a dataset where each input variable represents a different asset, it may generate principal components that are weighted combination of all the assets. In contrast, sparse PCA would produce principal components that are weighted combination of only a few input assets, so one can easily interpret its meaning. Furthermore, if one uses a trading strategy based on these principal components, fewer assets imply less transaction costs. === Biology === Consider a dataset where each input variable corresponds to a specific gene. Sparse PCA can produce a principal component that involves only a few genes, so researchers can focus on these specific genes for further analysis. === High-dimensional Hypothesis Testing === Contemporary datasets often have the number of input variables ( p {\displaystyle p} ) comparable with or even much larger than the number of samples ( n {\displaystyle n} ). It has been shown that if p / n {\displaystyle p/n} does not converge to zero, the classical PCA is not consistent. In other words, if we let k = p {\displaystyle k=p} in Eq. 1, then the optimal value does not converge to the largest eigenvalue of data population when the sample size n → ∞ {\displaystyle n\rightarrow \infty } , and the optimal solution does not converge to the direction of maximum variance. But sparse PCA can retain consistency even if p ≫ n . {\displaystyle p\gg n.} The k-sparse largest eigenvalue (the optimal value of Eq. 1) can be used to discriminate an isometric model, where every direction has the same variance, from a spiked covariance model in high-dimensional setting. Consider a hypothesis test where the null hypothesis specifies that data X {\displaystyle X} are generated from a multivariate normal distribution with mean 0 and covariance equal to an identity matrix, and the alternative hypothesis specifies that data X {\displaystyle X} is generated from a spiked model with signal strength θ {\displaystyle \theta } : H 0 : X ∼ N ( 0 , I p ) , H 1 : X ∼ N ( 0 , I p + θ v v T ) , {\displaystyle H_{0}:X\sim N(0,I_{p}),\quad H_{1}:X\sim N(0,I_{p}+\theta vv^{T}),} where v ∈ R p {\displaystyle v\in \mathbb {R} ^{p}

    Read more →
  • Gaussian adaptation

    Gaussian adaptation

    Gaussian adaptation (GA), also called normal or natural adaptation (NA) is an evolutionary algorithm designed for the maximization of manufacturing yield due to statistical deviation of component values of signal processing systems. In short, GA is a stochastic adaptive process where a number of samples of an n-dimensional vector x[xT = (x1, x2, ..., xn)] are taken from a multivariate Gaussian distribution, N(m, M), having mean m and moment matrix M. The samples are tested for fail or pass. The first- and second-order moments of the Gaussian restricted to the pass samples are m and M. The outcome of x as a pass sample is determined by a function s(x), 0 < s(x) < q ≤ 1, such that s(x) is the probability that x will be selected as a pass sample. The average probability of finding pass samples (yield) is P ( m ) = ∫ s ( x ) N ( x − m ) d x {\displaystyle P(m)=\int s(x)N(x-m)\,dx} Then the theorem of GA states: For any s(x) and for any value of P < q, there always exist a Gaussian p. d. f. [ probability density function ] that is adapted for maximum dispersion. The necessary conditions for a local optimum are m = m and M proportional to M. The dual problem is also solved: P is maximized while keeping the dispersion constant (Kjellström, 1991). Proofs of the theorem may be found in the papers by Kjellström, 1970, and Kjellström & Taxén, 1981. Since dispersion is defined as the exponential of entropy/disorder/average information it immediately follows that the theorem is valid also for those concepts. Altogether, this means that Gaussian adaptation may carry out a simultaneous maximisation of yield and average information (without any need for the yield or the average information to be defined as criterion functions). The theorem is valid for all regions of acceptability and all Gaussian distributions. It may be used by cyclic repetition of random variation and selection (like the natural evolution). In every cycle a sufficiently large number of Gaussian distributed points are sampled and tested for membership in the region of acceptability. The centre of gravity of the Gaussian, m, is then moved to the centre of gravity of the approved (selected) points, m. Thus, the process converges to a state of equilibrium fulfilling the theorem. A solution is always approximate because the centre of gravity is always determined for a limited number of points. It was used for the first time in 1969 as a pure optimization algorithm making the regions of acceptability smaller and smaller (in analogy to simulated annealing, Kirkpatrick 1983). Since 1970 it has been used for both ordinary optimization and yield maximization. == Natural evolution and Gaussian adaptation == It has also been compared to the natural evolution of populations of living organisms. In this case s(x) is the probability that the individual having an array x of phenotypes will survive by giving offspring to the next generation; a definition of individual fitness given by Hartl 1981. The yield, P, is replaced by the mean fitness determined as a mean over the set of individuals in a large population. Phenotypes are often Gaussian distributed in a large population and a necessary condition for the natural evolution to be able to fulfill the theorem of Gaussian adaptation, with respect to all Gaussian quantitative characters, is that it may push the centre of gravity of the Gaussian to the centre of gravity of the selected individuals. This may be accomplished by the Hardy–Weinberg law. This is possible because the theorem of Gaussian adaptation is valid for any region of acceptability independent of the structure (Kjellström, 1996). In this case the rules of genetic variation such as crossover, inversion, transposition etcetera may be seen as random number generators for the phenotypes. So, in this sense Gaussian adaptation may be seen as a genetic algorithm. == How to climb a mountain == Mean fitness may be calculated provided that the distribution of parameters and the structure of the landscape is known. The real landscape is not known, but figure below shows a fictitious profile (blue) of a landscape along a line (x) in a room spanned by such parameters. The red curve is the mean based on the red bell curve at the bottom of figure. It is obtained by letting the bell curve slide along the x-axis, calculating the mean at every location. As can be seen, small peaks and pits are smoothed out. Thus, if evolution is started at A with a relatively small variance (the red bell curve), then climbing will take place on the red curve. The process may get stuck for millions of years at B or C, as long as the hollows to the right of these points remain, and the mutation rate is too small. If the mutation rate is sufficiently high, the disorder or variance may increase and the parameter(s) may become distributed like the green bell curve. Then the climbing will take place on the green curve, which is even more smoothed out. Because the hollows to the right of B and C have now disappeared, the process may continue up to the peaks at D. But of course the landscape puts a limit on the disorder or variability. Besides — dependent on the landscape — the process may become very jerky, and if the ratio between the time spent by the process at a local peak and the time of transition to the next peak is very high, it may as well look like a punctuated equilibrium as suggested by Gould (see Ridley). == Computer simulation of Gaussian adaptation == Thus far the theory only considers mean values of continuous distributions corresponding to an infinite number of individuals. In reality however, the number of individuals is always limited, which gives rise to an uncertainty in the estimation of m and M (the moment matrix of the Gaussian). And this may also affect the efficiency of the process. Unfortunately very little is known about this, at least theoretically. The implementation of normal adaptation on a computer is a fairly simple task. The adaptation of m may be done by one sample (individual) at a time, for example m(i + 1) = (1 – a) m(i) + ax where x is a pass sample, and a < 1 a suitable constant so that the inverse of a represents the number of individuals in the population. M may in principle be updated after every step y leading to a feasible point x = m + y according to: M(i + 1) = (1 – 2b) M(i) + 2byyT, where yT is the transpose of y and b << 1 is another suitable constant. In order to guarantee a suitable increase of average information, y should be normally distributed with moment matrix μ2M, where the scalar μ > 1 is used to increase average information (information entropy, disorder, diversity) at a suitable rate. But M will never be used in the calculations. Instead we use the matrix W defined by WWT = M. Thus, we have y = Wg, where g is normally distributed with the moment matrix μU, and U is the unit matrix. W and WT may be updated by the formulas W = (1 – b)W + bygT and WT = (1 – b)WT + bgyT because multiplication gives M = (1 – 2b)M + 2byyT, where terms including b2 have been neglected. Thus, M will be indirectly adapted with good approximation. In practice it will suffice to update W only W(i + 1) = (1 – b)W(i) + bygT. This is the formula used in a simple 2-dimensional model of a brain satisfying the Hebbian rule of associative learning; see the next section (Kjellström, 1996 and 1999). The figure below illustrates the effect of increased average information in a Gaussian p.d.f. used to climb a mountain Crest (the two lines represent the contour line). Both the red and green cluster have equal mean fitness, about 65%, but the green cluster has a much higher average information making the green process much more efficient. The effect of this adaptation is not very salient in a 2-dimensional case, but in a high-dimensional case, the efficiency of the search process may be increased by many orders of magnitude. == The evolution in the brain == In the brain the evolution of DNA-messages is supposed to be replaced by an evolution of signal patterns and the phenotypic landscape is replaced by a mental landscape, the complexity of which will hardly be second to the former. The metaphor with the mental landscape is based on the assumption that certain signal patterns give rise to a better well-being or performance. For instance, the control of a group of muscles leads to a better pronunciation of a word or performance of a piece of music. In this simple model it is assumed that the brain consists of interconnected components that may add, multiply and delay signal values. A nerve cell kernel may add signal values, a synapse may multiply with a constant and An axon may delay values. This is a basis of the theory of digital filters and neural networks consisting of components that may add, multiply and delay signalvalues and also of many brain models, Levine 1991. In the figure below the brain stem is supposed to deliver Gaussian distributed signal patterns. This may be possible since certai

    Read more →
  • Trazzler

    Trazzler

    Trazzler is a travel destination app that specializes in unique and local destinations. The initial concept was developed by Adam Rugel and Biz Stone in 2006 at Twitter's original offices under the name "71 miles". More than 10,000 writers and photographers have contributed and more than $350,000 in freelance contracts have been issued as a result of Trazzeler's weekly writing and photography contests. Investors in the company include SV Angel, AOL Founder Steve Case, and the Twitter founders, Evan Williams, Jack Dorsey, and Biz Stone. The company's partners are the City of Chicago, Hawaii Tourism Authority, Fairmont Hotels & Resorts, Salon.com, and Air New Zealand. Trazzler is designed for use on the iOS, Android, and Facebook.

    Read more →
  • Random projection

    Random projection

    In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are sparse. They have been applied to many natural language tasks under the name random indexing. == Dimensionality reduction == Dimensionality reduction, as the name suggests, is reducing the number of random variables using various mathematical methods from statistics and machine learning. Dimensionality reduction is often used to reduce the problem of managing and manipulating large data sets. Dimensionality reduction techniques generally use linear transformations in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions. For this purpose there are various related techniques, including: principal component analysis, linear discriminant analysis, canonical correlation analysis, discrete cosine transform, random projection, etc. Random projection is a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of error for faster processing times and smaller model sizes. The dimensions and distribution of random projection matrices are controlled so as to approximately preserve the pairwise distances between any two samples of the dataset. == Method == The core idea behind random projection is given in the Johnson-Lindenstrauss lemma, which states that if points in a vector space are of sufficiently high dimension, then they may be projected into a suitable lower-dimensional space in a way which approximately preserves pairwise distances between the points with high probability. In random projection, the original d {\displaystyle d} -dimensional data is projected to a k {\displaystyle k} -dimensional subspace, by multiplying on the left by a random matrix R ∈ R k × d {\displaystyle R\in \mathbb {R} ^{k\times d}} . Using matrix notation: If X d × N {\displaystyle X_{d\times N}} is the original set of N d-dimensional observations, then X k × N R P = R k × d X d × N {\displaystyle X_{k\times N}^{RP}=R_{k\times d}X_{d\times N}} is the projection of the data onto a lower k-dimensional subspace. Random projection is computationally simple: form the random matrix "R" and project the d × N {\displaystyle d\times N} data matrix X onto K dimensions of order O ( d k N ) {\displaystyle O(dkN)} . If the data matrix X is sparse with about c nonzero entries per column, then the complexity of this operation is of order O ( c k N ) {\displaystyle O(ckN)} . === Orthogonal random projection === A unit vector can be orthogonally projected to a random subspace. Let u {\displaystyle u} be the original unit vector, and let v {\displaystyle v} be its projection. The norm-squared ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as projecting a random point, uniformly sampled on the unit sphere, to its first k {\displaystyle k} coordinates. This is equivalent to sampling a random point in the multivariate gaussian distribution x ∼ N ( 0 , I d × d ) {\displaystyle x\sim {\mathcal {N}}(0,I_{d\times d})} , then normalizing it. Therefore, ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as ∑ i = 1 k x i 2 ∑ i = 1 k x i 2 + ∑ i = k + 1 d x i 2 {\displaystyle {\frac {\sum _{i=1}^{k}x_{i}^{2}}{\sum _{i=1}^{k}x_{i}^{2}+\sum _{i=k+1}^{d}x_{i}^{2}}}} , which by the chi-squared construction of the Beta distribution, has distribution Beta ⁡ ( k / 2 , ( d − k ) / 2 ) {\displaystyle \operatorname {Beta} (k/2,(d-k)/2)} , with mean k / d {\displaystyle k/d} . We have a concentration inequality P r [ | ‖ v ‖ 2 − k d | ≥ ϵ k d ] ≤ 3 exp ⁡ ( − k ϵ 2 / 64 ) {\displaystyle Pr\left[\left|\|v\|_{2}-{\frac {k}{d}}\right|\geq \epsilon {\sqrt {\frac {k}{d}}}\right]\leq 3\exp \left(-k\epsilon ^{2}/64\right)} for any ϵ ∈ ( 0 , 1 ) {\displaystyle \epsilon \in (0,1)} . === Gaussian random projection === The random matrix R can be generated using a Gaussian distribution. The first row is a random unit vector uniformly chosen from S d − 1 {\displaystyle S^{d-1}} . The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing R, and the following properties are satisfied: Spherical symmetry: For any orthogonal matrix A ∈ O ( d ) {\displaystyle A\in O(d)} , RA and R have the same distribution. Orthogonality: The rows of R are orthogonal to each other. Normality: The rows of R are unit-length vectors. === More computationally efficient random projections === Achlioptas has shown that the random matrix can be sampled more efficiently. Either the full matrix can be sampled IID according to R i , j = 3 / k × { + 1 with probability 1 6 0 with probability 2 3 − 1 with probability 1 6 {\displaystyle R_{i,j}={\sqrt {3/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{6}}\\0&{\text{with probability }}{\frac {2}{3}}\\-1&{\text{with probability }}{\frac {1}{6}}\end{cases}}} or the full matrix can be sampled IID according to R i , j = 1 / k × { + 1 with probability 1 2 − 1 with probability 1 2 {\displaystyle R_{i,j}={\sqrt {1/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{2}}\\-1&{\text{with probability }}{\frac {1}{2}}\end{cases}}} Both are efficient for database applications because the computations can be performed using integer arithmetic. More related study is conducted in. It was later shown how to use integer arithmetic while making the distribution even sparser, having very few nonzeroes per column, in work on the Sparse JL Transform. This is advantageous since a sparse embedding matrix means being able to project the data to lower dimension even faster. === Random Projection with Quantization === Random projection can be further condensed by quantization (discretization), with 1-bit (sign random projection) or multi-bits. It is the building block of SimHash, RP tree, and other memory efficient estimation and learning methods. == Large quasiorthogonal bases == The Johnson-Lindenstrauss lemma states that large sets of vectors in a high-dimensional space can be linearly mapped in a space of much lower (but still high) dimension n with approximate preservation of distances. One of the explanations of this effect is the exponentially high quasiorthogonal dimension of n-dimensional Euclidean space. There are exponentially large (in dimension n) sets of almost orthogonal vectors (with small value of inner products) in n–dimensional Euclidean space. This observation is useful in indexing of high-dimensional data. Quasiorthogonality of large random sets is important for methods of random approximation in machine learning. In high dimensions, exponentially large numbers of randomly and independently chosen vectors from equidistribution on a sphere (and from many other distributions) are almost orthogonal with probability close to one. This implies that in order to represent an element of such a high-dimensional space by linear combinations of randomly and independently chosen vectors, it may often be necessary to generate samples of exponentially large length if we use bounded coefficients in linear combinations. On the other hand, if coefficients with arbitrarily large values are allowed, the number of randomly generated elements that are sufficient for approximation is even less than dimension of the data space. == Implementations == RandPro - An R package for random projection sklearn.random_projection - A module for random projection from the scikit-learn Python library Weka implementation [1]

    Read more →
  • Genetic representation

    Genetic representation

    In computer programming, genetic representation is a way of presenting solutions/individuals in evolutionary computation methods. The term encompasses both the concrete data structures and data types used to realize the genetic material of the candidate solutions in the form of a genome, and the relationships between search space and problem space. In the simplest case, the search space corresponds to the problem space (direct representation). The choice of problem representation is tied to the choice of genetic operators, both of which have a decisive effect on the efficiency of the optimization. Genetic representation can encode appearance, behavior, physical qualities of individuals. Difference in genetic representations is one of the major criteria drawing a line between known classes of evolutionary computation. Terminology is often analogous with natural genetics. The block of computer memory that represents one candidate solution is called an individual. The data in that block is called a chromosome. Each chromosome consists of genes. The possible values of a particular gene are called alleles. A programmer may represent all the individuals of a population using binary encoding, permutational encoding, encoding by tree, or any one of several other representations. == Representations in some popular evolutionary algorithms == Genetic algorithms (GAs) are typically linear representations; these are often, but not always, binary. Holland's original description of GA used arrays of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size. This facilitates simple crossover operation. Depending on the application, variable-length representations have also been successfully used and tested in evolutionary algorithms (EA) in general and genetic algorithms in particular, although the implementation of crossover is more complex in this case. Evolution strategy uses linear real-valued representations, e.g., an array of real values. It uses mostly gaussian mutation and blending/averaging crossover. Genetic programming (GP) pioneered tree-like representations and developed genetic operators suitable for such representations. Tree-like representations are used in GP to represent and evolve functional programs with desired properties. Human-based genetic algorithm (HBGA) offers a way to avoid solving hard representation problems by outsourcing all genetic operators to outside agents, in this case, humans. The algorithm has no need for knowledge of a particular fixed genetic representation as long as there are enough external agents capable of handling those representations, allowing for free-form and evolving genetic representations. === Common genetic representations === binary array integer or real-valued array binary tree natural language parse tree directed graph == Distinction between search space and problem space == Analogous to biology, EAs distinguish between problem space (corresponds to phenotype) and search space (corresponds to genotype). The problem space contains concrete solutions to the problem being addressed, while the search space contains the encoded solutions. The mapping from search space to problem space is called genotype-phenotype mapping. The genetic operators are applied to elements of the search space, and for evaluation, elements of the search space are mapped to elements of the problem space via genotype-phenotype mapping. == Relationships between search space and problem space == The importance of an appropriate choice of search space for the success of an EA application was recognized early on. The following requirements can be placed on a suitable search space and thus on a suitable genotype-phenotype mapping: === Completeness === All possible admissible solutions must be contained in the search space. === Redundancy === When more possible genotypes exist than phenotypes, the genetic representation of the EA is called redundant. In nature, this is termed a degenerate genetic code. In the case of a redundant representation, neutral mutations are possible. These are mutations that change the genotype but do not affect the phenotype. Thus, depending on the use of the genetic operators, there may be phenotypically unchanged offspring, which can lead to unnecessary fitness determinations, among other things. Since the evaluation in real-world applications usually accounts for the lion's share of the computation time, it can slow down the optimization process. In addition, this can cause the population to have higher genotypic diversity than phenotypic diversity, which can also hinder evolutionary progress. In biology, the Neutral Theory of Molecular Evolution states that this effect plays a dominant role in natural evolution. This has motivated researchers in the EA community to examine whether neutral mutations can improve EA functioning by giving populations that have converged to a local optimum a way to escape that local optimum through genetic drift. This is discussed controversially and there are no conclusive results on neutrality in EAs. On the other hand, there are other proven measures to handle premature convergence. === Locality === The locality of a genetic representation corresponds to the degree to which distances in the search space are preserved in the problem space after genotype-phenotype mapping. That is, a representation has a high locality exactly when neighbors in the search space are also neighbors in the problem space. In order for successful schemata not to be destroyed by genotype-phenotype mapping after a minor mutation, the locality of a representation must be high. === Scaling === In genotype-phenotype mapping, the elements of the genotype can be scaled (weighted) differently. The simplest case is uniform scaling: all elements of the genotype are equally weighted in the phenotype. A common scaling is exponential. If integers are binary coded, the individual digits of the resulting binary number have exponentially different weights in representing the phenotype. Example: The number 90 is written in binary (i.e., in base two) as 1011010. If now one of the front digits is changed in the binary notation, this has a significantly greater effect on the coded number than any changes at the rear digits (the selection pressure has an exponentially greater effect on the front digits). For this reason, exponential scaling has the effect of randomly fixing the "posterior" locations in the genotype before the population gets close enough to the optimum to adjust for these subtleties. == Hybridization and repair in genotype-phenotype mapping == When mapping the genotype to the phenotype being evaluated, domain-specific knowledge can be used to improve the phenotype and/or ensure that constraints are met. This is a commonly used method to improve EA performance in terms of runtime and solution quality. It is illustrated below by two of the three examples. == Examples == === Example of a direct representation === An obvious and commonly used encoding for the traveling salesman problem and related tasks is to number the cities to be visited consecutively and store them as integers in the chromosome. The genetic operators must be suitably adapted so that they only change the order of the cities (genes) and do not cause deletions or duplications. Thus, the gene order corresponds to the city order and there is a simple one-to-one mapping. === Example of a complex genotype-phenotype mapping === In a scheduling task with heterogeneous and partially alternative resources to be assigned to a set of subtasks, the genome must contain all necessary information for the individual scheduling operations or it must be possible to derive them from it. In addition to the order of the subtasks to be executed, this includes information about the resource selection. A phenotype then consists of a list of subtasks with their start times and assigned resources. In order to be able to create this, as many allocation matrices must be created as resources can be allocated to one subtask at most. In the simplest case this is one resource, e.g., one machine, which can perform the subtask. An allocation matrix is a two-dimensional matrix, with one dimension being the available time units and the other being the resources to be allocated. Empty matrix cells indicate availability, while an entry indicates the number of the assigned subtask. The creation of allocation matrices ensures firstly that there are no inadmissible multiple allocations. Secondly, the start times of the subtasks can be read from it as well as the assigned resources. A common constraint when scheduling resources to subtasks is that a resource can only be allocated once per time unit and that the reservation must be for a contiguous period of time. To achieve this in a timely manner, which is a c

    Read more →