AI Code Generator Zzz

AI Code Generator Zzz — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • CAMeL-View TestRig

    CAMeL-View TestRig

    CAMeL-View is a software application, which is used for the model based design of mechatronic systems (multi-body simulation, block diagrams, pneumatic systems, hydraulic systems, general simulation, linear analysis and Hardware-in-the-Loop). CAMeL-View enables object-oriented model creation of mechatronic systems through the use of graphic blocks. The basic elements of multi-body system dynamics, control technology, hydraulics and hardware connectivity support the modeling process. The user’s proprietary C-Code can also be integrated into the models, which allows CAMeL-View TestRig to be implemented in all phases of the model based design process ( modeling, physical testing and prototyping), and lends itself especially well to mechatronic system design. The model’s structure is described and displayed with the help of directional connectors. Physical connections (such as mechanical or hydraulic linkages) as well as input and output connections (signal flow) are also available. The input of equations is done via mathematical expressions, e.g. the input of constitutive differential equations in vector and matrix form. Based on the model’s structure, the descriptive equations are converted into non-linear state space representations and converted into executable C-Code. CAMeL-View supports the simulation process with a configurable “experiment environment” (for simulator and instrumentation components) which allows the user to apply simulation models to supported targets (MPC5200, TriCore, X86, etc.) without the need for additional software tools for Hardware-in-the-Loop applications. In addition, the generation of so-called S-Functions for use in Simulink and the generation of ANSI C-Code for use in stand-alone simulators is also supported. A particularly noteworthy feature in CAMeL-View TestRig is the way in which the descriptive equations for multi-body system models are created. All multi-body simulation formalisms used for code generation create their equations in the form of typical explicit differential equations (ODE). This is especially important in Hardware-in-the-Loop applications where the calculation of simulation results within a specific, defined time frame must be assured. Only then is it possible to implement complex multi-body simulation models for Hardware-in-the-Loop applications under stringent real-time conditions. These constraints cannot be met when using DAE-based methods. Additional Toolboxes are available for linear analysis (Eigenvalues, pole-zero analysis, frequency response, etc.) of VRML-based animation. Development of CAMeL-View began in 1991 in the Paderborn Mechatronic Laboratory of Professor Dr. Ing. J. Lückel. The software was based on predecessors that had been developed there since 1986. The name stands for Computer Aided Mechatronic Laboratory – Virtual Engineering Workbench and describes the basic intent of one of the specific demands placed on development engineers in the computer lab.

    Read more →
  • Spiking neural network

    Spiking neural network

    Spiking neural networks (SNNs) are artificial neural networks (ANN) that mimic natural neural networks. These models leverage timing of discrete spikes as the main information carrier. In addition to neuronal and synaptic state, SNNs incorporate the concept of time into their operating model. The idea is that neurons in the SNN do not transmit information at each propagation cycle (as it happens with typical multi-layer perceptron networks), but rather transmit information only when a membrane potential—an intrinsic quality of the neuron related to its membrane electrical charge—reaches a specific value, called the threshold. When the membrane potential reaches the threshold, the neuron fires, and generates a signal that travels to other neurons which, in turn, increase or decrease their potentials in response to this signal. A neuron model that fires at the moment of threshold crossing is also called a spiking neuron model. While spike rates can be considered the analogue of the variable output of a traditional ANN, neurobiology research indicated that high speed processing cannot be performed solely through a rate-based scheme. For example humans can perform an image recognition task requiring no more than 10ms of processing time per neuron through the successive layers (going from the retina to the temporal lobe). This time window is too short for rate-based encoding. The precise spike timings in a small set of spiking neurons also has a higher information coding capacity compared with a rate-based approach. The most prominent spiking neuron model is the leaky integrate-and-fire model. In that model, the momentary activation level (modeled as a differential equation) is normally considered to be the neuron's state, with incoming spikes pushing this value higher or lower, until the state eventually either decays or—if the firing threshold is reached—the neuron fires. After firing, the state variable is reset to a lower value. Various decoding methods exist for interpreting the outgoing spike train as a real-value number, relying on either the frequency of spikes (rate-code), the time-to-first-spike after stimulation, or the interval between spikes. == History == Many multi-layer artificial neural networks are fully connected, receiving input from every neuron in the previous layer and signalling every neuron in the subsequent layer. Although these networks have achieved breakthroughs, they do not match biological networks and do not mimic neurons. The biology-inspired Hodgkin–Huxley model of a spiking neuron was proposed in 1952. This model described how action potentials are initiated and propagated. Communication between neurons, which requires the exchange of chemical neurotransmitters in the synaptic gap, is described in models such as the integrate-and-fire model, FitzHugh–Nagumo model (1961–1962), and Hindmarsh–Rose model (1984). The leaky integrate-and-fire model (or a derivative) is commonly used as it is easier to compute than Hodgkin–Huxley. While the notion of an artificial spiking neural network became popular only in the twenty-first century, studies between 1980 and 1995 supported the concept. The first models of this type of ANN appeared to simulate non-algorithmic intelligent information processing systems. However, the notion of the spiking neural network as a mathematical model was first worked on in the early 1970s. As of 2019 SNNs lagged behind ANNs in accuracy, but the gap is decreasing, and has vanished on some tasks. == Underpinnings == Information in the brain is represented as action potentials (neuron spikes), which may group into spike trains or coordinated waves. A fundamental question of neuroscience is to determine whether neurons communicate by a rate or temporal code. Temporal coding implies that a single spiking neuron can replace hundreds of hidden units on a conventional neural net. SNNs define a neuron's current state as its potential (possibly modeled as a differential equation). An input pulse causes the potential to rise and then gradually decline. Encoding schemes can interpret these pulse sequences as a number, considering pulse frequency and pulse interval. Using the precise time of pulse occurrence, a neural network can consider more information and offer better computing properties. SNNs compute in the continuous domain. Such neurons test for activation only when their potentials reach a certain value. When a neuron is activated, it produces a signal that is passed to connected neurons, accordingly raising or lowering their potentials. The SNN approach produces a continuous output instead of the binary output of traditional ANNs. Pulse trains are not easily interpretable, hence the need for encoding schemes. However, a pulse train representation may be more suited for processing spatiotemporal data (or real-world sensory data classification). SNNs connect neurons only to nearby neurons so that they process input blocks separately (similar to CNN using filters). They consider time by encoding information as pulse trains so as not to lose information. This avoids the complexity of a recurrent neural network (RNN). Impulse neurons are more powerful computational units than traditional artificial neurons. SNNs are theoretically more powerful than so called "second-generation networks" defined as ANNs "based on computational units that apply activation function with a continuous set of possible output values to a weighted sum (or polynomial) of the inputs"; however, SNN training issues and hardware requirements limit their use. Although unsupervised biologically inspired learning methods are available such as Hebbian learning and STDP, no effective supervised training method is suitable for SNNs that can provide better performance than second-generation networks. Spike-based activation of SNNs is not differentiable, thus gradient descent-based backpropagation (BP) is not available. SNNs have much larger computational costs for simulating realistic neural models than traditional ANNs. Pulse-coupled neural networks (PCNN) are often confused with SNNs. A PCNN can be seen as a kind of SNN. Researchers are actively working on various topics. The first concerns differentiability. The expressions for both the forward- and backward-learning methods contain the derivative of the neural activation function which is not differentiable because a neuron's output is either 1 when it spikes, and 0 otherwise. This all-or-nothing behavior disrupts gradients and makes these neurons unsuitable for gradient-based optimization. Approaches to resolving it include: resorting to entirely biologically inspired local learning rules for the hidden units translating conventionally trained "rate-based" NNs to SNNs smoothing the network model to be continuously differentiable defining an SG (Surrogate Gradient) as a continuous relaxation of the real gradients The second concerns the optimization algorithm. Standard BP can be expensive in terms of computation, memory, and communication and may be poorly suited to the hardware that implements it (e.g., a computer, brain, or neuromorphic device). Incorporating additional neuron dynamics such as Spike Frequency Adaptation (SFA) is a notable advance, enhancing efficiency and computational power. These neurons sit between biological complexity and computational complexity. Originating from biological insights, SFA offers significant computational benefits by reducing power usage, especially in cases of repetitive or intense stimuli. This adaptation improves signal/noise clarity and introduces an elementary short-term memory at the neuron level, which in turn, improves accuracy and efficiency. This was mostly achieved using compartmental neuron models. The simpler versions are of neuron models with adaptive thresholds, are an indirect way of achieving SFA. It equips SNNs with improved learning capabilities, even with constrained synaptic plasticity, and elevates computational efficiency. This feature lessens the demand on network layers by decreasing the need for spike processing, thus lowering computational load and memory access time—essential aspects of neural computation. Moreover, SNNs utilizing neurons capable of SFA achieve levels of accuracy that rival those of conventional ANNs, while also requiring fewer neurons for comparable tasks. This efficiency streamlines the computational workflow and conserves space and energy, while maintaining technical integrity. High-performance deep spiking neural networks can operate with 0.3 spikes per neuron. == Applications == SNNs can in principle be applied to the same applications as traditional ANNs. In addition, SNNs can model the central nervous system of biological organisms, such as an insect seeking food without prior knowledge of the environment. Due to their relative realism, they can be used to study biological neural circuits. Starting with a hypothesis about the topology of a biological neuronal circuit and its functi

    Read more →
  • Multiclass classification

    Multiclass classification

    In machine learning and statistical classification, multiclass classification or multinomial classification is the problem of classifying instances into one of three or more classes (classifying instances into one of two classes is called binary classification). For example, deciding on whether an image is showing a banana, peach, orange, or an apple is a multiclass classification problem, with four possible classes (banana, peach, orange, apple), while deciding on whether an image contains an apple or not is a binary classification problem (with the two possible classes being: apple, no apple). While many classification algorithms (e.g., decision trees, k-NN, neural networks and multinomial logistic regression) naturally permit the use of more than two classes, some are by nature binary algorithms (e.g., classical binary support vector machine) and require decomposition strategies such as one-vs-all, one-vs-one, or ECOC to solve multiclass problems. Multiclass classification should not be confused with multi-label classification, where multiple labels are to be predicted for each instance (e.g., predicting that an image contains both an apple and an orange, in the previous example). == Better-than-random multiclass models == From the confusion matrix of a multiclass model, we can determine whether a model does better than chance. Let K ≥ 3 {\displaystyle K\geq 3} be the number of classes, O {\displaystyle {\mathcal {O}}} a set of observations, y ^ : O → { 1 , . . . , K } {\displaystyle {\hat {y}}:{\mathcal {O}}\to \{1,...,K\}} a model of the target variable y : O → { 1 , . . . , K } {\displaystyle y:{\mathcal {O}}\to \{1,...,K\}} and n i , j {\displaystyle n_{i,j}} be the number of observations in the set { y = i } ∩ { y ^ = j } {\displaystyle \{y=i\}\cap \{{\hat {y}}=j\}} . We note n i . = ∑ j n i , j {\displaystyle n_{i.}=\sum _{j}n_{i,j}} , n . j = ∑ i n i , j {\displaystyle n_{.j}=\sum _{i}n_{i,j}} , n = ∑ j n . j = ∑ i n i . {\displaystyle n=\sum _{j}n_{.j}=\sum _{i}n_{i.}} , λ i = n i . n {\displaystyle \lambda _{i}={\frac {n_{i.}}{n}}} and μ j = n . j n {\displaystyle \mu _{j}={\frac {n_{.j}}{n}}} . It is assumed that the confusion matrix ( n i , j ) i , j {\displaystyle (n_{i,j})_{i,j}} contains at least one non-zero entry in each row, that is λ i > 0 {\displaystyle \lambda _{i}>0} for any i {\displaystyle i} . Finally we call "normalized confusion matrix" the matrix of conditional probabilities ( P ( y ^ = j ∣ y = i ) ) i , j = ( n i , j n i . ) i , j {\displaystyle (\mathbb {P} ({\hat {y}}=j\mid y=i))_{i,j}=\left({\frac {n_{i,j}}{n_{i.}}}\right)_{i,j}} . === Intuitive explanation === The lift is a way of measuring the deviation from independence of two events A {\displaystyle A} and B {\displaystyle B} : L i f t ( A , B ) = P ( A ∩ B ) P ( A ) P ( B ) = P ( A ∣ B ) P ( A ) = P ( B ∣ A ) P ( B ) {\displaystyle \mathrm {Lift} (A,B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (A)\mathbb {P} (B)}}={\frac {\mathbb {P} (A\mid B)}{\mathbb {P} (A)}}={\frac {\mathbb {P} (B\mid A)}{\mathbb {P} (B)}}} We have L i f t ( A , B ) > 1 {\displaystyle \mathrm {Lift} (A,B)>1} if and only if events A {\displaystyle A} and B {\displaystyle B} occur simultaneously with a greater probability than if they were independent. In other words, if one of the two events occurs, the probability of observing the other event increases. A first condition to satisfy is to have L i f t ( y = i , y ^ = i ) ≥ 1 {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)\geq 1} for any i {\displaystyle i} . And the quality of a model (better or worse than chance) does not change if we over- or undersample the dataset, that is if we multiply each row R i {\displaystyle R_{i}} of the confusion matrix by a constant c i {\displaystyle c_{i}} . Thus the second condition is that the necessary and sufficient conditions for doing better than chance need only depend on the normalized confusion matrix. The condition on lifts can be reformulated with One versus Rest binary models : for any i {\displaystyle i} , we define the binary target variable y i {\displaystyle y_{i}} which is the indicator of event { y = i } {\displaystyle \{y=i\}} , and the binary model y ^ i {\displaystyle {\hat {y}}_{i}} of y i {\displaystyle y_{i}} which is the indicator of event { y ^ = i } {\displaystyle \{{\hat {y}}=i\}} . Each of the y ^ i {\displaystyle {\hat {y}}_{i}} models is a "One versus Rest" model. L i f t ( y = i , y ^ = i ) {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)} only depends on the events { y = i } {\displaystyle \{y=i\}} and { y ^ = i } {\displaystyle \{{\hat {y}}=i\}} , so merging or not merging the other classes doesn't change its value. We therefore have L i f t ( y = i , y ^ = i ) = L i f t ( y i = 1 , y ^ i = 1 ) {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)=\mathrm {Lift} (y_{i}=1,{\hat {y}}_{i}=1)} and the first condition is that all binary One versus Rest models are better than chance. ==== Example ==== If K = 2 {\displaystyle K=2} and 2 is the class of interest , the normalized confusion matrix is ( s p e c i f i c i t y 1 − s p e c i f i c i t y 1 − s e n s i t i v i t y s e n s i t i v i t y ) {\displaystyle {\begin{pmatrix}\mathrm {specificity} &1-\mathrm {specificity} \\1-\mathrm {sensitivity} &\mathrm {sensitivity} \end{pmatrix}}} and we have L i f t ( y = 1 , y ^ = 1 ) − 1 = P ( y = y ^ = 1 ) λ 1 μ 1 − 1 = n 1 , 1 n n 1. n .1 − 1 {\displaystyle \mathrm {Lift} (y=1,{\hat {y}}=1)-1={\frac {\mathbb {P} (y={\hat {y}}=1)}{\lambda _{1}\mu _{1}}}-1={\frac {n_{1,1}n}{n_{1.}n_{.1}}}-1} = n 1 , 1 ( n 1 , 1 + n 1 , 2 + n 2 , 1 + n 2 , 2 ) − ( n 1 , 1 + n 1 , 2 ) ( n 1 , 1 + n 2 , 1 ) n 1. n .1 = n 1 , 1 n 2 , 2 − n 1 , 2 n 2 , 1 n 1. n .1 {\displaystyle ={\frac {n_{1,1}(n_{1,1}+n_{1,2}+n_{2,1}+n_{2,2})-(n_{1,1}+n_{1,2})(n_{1,1}+n_{2,1})}{n_{1.}n_{.1}}}={\frac {n_{1,1}n_{2,2}-n_{1,2}n_{2,1}}{n_{1.}n_{.1}}}} . Thus L i f t ( y = 1 , y ^ = 1 ) ≥ 1 ⟺ n 1 , 1 n 2 , 2 − n 1 , 2 n 2 , 1 ≥ 0 {\displaystyle \mathrm {Lift} (y=1,{\hat {y}}=1)\geq 1\iff n_{1,1}n_{2,2}-n_{1,2}n_{2,1}\geq 0} . Similarly, by swapping the roles of 1 and 2, we find that L i f t ( y = 2 , y ^ = 2 ) ≥ 1 ⟺ n 1 , 1 n 2 , 2 − n 1 , 2 n 2 , 1 ≥ 0 {\displaystyle \mathrm {Lift} (y=2,{\hat {y}}=2)\geq 1\iff n_{1,1}n_{2,2}-n_{1,2}n_{2,1}\geq 0} . Dividing by n 1. n 2. {\displaystyle n_{1.}n_{2.}} we find that the necessary and sufficient condition on the normalized confusion matrix is s e n s i t i v i t y s p e c i f i c i t y − ( 1 − s e n s i t i v i t y ) ( 1 − s p e c i f i c i t y ) ≥ 0 ⟺ s e n s i t i v i t y + s p e c i f i c i t y − 1 ≥ 0 ⟺ J ≥ 0 {\displaystyle \mathrm {sensitivity} \ \mathrm {specificity} -(1-\mathrm {sensitivity} )(1-\mathrm {specificity} )\geq 0\iff \mathrm {sensitivity} +\mathrm {specificity} -1\geq 0\iff J\geq 0} . This brings us back to the classical binary condition: Youden's J must be positive (or zero for random models). === Random models === A random model is a model that is independent of the target variable. This property is easily reformulated with the confusion matrix. This proposition shows that the model y ^ {\displaystyle {\hat {y}}} of y {\displaystyle y} is uninformative if and only if there are two families of numbers ( α i ) i {\displaystyle (\alpha _{i})_{i}} and ( β j ) j {\displaystyle (\beta _{j})_{j}} such that P ( { y = i } ∩ { y ^ = j } ) = α i β j {\displaystyle \mathbb {P} (\{y=i\}\cap \{{\hat {y}}=j\})=\alpha _{i}\beta _{j}} for any i {\displaystyle i} and j {\displaystyle j} . === Multiclass likelihood ratios and diagnostic odds ratios === We define generalized likelihood ratios calculated from the normalized confusion matrix: for any i {\displaystyle i} and j ≠ i {\displaystyle j\not =i} , let L R i , j = P ( y ^ = j ∣ y = j ) P ( y ^ = j ∣ y = i ) {\displaystyle \mathrm {LR} _{i,j}={\frac {\mathbb {P} ({\hat {y}}=j\mid y=j)}{\mathbb {P} ({\hat {y}}=j\mid y=i)}}} . When K = 2 {\displaystyle K=2} , if 2 is the class of interest,, we find the classical likelihood ratios L R 1 , 2 = L R + {\displaystyle \mathrm {LR} _{1,2}=\mathrm {LR} _{+}} and L R 2 , 1 = 1 L R − {\displaystyle \mathrm {LR} _{2,1}={\frac {1}{\mathrm {LR} _{-}}}} . Multiclass diagnostic odds ratios can also be defined using the formula D O R i , j = D O R j , i = L R i , j L R j , i = n i , i n j , j n i , j n j , i = P ( y ^ = j ∣ y = j ) / P ( y ^ = i ∣ y = j ) P ( y ^ = j ∣ y = i ) / P ( y ^ = i ∣ y = i ) {\displaystyle \mathrm {DOR} _{i,j}=\mathrm {DOR} _{j,i}=\mathrm {LR} _{i,j}\mathrm {LR} _{j,i}={\frac {n_{i,i}n_{j,j}}{n_{i,j}n_{j,i}}}={\frac {\mathbb {P} ({\hat {y}}=j\mid y=j)/\mathbb {P} ({\hat {y}}=i\mid y=j)}{\mathbb {P} ({\hat {y}}=j\mid y=i)/\mathbb {P} ({\hat {y}}=i\mid y=i)}}} We saw above that a better-than-chance model (or a random model) must verify L i f t ( y = i , y ^ = i ) ≥ 1 {\displaystyle \mathrm {Lift} (y=i,{\hat {y}}=i)\geq 1} for any i {\displaystyle i} and λ i {\displaystyle \lambda _{i}} . According to the previous corollary, likelihood ratios are thus greater

    Read more →
  • Neural Networks (journal)

    Neural Networks (journal)

    Neural Networks is a monthly peer-reviewed scientific journal and an official journal of the International Neural Network Society, European Neural Network Society, and Japanese Neural Network Society. == History == The journal was established in 1988 and is published by Elsevier. It covers all aspects of research on artificial neural networks. The founding editor-in-chief was Stephen Grossberg (Boston University). The current editors-in-chief are DeLiang Wang (Ohio State University) and Taro Toyoizumi (RIKEN Center for Brain Science). == Abstracting and indexing == The journal is abstracted and indexed in Scopus and the Science Citation Index Expanded. According to the Journal Citation Reports, the journal has a 2022 impact factor of 7.8.

    Read more →
  • Visualization (graphics)

    Visualization (graphics)

    Visualization (or visualisation in Commonwealth English; see spelling differences), also known as graphics visualization, is any technique for creating images, diagrams, or animations to communicate a message. Visualization through visual imagery has been an effective way to communicate both abstract and concrete ideas since the dawn of humanity. Examples from history include cave paintings, Egyptian hieroglyphs, Greek geometry, and Leonardo da Vinci's revolutionary methods of technical drawing for engineering purposes that actively involve scientific requirements. Visualization today has ever-expanding applications in science, education, engineering (e.g., product visualization), interactive multimedia, medicine, etc. Typical of a visualization application is the field of computer graphics. The invention of computer graphics (and 3D computer graphics) may be the most important development in visualization since the invention of central perspective in the Renaissance period. The development of animation also helped advance visualization. == Overview == The use of visualization to present information is not a new phenomenon. It has been used in maps, scientific drawings, and data plots for over a thousand years. Examples from cartography include Ptolemy's Geographia (2nd century AD), a map of China (1137 AD), and Minard's map (1861) of Napoleon's invasion of Russia a century and a half ago. Most of the concepts learned in devising these images carry over in a straightforward manner to computer visualization. Edward Tufte has written three critically acclaimed books that explain many of these principles. Computer graphics has from its beginning been used to study scientific problems. However, in its early days the lack of graphics power often limited its usefulness. The recent emphasis on visualization started in 1987 with the publication of Visualization in Scientific Computing, a special issue of Computer Graphics. Since then, there have been several conferences and workshops, co-sponsored by the IEEE Computer Society and ACM SIGGRAPH, devoted to the general topic, and special areas in the field, for example volume visualization. Most people are familiar with the digital animations produced to present meteorological data during weather reports on television, though few can distinguish between those models of reality and the satellite photos that are also shown on such programs. TV also offers scientific visualizations when it shows computer drawn and animated reconstructions of road or airplane accidents. Some of the most popular examples of scientific visualizations are computer-generated images that show real spacecraft in action, out in the void far beyond Earth, or on other planets. Dynamic forms of visualization, such as educational animation or timelines, have the potential to enhance learning about systems that change over time. Apart from the distinction between interactive visualizations and animation, the most useful categorization is probably between abstract and model-based scientific visualizations. The abstract visualizations show completely conceptual constructs in 2D or 3D. These generated shapes are completely arbitrary. The model-based visualizations either place overlays of data on real or digitally constructed images of reality or make a digital construction of a real object directly from the scientific data. Scientific visualization is usually done with specialized software, though there are a few exceptions, noted below. Some of these specialized programs have been released as open source software, having very often its origins in universities, within an academic environment where sharing software tools and giving access to the source code is common. There are also many proprietary software packages of scientific visualization tools. Models and frameworks for building visualizations include the data flow models popularized by systems such as AVS, IRIS Explorer, and VTK toolkit, and data state models in spreadsheet systems such as the Spreadsheet for Visualization and Spreadsheet for Images. == Applications == === Scientific visualization === As a subject in computer science, scientific visualization is the use of interactive, sensory representations, typically visual, of abstract data to reinforce cognition, hypothesis building, and reasoning. Scientific visualization is the transformation, selection, or representation of data from simulations or experiments, with an implicit or explicit geometric structure, to allow the exploration, analysis, and understanding of the data. Scientific visualization focuses and emphasizes the representation of higher order data using primarily graphics and animation techniques. It is a very important part of visualization and maybe the first one, as the visualization of experiments and phenomena is as old as science itself. Traditional areas of scientific visualization are flow visualization, medical visualization, astrophysical visualization, and chemical visualization. There are several different techniques to visualize scientific data, with isosurface reconstruction and direct volume rendering being the more common. === Data and information visualization === Data visualization is a related subcategory of visualization dealing with statistical graphics and geospatial data (as in thematic cartography) that is abstracted in schematic form. Information visualization concentrates on the use of computer-supported tools to explore large amount of abstract data. The term "information visualization" was originally coined by the User Interface Research Group at Xerox PARC and included Jock Mackinlay. Practical application of information visualization in computer programs involves selecting, transforming, and representing abstract data in a form that facilitates human interaction for exploration and understanding. Important aspects of information visualization are dynamics of visual representation and the interactivity. Strong techniques enable the user to modify the visualization in real-time, thus affording unparalleled perception of patterns and structural relations in the abstract data in question. === Educational visualization === Educational visualization is using a simulation to create an image of something so it can be taught about. This is very useful when teaching about a topic that is difficult to otherwise see, for example, atomic structure, because atoms are far too small to be studied easily without expensive and difficult to use scientific equipment. === Knowledge visualization === The use of visual representations to transfer knowledge between at least two persons aims to improve the transfer of knowledge by using computer and non-computer-based visualization methods complementarily. Thus properly designed visualization is an important part of not only data analysis but knowledge transfer process, too. Knowledge transfer may be significantly improved using hybrid designs as it enhances information density but may decrease clarity as well. For example, visualization of a 3D scalar field may be implemented using iso-surfaces for field distribution and textures for the gradient of the field. Examples of such visual formats are sketches, diagrams, images, objects, interactive visualizations, information visualization applications, and imaginary visualizations as in stories. While information visualization concentrates on the use of computer-supported tools to derive new insights, knowledge visualization focuses on transferring insights and creating new knowledge in groups. Beyond the mere transfer of facts, knowledge visualization aims to further transfer insights, experiences, attitudes, values, expectations, perspectives, opinions, and estimates in different fields by using various complementary visualizations. See also: picture dictionary, visual dictionary === Product visualization === Product visualization involves visualization software technology for the viewing and manipulation of 3D models, technical drawing and other related documentation of manufactured components and large assemblies of products. It is a key part of product lifecycle management. Product visualization software typically provides high levels of photorealism so that a product can be viewed before it is actually manufactured. This supports functions ranging from design and styling to sales and marketing. Technical visualization is an important aspect of product development. Originally technical drawings were made by hand, but with the rise of advanced computer graphics the drawing board has been replaced by computer-aided design (CAD). CAD-drawings and models have several advantages over hand-made drawings such as the possibility of 3-D modeling, rapid prototyping, and simulation. 3D product visualization promises more interactive experiences for online shoppers, but also challenges retailers to overcome hurdles in the production of 3D content, as large-scale 3D content production can be extremel

    Read more →
  • Inverted pendulum

    Inverted pendulum

    An inverted pendulum is a pendulum that has its center of mass above its pivot point. It is unstable and falls over without additional help. It can be suspended stably in this inverted position by using a control system to monitor the angle of the pole and move the pivot point horizontally back under the center of mass when it starts to fall over, keeping it balanced. The inverted pendulum is a classic problem in dynamics and control theory and is used as a benchmark for testing control strategies. It is often implemented with the pivot point mounted on a cart that can move horizontally under control of an electronic servo system as shown in the photo; this is called a cart and pole apparatus. Most applications limit the pendulum to 1 degree of freedom by affixing the pole to an axis of rotation. Whereas a normal pendulum is stable when hanging downward, an inverted pendulum is inherently unstable, and must be actively balanced in order to remain upright; this can be done either by applying a torque at the pivot point, by moving the pivot point horizontally as part of a feedback system, changing the rate of rotation of a mass mounted on the pendulum on an axis parallel to the pivot axis and thereby generating a net torque on the pendulum, or by oscillating the pivot point vertically. A simple demonstration of moving the pivot point in a feedback system is achieved by balancing an upturned broomstick on the end of one's finger. A second type of inverted pendulum is a tiltmeter for tall structures, which consists of a wire anchored to the bottom of the foundation and attached to a float in a pool of oil at the top of the structure that has devices for measuring movement of the neutral position of the float away from its original position. == Overview == A pendulum with its bob hanging directly below the support pivot is at a stable equilibrium point, where it remains motionless because there is no torque on the pendulum. If displaced from this position, it experiences a restoring torque that returns it toward the equilibrium position. A pendulum with its bob in an inverted position, supported on a rigid rod directly above the pivot, 180° from its stable equilibrium position, is at an unstable equilibrium point. At this point again there is no torque on the pendulum, but the slightest displacement away from this position causes a gravitation torque on the pendulum that accelerates it away from equilibrium, causing it to fall over. In order to stabilize a pendulum in this inverted position, a feedback control system can be used, which monitors the pendulum's angle and moves the position of the pivot point sideways when the pendulum starts to fall over, to keep it balanced. The inverted pendulum is a classic problem in dynamics and control theory and is widely used as a benchmark for testing control algorithms (PID controllers, state-space representation, neural networks, fuzzy control, genetic algorithms, etc.). Variations on this problem include multiple links, allowing the motion of the cart to be commanded while maintaining the pendulum, and balancing the cart-pendulum system on a see-saw. The inverted pendulum is related to rocket or missile guidance, where the center of gravity is located behind the center of drag causing aerodynamic instability. The understanding of a similar problem can be shown by simple robotics in the form of a balancing cart. Balancing an upturned broomstick on the end of one's finger is a simple demonstration, and the problem is solved by self-balancing personal transporters such as the Segway PT, the self-balancing hoverboard and the self-balancing unicycle. Another way that an inverted pendulum may be stabilized, without any feedback or control mechanism, is by oscillating the pivot rapidly up and down. This is called Kapitza's pendulum. If the oscillation is sufficiently strong (in terms of its acceleration and amplitude) then the inverted pendulum can recover from perturbations in a strikingly counterintuitive manner. If the driving point moves in simple harmonic motion, the pendulum's motion is described by the Mathieu equation. == Equations of motion == The equations of motion of inverted pendulums are dependent on what constraints are placed on the motion of the pendulum. Inverted pendulums can be created in various configurations resulting in a number of Equations of Motion describing the behavior of the pendulum. === Stationary pivot point === In a configuration where the pivot point of the pendulum is fixed in space, the equation of motion is similar to that for an uninverted pendulum. The equation of motion below assumes no friction or any other resistance to movement, a rigid massless rod, and the restriction to 2-dimensional movement. θ ¨ − g ℓ sin ⁡ θ = 0 {\displaystyle {\ddot {\theta }}-{g \over \ell }\sin \theta =0} Where θ ¨ {\displaystyle {\ddot {\theta }}} is the angular acceleration of the pendulum, g {\displaystyle g} is the standard gravity on the surface of the Earth, ℓ {\displaystyle \ell } is the length of the pendulum, and θ {\displaystyle \theta } is the angular displacement measured from the equilibrium position. When θ ¨ {\displaystyle {\ddot {\theta }}} added to both sides, it has the same sign as the angular acceleration term: θ ¨ = g ℓ sin ⁡ θ {\displaystyle {\ddot {\theta }}={g \over \ell }\sin \theta } Thus, the inverted pendulum accelerates away from the vertical unstable equilibrium in the direction initially displaced, and the acceleration is inversely proportional to the length. Tall pendulums fall more slowly than short ones. Derivation using torque and moment of inertia: The pendulum is assumed to consist of a point mass, of mass m {\displaystyle m} , affixed to the end of a massless rigid rod, of length ℓ {\displaystyle \ell } , attached to a pivot point at the end opposite the point mass. The net torque of the system must equal the moment of inertia times the angular acceleration: τ n e t = I θ ¨ {\displaystyle {\boldsymbol {\tau }}_{\mathrm {net} }=I{\ddot {\theta }}} The torque due to gravity providing the net torque: τ n e t = m g ℓ sin ⁡ θ {\displaystyle {\boldsymbol {\tau }}_{\mathrm {net} }=mg\ell \sin \theta \,\!} Where θ {\displaystyle \theta \ } is the angle measured from the inverted equilibrium position. The resulting equation: I θ ¨ = m g ℓ sin ⁡ θ {\displaystyle I{\ddot {\theta }}=mg\ell \sin \theta \,\!} The moment of inertia for a point mass: I = m R 2 {\displaystyle I=mR^{2}} In the case of the inverted pendulum the radius is the length of the rod, ℓ {\displaystyle \ell } . Substituting in I = m ℓ 2 {\displaystyle I=m\ell ^{2}} m ℓ 2 θ ¨ = m g ℓ sin ⁡ θ {\displaystyle m\ell ^{2}{\ddot {\theta }}=mg\ell \sin \theta \,\!} Mass and ℓ 2 {\displaystyle \ell ^{2}} is divided from each side resulting in: θ ¨ = g ℓ sin ⁡ θ {\displaystyle {\ddot {\theta }}={g \over \ell }\sin \theta } === Inverted pendulum on a cart === An inverted pendulum on a cart consists of a mass m {\displaystyle m} at the top of a pole of length ℓ {\displaystyle \ell } pivoted on a horizontally moving base as shown in the adjacent image. The cart is restricted to linear motion and is subject to forces resulting in or hindering motion. === Essentials of stabilization === The essentials of stabilizing the inverted pendulum can be summarized qualitatively in three steps. 1. If the tilt angle θ {\displaystyle \theta } is to the right, the cart must accelerate to the right and vice versa. 2. The position of the cart x {\displaystyle x} relative to track center is stabilized by slightly modulating the null angle (the angle error that the control system tries to null) by the position of the cart, that is, null angle = θ + k x {\displaystyle =\theta +kx} where k {\displaystyle k} is small. This makes the pole want to lean slightly toward track center and stabilize at track center where the tilt angle is exactly vertical. Any offset in the tilt sensor or track slope that would otherwise cause instability translates into a stable position offset. A further added offset gives position control. 3. A normal pendulum subject to a moving pivot point such as a load lifted by a crane, has a peaked response at the pendulum radian frequency of ω p = g / ℓ {\displaystyle \omega _{p}={\sqrt {g/\ell }}} . To prevent uncontrolled swinging, the frequency spectrum of the pivot motion should be suppressed near ω p {\displaystyle \omega _{p}} . The inverted pendulum requires the same suppression filter to achieve stability. As a consequence of the null angle modulation strategy, the position feedback is positive, that is, a sudden command to move right produces an initial cart motion to the left followed by a move right to rebalance the pendulum. The interaction of the pendulum instability and the positive position feedback instability to produce a stable system is a feature that makes the mathematical analysis an interesting and challenging problem. === From Lagrange's equations === The equations of motion c

    Read more →
  • Oja's rule

    Oja's rule

    Oja's learning rule, or simply Oja's rule, named after Finnish computer scientist Erkki Oja (Finnish pronunciation: [ˈojɑ], AW-yuh), is a model of how neurons in the brain or in artificial neural networks change connection strength, or learn, over time. It is a modification of the standard Hebb's Rule that, through multiplicative normalization, solves all stability problems and generates an algorithm for principal components analysis. This is a computational form of an effect which is believed to happen in biological neurons. == Theory == Oja's rule requires a number of simplifications to derive, but in its final form it is demonstrably stable, unlike Hebb's rule. It is a single-neuron special case of the Generalized Hebbian Algorithm. However, Oja's rule can also be generalized in other ways to varying degrees of stability and success. === Formula === Consider a simplified model of a neuron y {\displaystyle y} that returns a linear combination of its inputs x using presynaptic weights w: y ( x ) = ∑ j = 1 m x j w j {\displaystyle \,y(\mathbf {x} )~=~\sum _{j=1}^{m}x_{j}w_{j}} Oja's rule defines the change in presynaptic weights w given the output response y {\displaystyle y} of a neuron to its inputs x to be Δ w = w n + 1 − w n = η y n ( x n − y n w n ) , {\displaystyle \,\Delta \mathbf {w} ~=~\mathbf {w} _{n+1}-\mathbf {w} _{n}~=~\eta \,y_{n}(\mathbf {x} _{n}-y_{n}\mathbf {w} _{n}),} where η is the learning rate which can also change with time. Note that the bold symbols are vectors and n defines a discrete time iteration. The rule can also be made for continuous iterations as d w d t = η y ( t ) ( x ( t ) − y ( t ) w ( t ) ) . {\displaystyle \,{\frac {d\mathbf {w} }{dt}}~=~\eta \,y(t)(\mathbf {x} (t)-y(t)\mathbf {w} (t)).} === Derivation === The simplest learning rule known is Hebb's rule, which states in conceptual terms that neurons that fire together, wire together. In component form as a difference equation, it is written Δ w = η y ( x n ) x n {\displaystyle \,\Delta \mathbf {w} ~=~\eta \,y(\mathbf {x} _{n})\mathbf {x} _{n}} , or in scalar form with implicit n-dependence, w i ( n + 1 ) = w i ( n ) + η y ( x ) x i {\displaystyle \,w_{i}(n+1)~=~w_{i}(n)+\eta \,y(\mathbf {x} )x_{i}} , where y(xn) is again the output, this time explicitly dependent on its input vector x. Hebb's rule has synaptic weights approaching infinity with a positive learning rate. We can stop this by normalizing the weights so that each weight's magnitude is restricted between 0, corresponding to no weight, and 1, corresponding to being the only input neuron with any weight. We do this by normalizing the weight vector to be of length one: w i ( n + 1 ) = w i ( n ) + η y ( x ) x i ( ∑ j = 1 m [ w j ( n ) + η y ( x ) x j ] p ) 1 / p {\displaystyle \,w_{i}(n+1)~=~{\frac {w_{i}(n)+\eta \,y(\mathbf {x} )x_{i}}{\left(\sum _{j=1}^{m}[w_{j}(n)+\eta \,y(\mathbf {x} )x_{j}]^{p}\right)^{1/p}}}} . Note that in Oja's original paper, p=2, corresponding to quadrature (root sum of squares), which is the familiar Cartesian normalization rule. However, any type of normalization, even linear, will give the same result without loss of generality. For a small learning rate | η | ≪ 1 {\displaystyle |\eta |\ll 1} the equation can be expanded as a Power series in η {\displaystyle \eta } . w i ( n + 1 ) = w i ( n ) ( ∑ j w j p ( n ) ) 1 / p + η ( y x i ( ∑ j w j p ( n ) ) 1 / p − w i ( n ) ∑ j y x j w j p − 1 ( n ) ( ∑ j w j p ( n ) ) ( 1 + 1 / p ) ) + O ( η 2 ) {\displaystyle \,w_{i}(n+1)~=~{\frac {w_{i}(n)}{\left(\sum _{j}w_{j}^{p}(n)\right)^{1/p}}}~+~\eta \left({\frac {yx_{i}}{\left(\sum _{j}w_{j}^{p}(n)\right)^{1/p}}}-{\frac {w_{i}(n)\sum _{j}yx_{j}w_{j}^{p-1}(n)}{\left(\sum _{j}w_{j}^{p}(n)\right)^{(1+1/p)}}}\right)~+~O(\eta ^{2})} . For small η, our higher-order terms O(η2) go to zero. We again make the specification of a linear neuron, that is, the output of the neuron is equal to the sum of the product of each input and its synaptic weight to the power of p-1, which in the case of p=2 is synaptic weight itself, or y ( x ) = ∑ j = 1 m x j w j p − 1 {\displaystyle \,y(\mathbf {x} )~=~\sum _{j=1}^{m}x_{j}w_{j}^{p-1}} . We also specify that our weights normalize to 1, which will be a necessary condition for stability, so | w | = ( ∑ j = 1 m w j p ) 1 / p = 1 {\displaystyle \,|\mathbf {w} |~=~\left(\sum _{j=1}^{m}w_{j}^{p}\right)^{1/p}~=~1} , which, when substituted into our expansion, gives Oja's rule, or w i ( n + 1 ) = w i ( n ) + η y ( x i − w i ( n ) y ) {\displaystyle \,w_{i}(n+1)~=~w_{i}(n)+\eta \,y(x_{i}-w_{i}(n)y)} . === Stability and PCA === In analyzing the convergence of a single neuron evolving by Oja's rule, one extracts the first principal component, or feature, of a data set. Furthermore, with extensions using the Generalized Hebbian Algorithm, one can create a multi-Oja neural network that can extract as many features as desired, allowing for principal components analysis. A principal component aj is extracted from a dataset x through some associated vector qj, or aj = qj⋅x, and we can restore our original dataset by taking x = ∑ j a j q j {\displaystyle \mathbf {x} ~=~\sum _{j}a_{j}\mathbf {q} _{j}} . In the case of a single neuron trained by Oja's rule, we find the weight vector converges to q1, or the first principal component, as time or number of iterations approaches infinity. We can also define, given a set of input vectors Xi, that its correlation matrix Rij = XiXj has an associated eigenvector given by qj with eigenvalue λj. The variance of outputs of our Oja neuron σ2(n) = ⟨y2(n)⟩ then converges with time iterations to the principal eigenvalue, or lim n → ∞ σ 2 ( n ) = λ 1 {\displaystyle \lim _{n\rightarrow \infty }\sigma ^{2}(n)~=~\lambda _{1}} . These results are derived using Lyapunov function analysis, and they show that Oja's neuron necessarily converges on strictly the first principal component if certain conditions are met in our original learning rule. Most importantly, our learning rate η is allowed to vary with time, but only such that its sum is divergent but its power sum is convergent, that is ∑ n = 1 ∞ η ( n ) = ∞ , ∑ n = 1 ∞ η ( n ) p < ∞ , p > 1 {\displaystyle \sum _{n=1}^{\infty }\eta (n)=\infty ,~~~\sum _{n=1}^{\infty }\eta (n)^{p}<\infty ,~~~p>1} . Our output activation function y(x(n)) is also allowed to be nonlinear and nonstatic, but it must be continuously differentiable in both x and w and have derivatives bounded in time. == Applications == Oja's rule was originally described in Oja's 1982 paper, but the principle of self-organization to which it is applied is first attributed to Alan Turing in 1952. PCA has also had a long history of use before Oja's rule formalized its use in network computation in 1989. The model can thus be applied to any problem of self-organizing mapping, in particular those in which feature extraction is of primary interest. Therefore, Oja's rule has an important place in image and speech processing. It is also useful as it expands easily to higher dimensions of processing, thus being able to integrate multiple outputs quickly. A canonical example is its use in binocular vision. === Biology and Oja's subspace rule === There is clear evidence for both long-term potentiation and long-term depression in biological neural networks, along with a normalization effect in both input weights and neuron outputs. However, while there is no direct experimental evidence yet of Oja's rule active in a biological neural network, a biophysical derivation of a generalization of the rule is possible. Such a derivation requires retrograde signalling from the postsynaptic neuron, which is biologically plausible (see neural backpropagation), and takes the form of Δ w i j ∝ ⟨ x i y j ⟩ − ϵ ⟨ ( c p r e ∗ ∑ k w i k y k ) ⋅ ( c p o s t ∗ y j ) ⟩ , {\displaystyle \Delta w_{ij}~\propto ~\langle x_{i}y_{j}\rangle -\epsilon \left\langle \left(c_{\mathrm {pre} }\sum _{k}w_{ik}y_{k}\right)\cdot \left(c_{\mathrm {post} }y_{j}\right)\right\rangle ,} where as before wij is the synaptic weight between the ith input and jth output neurons, x is the input, y is the postsynaptic output, and we define ε to be a constant analogous the learning rate, and cpre and cpost are presynaptic and postsynaptic functions that model the weakening of signals over time. Note that the angle brackets denote the average and the ∗ operator is a convolution. By taking the pre- and post-synaptic functions into frequency space and combining integration terms with the convolution, we find that this gives an arbitrary-dimensional generalization of Oja's rule known as Oja's Subspace, namely Δ w = C x ⋅ w − w ⋅ C y . {\displaystyle \Delta w~=~Cx\cdot w-w\cdot Cy.}

    Read more →
  • FERET (facial recognition technology)

    FERET (facial recognition technology)

    The Facial Recognition Technology (FERET) program was a government-sponsored project that aimed to create a large, automatic face-recognition system for intelligence, security, and law enforcement purposes. The program began in 1993 under the combined leadership of Dr. Harry Wechsler at George Mason University (GMU) and Dr. Jonathon Phillips at the Army Research Laboratory (ARL) in Adelphi, Maryland and resulted in the development of the Facial Recognition Technology (FERET) database. The goal of the FERET program was to advance the field of face recognition technology by establishing a common database of facial imagery for researchers to use and setting a performance baseline for face-recognition algorithms. Potential areas where this face-recognition technology could be used include: Automated searching of mug books using surveillance photos Controlling access to restricted facilities or equipment Checking the credentials of personnel for background and security clearances Monitoring airports, border crossings, and secure manufacturing facilities for particular individuals Finding and logging multiple appearances of individuals over time in surveillance videos Verifying identities at ATM machines Searching photo ID records for fraud detection The FERET database has been used by more than 460 research groups and is currently managed by the National Institute of Standards and Technology (NIST). By 2017, the FERET database has been used to train artificial intelligence programs and computer vision algorithms to identify and sort faces. == History == The origin of facial recognition technology is largely attributed to Woodrow Wilson Bledsoe and his work in the 1960s, when he developed a system to identify faces from a database of thousands of photographs. The FERET program first began as a way to unify a large body of face-recognition technology research under a standard database. Before the program's inception, most researchers created their own facial imagery database that was attuned to their own specific area of study. These personal databases were small and usually consisted of images from less than 50 individuals. The only notable exceptions were the following: Alex Pentland’s database of around 7500 facial images at the Massachusetts Institute of Technology (MIT) Joseph Wilder's database of around 250 individuals at Rutgers University Christoph von der Malsburg’s database of around 100 facial images at the University of Southern California (USC) The lack of a common database made it difficult to compare the results of face recognition studies in the scientific literature because each report involved different assumptions, scoring methods, and images. Most of the papers that were published did not use images from a common database nor follow a standard testing protocol. As a result, researchers were unable to make informed comparisons between the performances of different face-recognition algorithms. In September 1993, the FERET program was spearheaded by Dr. Harry Wechsler and Dr. Jonathon Phillips under the sponsorship of the U.S. Department of Defense Counterdrug Technology Development Program through DARPA with ARL serving as technical agent. === Phase I === The first facial images for the FERET database were collected from August 1993 to December 1994, a time period known as Phase I. The pictures were initially taken with a 35-mm camera at both GMU and ARL facilities, and the same physical setup was used in each photography session to keep the images consistent. For each individual, the pictures were taken in sets, including two frontal views, a right and left profile, a right and left quarter profile, a right and left half profile, and sometimes at five extra locations. Therefore, a set of images consisted of 5 to 11 images per person. At the end of Phase I, the FERET database had collected 673 sets of images, resulting in over 5000 total images. At the end of Phase I, five organizations were given the opportunity to test their face-recognition algorithm on the newly created FERET database in order to compare how they performed against each other. There five principal investigators were: MIT, led by Alex Pentland Rutgers University, led by Joseph Wilder The Analytic Science Company (TASC), led by Gale Gordon The University of Illinois at Chicago (UIC) and the University of Illinois at Urbana-Champaign, led by Lewis Sadler and Thomas Huang USC, led by Christoph von der Malsburg During this evaluation, three different automatic tests were given to the principal investigators without human intervention: The large gallery test, which served to baseline how algorithms performed against a database when it has not been properly tuned. The false-alarm test, which tested how well the algorithm monitored an airport for suspected terrorists. The rotation test, which measured how well the algorithm performed when the images of an individual in the gallery had different poses compared to those in the probe set. For most of the test trials, the algorithms developed by USC and MIT managed to outperform the other three algorithms for the Phase I evaluation. === Phase II === Phase II began after Phase I, and during this time, the FERET database acquired more sets of facial images. By the start of the Phase II evaluation in March 1995, the database contained 1109 sets of images for a total of 8525 images of 884 individuals. During the second evaluation, the same algorithms from the Phase I evaluation were given a single test. However, the database now contained significantly more duplicate images (463, compared to the previous 60), making the test more challenging. === Phase III === Afterwards, the FERET program entered Phase III where another 456 sets of facial images were added to the database. The Phase III evaluation, which took place in September 1996, aimed to not only gauge the progress of the algorithms since the Phase I assessment but also identify the strengths and weaknesses of each algorithm and determine future objectives for research. By the end of 1996, the FERET database had accumulated a total of 14,126 facial images pertaining to 1199 different individuals as well as 365 duplicate sets of images. As a result of the FERET program, researchers were able to establish a common baseline for comparing different face-recognition algorithms and create a large standard database of facial images that is open for research. In 2003, DARPA released a high-resolution, 24-bit color version of the images in the FERET database (existing reference).

    Read more →
  • Pyramid (image processing)

    Pyramid (image processing)

    Pyramid, or pyramid representation, is a type of multi-scale signal representation developed by the computer vision, image processing and signal processing communities, in which a signal or an image is subject to repeated smoothing and subsampling. Pyramid representation is a predecessor to scale-space representation and multiresolution analysis. == Pyramid generation == There are two main types of pyramids: lowpass and bandpass. A lowpass pyramid is made by smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of 2 along each coordinate direction. The resulting image is then subjected to the same procedure, and the cycle is repeated multiple times. Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution). If illustrated graphically, the entire multi-scale representation will look like a pyramid, with the original image on the bottom and each cycle's resulting smaller image stacked one atop the other. A bandpass pyramid is made by forming the difference between images at adjacent levels in the pyramid and performing image interpolation between adjacent levels of resolution, to enable computation of pixelwise differences. == Pyramid generation kernels == A variety of different smoothing kernels have been proposed for generating pyramids. Among the suggestions that have been given, the binomial kernels arising from the binomial coefficients stand out as a particularly useful and theoretically well-founded class. Thus, given a two-dimensional image, we may apply the (normalized) binomial filter (1/4, 1/2, 1/4) typically twice or more along each spatial dimension and then subsample the image by a factor of two. This operation may then proceed as many times as desired, leading to a compact and efficient multi-scale representation. If motivated by specific requirements, intermediate scale levels may also be generated where the subsampling stage is sometimes left out, leading to an oversampled or hybrid pyramid. With the increasing computational efficiency of CPUs available today, it is in some situations also feasible to use wider supported Gaussian filters as smoothing kernels in the pyramid generation steps. === Gaussian pyramid === In a Gaussian pyramid, subsequent images are weighted down using a Gaussian average (Gaussian blur) and scaled down. Each pixel containing a local average corresponds to a neighborhood pixel on a lower level of the pyramid. This technique is used especially in texture synthesis. === Laplacian pyramid === A Laplacian pyramid is very similar to a Gaussian pyramid but saves the difference image of the blurred versions between each levels. Only the smallest level is not a difference image to enable reconstruction of the high resolution image using the difference images on higher levels. This technique can be used in image compression. === Steerable pyramid === A steerable pyramid, developed by Simoncelli and others, is an implementation of a multi-scale, multi-orientation band-pass filter bank used for applications including image compression, texture synthesis, and object recognition. It can be thought of as an orientation selective version of a Laplacian pyramid, in which a bank of steerable filters are used at each level of the pyramid instead of a single Laplacian or Gaussian filter. == Applications of pyramids == === Alternative representation === In the early days of computer vision, pyramids were used as the main type of multi-scale representation for computing multi-scale image features from real-world image data. More recent techniques include scale-space representation, which has been popular among some researchers due to its theoretical foundation, the ability to decouple the subsampling stage from the multi-scale representation, the more powerful tools for theoretical analysis as well as the ability to compute a representation at any desired scale, thus avoiding the algorithmic problems of relating image representations at different resolution. Nevertheless, pyramids are still frequently used for expressing computationally efficient approximations to scale-space representation. === Detail manipulation === Levels of a Laplacian pyramid can be added to or removed from the original image to amplify or reduce detail at different scales. However, detail manipulation of this form is known to produce halo artifacts in many cases, leading to the development of alternatives such as the bilateral filter. Some image compression file formats use the Adam7 algorithm or some other interlacing technique. These can be seen as a kind of image pyramid. Because those file format store the "large-scale" features first, and fine-grain details later in the file, a particular viewer displaying a small "thumbnail" or on a small screen can quickly download just enough of the image to display it in the available pixels—so one file can support many viewer resolutions, rather than having to store or generate a different file for each resolution.

    Read more →
  • Discrete diffusion model

    Discrete diffusion model

    In machine learning, discrete diffusion models are a class of diffusion models, which themselves are a class of latent variable generative models. Each discrete diffusion model consists of two major components: the forward jump diffusion process, and the reverse jump diffusion process. The goal of diffusion modeling is, given a given dataset and a forward process, to learn a model for the reverse process, such that the reverse process can generate new elements that are distributed similarly as the original dataset. A trained discrete diffusion model can be sampled in many ways, which trades off computational efficiency and sample quality. In general, higher quality data can be obtained, but at the price of higher computational cost. In standard diffusion modeling, the diffusion process takes place over a state space that is continuous space of R n {\displaystyle \mathbb {R} ^{n}} , but over a discrete set S {\displaystyle S} . A discrete set is simply a set where one cannot speak of "infinitesimally close" points. Points can be more or less separated from each other, but the separation is always a finite number. This in particular means the standard framework of continuous diffusion does not apply, since it uses gaussian noise, which is continuous. Nevertheless, an analogous theory can be produced. Discrete diffusion is usually used for language modeling. In practice, the state space S {\displaystyle S} is not only discrete, but finite, so this is what we will assume from now on. == Continuous time Markov process == In the case of continuous state space, during the forward discrete diffusion process, at each step t → t + d t {\displaystyle t\to t+dt} , we mix in an infinitesimal amount of gaussian noise d x t = − 1 2 β ( t ) x t d t + β ( t ) d W t {\displaystyle dx_{t}=-{\frac {1}{2}}\beta (t)x_{t}dt+{\sqrt {\beta (t)}}dW_{t}} . This changes the probability density function, by first a convolution with the density of a gaussian, followed by a scaling. In the case of discrete state space, the gaussian noise must be replaced by a noise that takes values over a finite set. For example, if the noise is the uniform distribution over S {\displaystyle S} , then the probability distribution at time t + d t {\displaystyle t+dt} satisfies q t + d t ( x ) = ( 1 − d t ) q t ( x ) + d t ( 1 | S | ∑ y ∈ S q t ( y ) ) {\displaystyle q_{t+dt}(x)=(1-dt)q_{t}(x)+dt\left({\frac {1}{|S|}}\sum _{y\in S}q_{t}(y)\right)} More succinctly, ∂ t q t ( x ) = − ( 1 − 1 | S | ) q t ( x ) + ∑ y ∈ S , y ≠ x 1 | S | q t ( y ) {\displaystyle \partial _{t}q_{t}(x)=-\left(1-{\frac {1}{|S|}}\right)q_{t}(x)+\sum _{y\in S,y\neq x}{\frac {1}{|S|}}q_{t}(y)} In general, we do not need to convolve with a uniformly distributed noise, but with an arbitrary noise process. That is, we use an arbitrary matrix Q t {\displaystyle Q_{t}} such that ∂ t q t ( y ) = ∑ x ∈ S Q t ( y , x ) q t ( x ) {\displaystyle \partial _{t}q_{t}(y)=\sum _{x\in S}Q_{t}(y,x)q_{t}(x)} where Q t {\displaystyle Q_{t}} is called the rate matrix. Any matrix may be used as a rate matrix if it has non-negative off-diagonals, and each column sums to 0: Q t ( y , x ) ≥ 0 ∀ y ≠ x , ∑ y ∈ S Q t ( y , x ) = 0 ∀ x {\displaystyle Q_{t}(y,x)\geq 0\quad \forall y\neq x,\quad \sum _{y\in S}Q_{t}(y,x)=0\quad \forall x} A continuous time Markov chain (CTMC) is defined by a continuous function Q {\displaystyle Q} that maps any time t ∈ [ 0 , T ) {\displaystyle t\in [0,T)} to a rate matrix Q t {\displaystyle Q_{t}} . Given the function Q {\displaystyle Q} , time-evolution under the CTMC is done as follows: Given state x t {\displaystyle x_{t}} at time t {\displaystyle t} , and given an infinitesimal d t {\displaystyle dt} , the state at t + d t {\displaystyle t+dt} is x t + d t {\displaystyle x_{t+dt}} , such that Pr ( x t + d t | x t ) = { 1 + Q t ( x t + d t , x t ) d t if x t + d t = x t Q t ( x t + d t , x t ) d t else {\displaystyle \Pr(x_{t+dt}|x_{t})={\begin{cases}1+Q_{t}(x_{t+dt},x_{t})dt&{\text{if }}x_{t+dt}=x_{t}\\Q_{t}(x_{t+dt},x_{t})dt&{\text{else}}\end{cases}}} This implies that the probability distribution function evolves according to ∂ t q t ( y ) = ∑ x ∈ S Q t ( y , x ) q t ( x ) {\displaystyle \partial _{t}q_{t}(y)=\sum _{x\in S}Q_{t}(y,x)q_{t}(x)} which is what we previously specified. === Backward process === Similarly to the case of continuous diffusion, in discrete diffusion, there exists a backward diffusion process Q ¯ t {\displaystyle {\bar {Q}}_{t}} : s ( x , t ) y := q t ( y ) q t ( x ) , Q ¯ t ( y , x ) := { s ( x , t ) y Q t ( x , y ) if y ≠ x − ∑ y : y ≠ x Q ¯ t ( y , x ) if y = x {\displaystyle s(x,t)_{y}:={\frac {q_{t}(y)}{q_{t}(x)}},\quad {\bar {Q}}_{t}(y,x):={\begin{cases}s(x,t)_{y}Q_{t}(x,y)&{\text{if }}y\neq x\\-\sum _{y:y\neq x}{\bar {Q}}_{t}(y,x)&{\text{if }}y=x\end{cases}}} where s ( x , t ) y {\displaystyle s(x,t)_{y}} should be interpreted as the discrete score or concrete score, since, abusing notation a bit, the score function is ∇ ln ⁡ ρ t ( x ) = 1 d x ( ρ t ( x + d x ) ρ t ( x ) − 1 ) {\displaystyle \nabla \ln \rho _{t}(x)={\frac {1}{dx}}\left({\frac {\rho _{t}(x+dx)}{\rho _{t}(x)}}-1\right)} . If we picture the distribution q t {\displaystyle q_{t}} as a bunch of point-masses, one per state x ∈ S {\displaystyle x\in S} , then the forward diffusion from time t {\displaystyle t} to t + d t {\displaystyle t+dt} is performed by removing Q t ( x , y ) q t ( y ) d t {\displaystyle Q_{t}(x,y)q_{t}(y)dt} from the mass at y {\displaystyle y} and moving it to the mass at x {\displaystyle x} , for each pair x ≠ y {\displaystyle x\neq y} . Thus, the process is reversed in detail by the CTMC defined by Q ¯ {\displaystyle {\bar {Q}}} , since Q ¯ t ( y , x ) q t ( x ) = Q t ( x , y ) q t ( y ) {\displaystyle {\bar {Q}}_{t}(y,x)q_{t}(x)=Q_{t}(x,y)q_{t}(y)} . Given Q ¯ t {\displaystyle {\bar {Q}}_{t}} , if we have a way to sample from q t {\displaystyle q_{t}} , then we can sample from q t − d t {\displaystyle q_{t-dt}} by first sampling x t ∼ q t {\displaystyle x_{t}\sim q_{t}} , then sampling x t − d t {\displaystyle x_{t-dt}} according to Pr ( x t − d t | x t ) = { 1 + Q ¯ t ( x t − d t , x t ) d t if x t − d t = x t Q ¯ t ( x t − d t , x t ) d t else {\displaystyle \Pr(x_{t-dt}|x_{t})={\begin{cases}1+{\bar {Q}}_{t}(x_{t-dt},x_{t})dt&{\text{if }}x_{t-dt}=x_{t}\\{\bar {Q}}_{t}(x_{t-dt},x_{t})dt&{\text{else}}\end{cases}}} === Overall plan of score-matching discrete diffusion modeling === Similar to score-matching continuous diffusion, score-matching discrete diffusion is a method to sample an initial distribution. If we have a certain function s θ {\displaystyle s_{\theta }} that approximates the true score function s θ ( x , t ) y ≈ s ( x , t ) y {\displaystyle s_{\theta }(x,t)_{y}\approx s(x,t)_{y}} , then it allows a corresponding Q ¯ θ {\displaystyle {\bar {Q}}^{\theta }} to be defined in the same way. If we also have a base distribution q base {\displaystyle q_{\text{base}}} such that it is easy to sample from, and approximately equal to the true terminal distribution q base ≈ q T {\displaystyle q_{\text{base}}\approx q_{T}} , then we can perform the backward CTMC with Q ¯ θ {\displaystyle {\bar {Q}}^{\theta }} and q T θ := q terminal {\displaystyle q_{T}^{\theta }:=q_{\text{terminal}}} . When both approximations are good, the backward CTMC would give q 0 θ ≈ q 0 {\displaystyle q_{0}^{\theta }\approx q_{0}} . This is the idea of score-matching discrete diffusion modeling. If q data {\displaystyle q_{\text{data}}} is sharp, in the sense that for some x , x ′ {\displaystyle x,x'} , we have q data ( x ) ≫ q data ( x ′ ) {\displaystyle q_{\text{data}}(x)\gg q_{\text{data}}(x')} , then the score function would diverge as 1 / t {\displaystyle 1/t} at the t → 0 {\displaystyle t\to 0} limit. To avoid this in practice, it is common to use early stopping, which is to stop the backward process at some time δ > 0 {\displaystyle \delta >0} , and sample from q δ θ {\displaystyle q_{\delta }^{\theta }} instead of q 0 θ {\displaystyle q_{0}^{\theta }} . === Tractable forward processes === The theory of CTMC works for any continuous choice of rate matrices Q {\displaystyle Q} . However, most choices are computationally expensive and cannot be used in practice. In the case of continuous diffusion, the gaussian noise is used for the simple reason that the sum of any number of gaussians is still a gaussian. This allows one to sample any x t ∼ ρ t {\displaystyle x_{t}\sim \rho _{t}} by sampling a single x 0 ∼ ρ 0 {\displaystyle x_{0}\sim \rho _{0}} , followed by a single gaussian noise z ∼ N ( 0 , I ) {\displaystyle z\sim {\mathcal {N}}(0,I)} , and let x t = α ¯ t x 0 + σ t z {\displaystyle x_{t}={\sqrt {{\bar {\alpha }}_{t}}}x_{0}+\sigma _{t}z} , without needing any x s {\displaystyle x_{s}} for any 0 < s < t {\displaystyle 0 Read more →

  • Genetic operator

    Genetic operator

    A genetic operator is an operator used in evolutionary algorithms (EA) to guide the algorithm towards a solution to a given problem. There are three main types of operators (mutation, crossover and selection), which must work in conjunction with one another in order for the algorithm to be successful. Genetic operators are used to create and maintain genetic diversity (mutation operator), combine existing solutions (also known as chromosomes) into new solutions (crossover) and select between solutions (selection). The classic representatives of evolutionary algorithms include genetic algorithms, evolution strategies, genetic programming and evolutionary programming. In his book discussing the use of genetic programming for the optimization of complex problems, computer scientist John Koza has also identified an 'inversion' or 'permutation' operator; however, the effectiveness of this operator has never been conclusively demonstrated and this operator is rarely discussed in the field of genetic programming. For combinatorial problems, however, these and other operators tailored to permutations are frequently used by other EAs. Mutation (or mutation-like) operators are said to be unary operators, as they only operate on one chromosome at a time. In contrast, crossover operators are said to be binary operators, as they operate on two chromosomes at a time, combining two existing chromosomes into one new chromosome. == Operators == Genetic variation is a necessity for the process of evolution. Genetic operators used in evolutionary algorithms are analogous to those in the natural world: survival of the fittest, or selection; reproduction (crossover, also called recombination); and mutation. === Selection === Selection operators give preference to better candidate solutions (chromosomes), allowing them to pass on their 'genes' to the next generation (iteration) of the algorithm. The best solutions are determined using some form of objective function (also known as a 'fitness function' in evolutionary algorithms), before being passed to the crossover operator. Different methods for choosing the best solutions exist, for example, fitness proportionate selection and tournament selection. A further or the same selection operator is used to determine the individuals for being selected to form the next parental generation. The selection operator may also ensure that the best solution(s) from the current generation always become(s) a member of the next generation without being altered; this is known as elitism or elitist selection. === Crossover === Crossover is the process of taking more than one parent solutions (chromosomes) and producing a child solution from them. By recombining portions of good solutions, the evolutionary algorithm is more likely to create a better solution. As with selection, there are a number of different methods for combining the parent solutions, including the edge recombination operator (ERO) and the 'cut and splice crossover' and 'uniform crossover' methods. The crossover method is often chosen to closely match the chromosome's representation of the solution; this may become particularly important when variables are grouped together as building blocks, which might be disrupted by a non-respectful crossover operator. Similarly, crossover methods may be particularly suited to certain problems; the ERO is considered a good option for solving the travelling salesman problem. === Mutation === The mutation operator encourages genetic diversity amongst solutions and attempts to prevent the evolutionary algorithm converging to a local minimum by stopping the solutions becoming too close to one another. In mutating the current pool of solutions, a given solution may change between slightly and entirely from the previous solution. By mutating the solutions, an evolutionary algorithm can reach an improved solution solely through the mutation operator. Again, different methods of mutation may be used; these range from a simple bit mutation (flipping random bits in a binary string chromosome with some low probability) to more complex mutation methods in which genes in the solution are changed, for example by adding a random value from the Gaussian distribution to the current gene value. As with the crossover operator, the mutation method is usually chosen to match the representation of the solution within the chromosome. == Combining operators == While each operator acts to improve the solutions produced by the evolutionary algorithm working individually, the operators must work in conjunction with each other for the algorithm to be successful in finding a good solution. Using the selection operator on its own will tend to fill the solution population with copies of the best solution from the population. If the selection and crossover operators are used without the mutation operator, the algorithm will tend to converge to a local minimum, that is, a good but sub-optimal solution to the problem. Using the mutation operator on its own leads to a random walk through the search space. Only by using all three operators together can the evolutionary algorithm become a noise-tolerant global search algorithm, yielding good solutions to the problem at hand.

    Read more →
  • VITAL (machine learning software)

    VITAL (machine learning software)

    VITAL (Validating Investment Tool for Advancing Life Sciences) was a Board Management Software machine learning proprietary software developed by Aging Analytics, a company registered in Bristol (England) and dissolved in 2017. Andrew Garazha (the firm's Senior Analyst) declared that the project aimed "through iterative releases and updates to create a piece of software capable of making autonomous investment decisions." According to Nick Dyer-Witheford, VITAL 1.0 was a "basic algorithm". On 13 May 2014, Deep Knowledge Ventures, a Hong Kong venture capital firm, claimed to have appointed VITAL to its board of directors in order to prove that artificial intelligence could be an instrument for investment decision-making. The announcement received great press coverage despite the fact commentators consider this a publicity stunt. Fortune reported in 2019 that VITAL is no longer used. == Criticism == Academics and journalists viewed VITAL's board appointment with skepticism. University of Sheffield computer science professor Noel Sharkey called it "a publicity hype". Michael Osborne, a University of Oxford associate professor in machine learning, found it is "a gimmick to call that an actual board member". Simon Sharwood of The Register, wrote there is "a strong whiff of stunt and/or promotion about this". In a 2019 speech, the Chief Scientist of Australia, Alan Finkel, commented, "At the time, most of us probably dismissed Vital as a PR exercise. I admit, I used her story three years ago to get a laugh in one of my speeches." Florian Möslein, a law professor at the University of Marburg, wrote in 2018 that "Vital has widely been acknowledged as the 'world's first artificial intelligence company director'". Vice journalist Jason Koebler suggested that the software did not have any article intelligence capabilities and concluded "VITAL can’t talk, and it can’t hear, and it can’t be a real, functional executive of a company." Sharwood of The Register noted that because VITAL was not a natural person, it could not be a board member under Hong Kong's corporate governance laws. However, in a 2017 interview to The Nikkei, Dmitry Kaminskiy, managing partner of Deep Knowledge Ventures, stated that VITAL had observer status on the board and no voting rights. University of Sheffield computer science professor Noel Sharkey said of VITAL, "On first sight, it looks like a futuristic idea but on reflection it is really a little bit of publicity hype." Vice journalist Jason Koebler said "this is a gimmick" and said "There is literally nothing to suggest that VITAL has any sort of capabilities beyond any other proprietary analysis software". Michael Osborne, a University of Oxford associate professor in machine learning, found VITAL's appointment to be noncredible, saying it is "a bit of a gimmick to call that an actual board member". Osborne said that a core duty of board members to converse with each other, which the algorithm is incapable of doing, so its more likely functionality is to serve as a springboard for conversation among other board members. In a 2019 speech, the Chief Scientist of Australia, Alan Finkel, commented, "At the time, most of us probably dismissed Vital as a PR exercise. I admit, I used her story three years ago to get a laugh in one of my speeches." == Machine intelligence as board member == VITAL was created by a group of programmers employed by Aging Analytics According to Andrew Garazh, Aging Analytics Senior Analyst, VITAL was not a machine learning algorithm as the necessary datasets on investment rounds, intellectual property and clinical trial outcomes are generally not disclosed. Rather, VITAL used fuzzy logic based on 50 parameters to assess risk factors. Aging Analytics licensed the software to Deep Knowledge Ventures. It was used to help the human board members of Deep Knowledge Venture make investment decisions in biotechnology companies. For instance, it supported investments in Insilico Medicine, which creates ways for computers to help find drugs in research into aging. VITAL also supported investing in Pathway Pharmaceuticals, which uses the OncoFinder algorithm to choose and appraise cancer treatments. According to Dmitry Kaminskiy, managing partner of Deep Knowledge Ventures, the motivation for using VITAL was the large number of failed investments in the biotechnology sector and the desire to avoid investing in companies likely to fail. == Ethical and legal implications == Scholars addressed questions around the safety, privacy, accountability transparency and bias in algorithms. Writing in the philosophical journal Multitudes, the academic Ariel Kyrou raised questions about the consequences of a mistake made by an algorithm recommending a dangerous investment. He raised the hypothetical where VITAL was able to persuade the board to invest in a startup that had the facade of doing research into treatment for age-associated ills, but in actuality was run by terrorists who were raising funds. Kyrou raised a series of questions about who society would fault for VITAL's mistake. As the owner of VITAL, should Deep Knowledge Ventures be held accountable, or rather should the companies that supplied data to VITAL or the people who created VITAL be held liable? Simon Sharwood of The Register wrote that because the appointment of a software program to the board directors is not legally feasible in Hong Kong, there is "a strong whiff of stunt and/or promotion about this". Quoting a Thomson Reuters website describing Hong Kong legislation related to corporate governance, Sharwood pointed out that in Hong Kong "the board comprises all of the directors of the company" and "a director must normally be a natural person, except that a private company may have a body corporate as its director if the company is not a member of a listed group." He concluded that since VITAL cannot be considered a "natural person", it is merely a "cosmetic" appointment to the board and that "this software is no more a Board member than Caligula's horse was a senator". Sharwood further argued that corporations frequently purchase directors and officers liability insurance but that it would be practically impossible to get such insurance for VITAL. Sharwood also wrote that were VITAL to be hacked, any misinformation it outputs could be considered "false and misleading communications". In the book Research Handbook on the Law of Artificial Intelligence, Florian Mölein wrote that VITAL could not become a director as defined in Hong Kong's corporate laws, so the other directors just were approaching it as "a member of [the] board with observer status". Lin Shaowei raised concerns in a Journal of East China University of Political Science and Law article about how the software's appearance inspired a complex question about the relationship between corporate law and artificial intelligence. VITAL could be considered either a board director who has voting rights or an observer who does not. Lin said either choice raised questions about whether VITAL is subject to corporate law and who would be held accountable if VITAL recommends a choice that turns out to be damaging to the company. David Theo Goldberg in the Critical Times, a peer reviewed journal in Critical Global Theory, argues that VITAL processed a dataset to predict the most remunerative investment opportunities. Drawing his analysis on an article from Business Insider, Goldberg describes VITAL's decision-making predictiveness based "on surface pattern recognition and the identification of regularities and/or irregularities". In other words, Goldberg asserts that "the normativity of the surface" explains algorithmic knowledge of a "product" like VITAL. In Homo Deus, Yuval Noah Harari mentions VITAL as an example of the future risks that humankind faces. Harari argues that the human mind is being replaced by a world in which algorithms and data make the decisions. Specifically, it is argued that "as algorithms push humans out of the job market," executive boards driven by artificial intelligence are more likely to give priority to algorithms over the humans.

    Read more →
  • Egocentric vision

    Egocentric vision

    Egocentric vision or first-person vision is a sub-field of computer vision that entails analyzing images and videos captured by a wearable camera, which is typically worn on the head or on the chest and naturally approximates the visual field of the camera wearer. Consequently, visual data capture the part of the scene on which the user focuses to carry out the task at hand and offer a valuable perspective to understand the user's activities and their context in a naturalistic setting. The wearable camera looking forwards is often supplemented with a camera looking inward at the user's eye and able to measure a user's eye gaze, which is useful to reveal attention and to better understand the user's activity and intentions. == History == The idea of using a wearable camera to gather visual data from a first-person perspective dates back to the 70s, when Steve Mann invented "Digital Eye Glass", a device that, when worn, causes the human eye itself to effectively become both an electronic camera and a television display. Subsequently, wearable cameras were used for health-related applications in the context of Humanistic Intelligence and Wearable AI. Egocentric vision is best done from the point-of-eye, but may also be done by way of a neck-worn camera when eyeglasses would be in-the-way. This neck-worn variant was popularized by way of the Microsoft SenseCam in 2006 for experimental health research works. The interest of the computer vision community into the egocentric paradigm has been arising slowly entering the 2010s and it is rapidly growing in recent years, boosted by both the impressive advances in the field of wearable technology and by the increasing number of potential applications. The prototypical first-person vision system described by Kanade and Hebert, in 2012 is composed by three basic components: a localization component able to estimate the surrounding, a recognition component able to identify object and people, and an activity recognition component, able to provide information about the current activity of the user. Together, these three components provide a complete situational awareness of the user, which in turn can be used to provide assistance to the user or to the caregiver. Following this idea, the first computational techniques for egocentric analysis focused on hand-related activity recognition and social interaction analysis. Also, given the unconstrained nature of the video and the huge amount of data generated, temporal segmentation and summarization were among the first problems addressed. After almost ten years of egocentric vision (2007–2017), the field is still undergoing diversification. Emerging research topics include: Social saliency estimation Multi-agent egocentric vision systems Privacy preserving techniques and applications Attention-based activity analysis Social interaction analysis Hand pose analysis Ego graphical User Interfaces (EUI) Understanding social dynamics and attention Revisiting robotic vision and machine vision as egocentric sensing Activity forecasting Gaze prediction == Technical challenges == Today's wearable cameras are small and lightweight digital recording devices that can acquire images and videos automatically, without the user intervention, with different resolutions and frame rates, and from a first-person point of view. Therefore, wearable cameras are naturally primed to gather visual information from our everyday interactions since they offer an intimate perspective of the visual field of the camera wearer. Depending on the frame rate, it is common to distinguish between photo-cameras (also called lifelogging cameras) and video-cameras. The former (e.g., Narrative Clip and Microsoft SenseCam), are commonly worn on the chest, and are characterized by a very low frame rate (up to 2fpm) that allows to capture images over a long period of time without the need of recharging the battery. Consequently, they offer considerable potential for inferring knowledge about e.g. behaviour patterns, habits or lifestyle of the user. However, due to the low frame-rate and the free motion of the camera, temporally adjacent images typically present abrupt appearance changes so that motion features cannot be reliably estimated. The latter (e.g., Google Glass, GoPro), are commonly mounted on the head, and capture conventional video (around 35fps) that allows to capture fine temporal details of interactions. Consequently, they offer potential for in-depth analysis of daily or special activities. However, since the camera is moving with the wearer head, it becomes more difficult to estimate the global motion of the wearer and in the case of abrupt movements, the images can result blurred. In both cases, since the camera is worn in a naturalistic setting, visual data present a huge variability in terms of illumination conditions and object appearance. Moreover, the camera wearer is not visible in the image and what he/she is doing has to be inferred from the information in the visual field of the camera, implying that important information about the wearer, such for instance as pose or facial expression estimation, is not available. == Applications == A collection of studies published in a special theme issue of the American Journal of Preventive Medicine has demonstrated the potential of lifelogs captured through wearable cameras from a number of viewpoints. In particular, it has been shown that used as a tool for understanding and tracking lifestyle behaviour, lifelogs would enable the prevention of noncommunicable diseases associated to unhealthy trends and risky profiles (such as obesity and depression). In addition, used as a tool of re-memory cognitive training, lifelogs would enable the prevention of cognitive and functional decline in elderly people. More recently, egocentric cameras have been used to study human and animal cognition, human-human social interaction, human-robot interaction, human expertise in complex tasks. Other applications include navigation/assistive technologies for the blind, monitoring and assistance of industrial workflows, and augmented reality interfaces.

    Read more →
  • Quadratic unconstrained binary optimization

    Quadratic unconstrained binary optimization

    Quadratic unconstrained binary optimization (QUBO), also known as unconstrained binary quadratic programming (UBQP), is a combinatorial optimization problem with a wide range of applications from finance and economics to machine learning. QUBO is an NP hard problem, and for many classical problems from theoretical computer science, like maximum cut, graph coloring and the partition problem, embeddings into QUBO have been formulated. Embeddings for machine learning models include support-vector machines, clustering and probabilistic graphical models. Moreover, due to its close connection to Ising models, QUBO constitutes a central problem class for adiabatic quantum computation, where it is solved through a physical process called quantum annealing. == Definition == Let B = { 0 , 1 } {\displaystyle \mathbb {B} =\lbrace 0,1\rbrace } the set of binary digits (or bits), then B n {\displaystyle \mathbb {B} ^{n}} is the set of binary vectors of fixed length n ∈ N {\displaystyle n\in \mathbb {N} } . Given a symmetric or upper triangular matrix Q ∈ R n × n {\displaystyle {\boldsymbol {Q}}\in \mathbb {R} ^{n\times n}} , whose entries Q i j {\displaystyle Q_{ij}} define a weight for each pair of indices i , j ∈ { 1 , … , n } {\displaystyle i,j\in \lbrace 1,\dots ,n\rbrace } , we can define the function f Q : B n → R {\displaystyle f_{\boldsymbol {Q}}:\mathbb {B} ^{n}\rightarrow \mathbb {R} } that assigns a value to each binary vector x {\displaystyle {\boldsymbol {x}}} through f Q ( x ) = x ⊺ Q x = ∑ i = 1 n ∑ j = 1 n Q i j x i x j . {\displaystyle f_{\boldsymbol {Q}}({\boldsymbol {x}})={\boldsymbol {x}}^{\intercal }{\boldsymbol {Qx}}=\sum _{i=1}^{n}\sum _{j=1}^{n}Q_{ij}x_{i}x_{j}.} Alternatively, the linear and quadratic parts can be separated as f Q ′ , q ( x ) = x ⊺ Q ′ x + q ⊺ x , {\displaystyle f_{{\boldsymbol {Q}}',{\boldsymbol {q}}}({\boldsymbol {x}})={\boldsymbol {x}}^{\intercal }{\boldsymbol {Q}}'{\boldsymbol {x}}+{\boldsymbol {q}}^{\intercal }{\boldsymbol {x}},} where Q ′ ∈ R n × n {\displaystyle {\boldsymbol {Q}}'\in \mathbb {R} ^{n\times n}} and q ∈ R n {\displaystyle {\boldsymbol {q}}\in \mathbb {R} ^{n}} . This is equivalent to the previous definition through Q = Q ′ + diag ⁡ [ q ] {\displaystyle {\boldsymbol {Q}}={\boldsymbol {Q}}'+\operatorname {diag} [{\boldsymbol {q}}]} using the diag operator, exploiting that x = x ⋅ x {\displaystyle x=x\cdot x} for all binary values x {\displaystyle x} . Intuitively, the weight Q i j {\displaystyle Q_{ij}} is added if both x i = 1 {\displaystyle x_{i}=1} and x j = 1 {\displaystyle x_{j}=1} . The QUBO problem consists of finding a binary vector x ∗ {\displaystyle {\boldsymbol {x}}^{}} that minimizes f Q {\displaystyle f_{\boldsymbol {Q}}} , i.e., ∀ x ∈ B n : f Q ( x ∗ ) ≤ f Q ( x ) {\displaystyle \forall {\boldsymbol {x}}\in \mathbb {B} ^{n}:~f_{\boldsymbol {Q}}({\boldsymbol {x}}^{})\leq f_{\boldsymbol {Q}}({\boldsymbol {x}})} . In general, x ∗ {\displaystyle {\boldsymbol {x}}^{}} is not unique, meaning there may be a set of minimizing vectors with equal value w.r.t. f Q {\displaystyle f_{\boldsymbol {Q}}} . The complexity of QUBO arises from the number of candidate binary vectors to be evaluated, as | B n | = 2 n {\displaystyle \left|\mathbb {B} ^{n}\right|=2^{n}} grows exponentially in n {\displaystyle n} . Sometimes, QUBO is defined as the problem of maximizing f Q {\displaystyle f_{\boldsymbol {Q}}} , which is equivalent to minimizing f − Q = − f Q {\displaystyle f_{-{\boldsymbol {Q}}}=-f_{\boldsymbol {Q}}} . == Properties == QUBO is scale invariant for positive factors α > 0 {\displaystyle \alpha >0} , which leave the optimum x ∗ {\displaystyle {\boldsymbol {x}}^{}} unchanged: f α Q ( x ) = x ⊺ ( α Q ) x = α ( x ⊺ Q x ) = α f Q ( x ) {\displaystyle f_{\alpha {\boldsymbol {Q}}}({\boldsymbol {x}})={\boldsymbol {x}}^{\intercal }(\alpha {\boldsymbol {Q}}){\boldsymbol {x}}=\alpha ({\boldsymbol {x}}^{\intercal }{\boldsymbol {Qx}})=\alpha f_{\boldsymbol {Q}}({\boldsymbol {x}})} . In its general form, QUBO is NP-hard and cannot be solved efficiently by any known polynomial-time algorithm. However, there are polynomially-solvable special cases, where Q {\displaystyle {\boldsymbol {Q}}} has certain properties, for example: If all coefficients are positive, the optimum is trivially x ∗ = ( 0 , … , 0 ) ⊺ {\displaystyle {\boldsymbol {x}}^{}=(0,\dots ,0)^{\intercal }} . Similarly, if all coefficients are negative, the optimum is x ∗ = ( 1 , … , 1 ) ⊺ {\displaystyle {\boldsymbol {x}}^{}=(1,\dots ,1)^{\intercal }} . If Q {\displaystyle {\boldsymbol {Q}}} is diagonal, the bits can be optimized independently, and the problem is solvable in O ( n ) {\displaystyle {\mathcal {O}}(n)} . The optimal variable assignments are simply x i ∗ = 1 {\displaystyle x_{i}^{}=1} if Q i i < 0 {\displaystyle Q_{ii}<0} , and x i ∗ = 0 {\displaystyle x_{i}^{}=0} otherwise. If all off-diagonal elements of Q {\displaystyle {\boldsymbol {Q}}} are non-positive, the corresponding QUBO problem is solvable in polynomial time. QUBO can be solved using integer linear programming solvers like CPLEX or Gurobi Optimizer. This is possible since QUBO can be reformulated as a linear constrained binary optimization problem. To achieve this, substitute the product x i x j {\displaystyle x_{i}x_{j}} by an additional binary variable z i j ∈ B {\displaystyle z_{ij}\in \mathbb {B} } and add the constraints x i ≥ z i j {\displaystyle x_{i}\geq z_{ij}} , x j ≥ z i j {\displaystyle x_{j}\geq z_{ij}} and x i + x j − 1 ≤ z i j {\displaystyle x_{i}+x_{j}-1\leq z_{ij}} . Note that z i j {\displaystyle z_{ij}} can also be relaxed to continuous variables within the bounds zero and one. == Applications == QUBO is a structurally simple, yet computationally hard optimization problem. It can be used to encode a wide range of optimization problems from various scientific areas. === Maximum Cut === Given a graph G = ( V , E ) {\displaystyle G=(V,E)} with vertex set V = { 1 , … , n } {\displaystyle V=\lbrace 1,\dots ,n\rbrace } and edges E ⊆ V × V {\displaystyle E\subseteq V\times V} , the maximum cut (max-cut) problem consists of finding two subsets S , T ⊆ V {\displaystyle S,T\subseteq V} with T = V ∖ S {\displaystyle T=V\setminus S} , such that the number of edges between S {\displaystyle S} and T {\displaystyle T} is maximized. The more general weighted max-cut problem assumes edge weights w i j ≥ 0 ∀ i , j ∈ V {\displaystyle w_{ij}\geq 0~\forall i,j\in V} , with ( i , j ) ∉ E ⇒ w i j = 0 {\displaystyle (i,j)\notin E\Rightarrow w_{ij}=0} , and asks for a partition S , T ⊆ V {\displaystyle S,T\subseteq V} that maximizes the sum of edge weights between S {\displaystyle S} and T {\displaystyle T} , i.e., max S ⊆ V ∑ i ∈ S , j ∉ S w i j . {\displaystyle \max _{S\subseteq V}\sum _{i\in S,j\notin S}w_{ij}.} By setting w i j = 1 {\displaystyle w_{ij}=1} for all ( i , j ) ∈ E {\displaystyle (i,j)\in E} this becomes equivalent to the original max-cut problem above, which is why we focus on this more general form in the following. For every vertex in i ∈ V {\displaystyle i\in V} we introduce a binary variable x i {\displaystyle x_{i}} with the interpretation x i = 0 {\displaystyle x_{i}=0} if i ∈ S {\displaystyle i\in S} and x i = 1 {\displaystyle x_{i}=1} if i ∈ T {\displaystyle i\in T} . As T = V ∖ S {\displaystyle T=V\setminus S} , every i {\displaystyle i} is in exactly one set, meaning there is a 1:1 correspondence between binary vectors x ∈ B n {\displaystyle {\boldsymbol {x}}\in \mathbb {B} ^{n}} and partitions of V {\displaystyle V} into two subsets. We observe that, for any i , j ∈ V {\displaystyle i,j\in V} , the expression x i ( 1 − x j ) + ( 1 − x i ) x j {\displaystyle x_{i}(1-x_{j})+(1-x_{i})x_{j}} evaluates to 1 if and only if i {\displaystyle i} and j {\displaystyle j} are in different subsets, equivalent to logical XOR. Let W ∈ R + n × n {\displaystyle {\boldsymbol {W}}\in \mathbb {R} _{+}^{n\times n}} with W i j = w i j ∀ i , j ∈ V {\displaystyle W_{ij}=w_{ij}~\forall i,j\in V} . By extending above expression to matrix-vector form we find that x ⊺ W ( 1 − x ) + ( 1 − x ) ⊺ W x = − 2 x ⊺ W x + ( W 1 + W ⊺ 1 ) ⊺ x {\displaystyle {\boldsymbol {x}}^{\intercal }{\boldsymbol {W}}({\boldsymbol {1}}-{\boldsymbol {x}})+({\boldsymbol {1}}-{\boldsymbol {x}})^{\intercal }{\boldsymbol {Wx}}=-2{\boldsymbol {x}}^{\intercal }{\boldsymbol {Wx}}+({\boldsymbol {W1}}+{\boldsymbol {W}}^{\intercal }{\boldsymbol {1}})^{\intercal }{\boldsymbol {x}}} is the sum of weights of all edges between S {\displaystyle S} and T {\displaystyle T} , where 1 = ( 1 , 1 , … , 1 ) ⊺ ∈ R n {\displaystyle {\boldsymbol {1}}=(1,1,\dots ,1)^{\intercal }\in \mathbb {R} ^{n}} . As this is a quadratic function over x {\displaystyle {\boldsymbol {x}}} , it is a QUBO problem whose parameter matrix we can read from above expression as Q = 2 W − diag ⁡ [ W 1 + W ⊺ 1 ] , {\displaystyle {\boldsymbol {Q}}=2{\boldsymbol {W}}-\operatorname {diag} [{\boldsymbol {W1}}+{\boldsymbol {W}}^{\intercal }{\bol

    Read more →
  • Huber loss

    Huber loss

    In statistics, the Huber loss is a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used. == Definition == The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by L δ ( a ) = { 1 2 a 2 for | a | ≤ δ , δ ⋅ ( | a | − 1 2 δ ) , otherwise. {\displaystyle L_{\delta }(a)={\begin{cases}{\frac {1}{2}}{a^{2}}&{\text{for }}|a|\leq \delta ,\\[4pt]\delta \cdot \left(|a|-{\frac {1}{2}}\delta \right),&{\text{otherwise.}}\end{cases}}} This function is quadratic for small values of a, and linear for large values, with equal values and slopes of the different sections at the two points where | a | = δ {\displaystyle |a|=\delta } . The variable a often refers to the residuals, that is to the difference between the observed and predicted values a = y − f ( x ) {\displaystyle a=y-f(x)} , so the former can be expanded to L δ ( y , f ( x ) ) = { 1 2 ( y − f ( x ) ) 2 for | y − f ( x ) | ≤ δ , δ ⋅ ( | y − f ( x ) | − 1 2 δ ) , otherwise. {\displaystyle L_{\delta }(y,f(x))={\begin{cases}{\frac {1}{2}}{\left(y-f(x)\right)}^{2}&{\text{for }}\left|y-f(x)\right|\leq \delta ,\\[4pt]\delta \ \cdot \left(\left|y-f(x)\right|-{\frac {1}{2}}\delta \right),&{\text{otherwise.}}\end{cases}}} The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. Thus it "smoothens out" the former's corner at the origin. == Motivation == Two very commonly used loss functions are the squared loss, L ( a ) = a 2 {\displaystyle L(a)=a^{2}} , and the absolute loss, L ( a ) = | a | {\displaystyle L(a)=|a|} . The squared loss function results in an arithmetic mean-unbiased estimator, and the absolute-value loss function results in a median-unbiased estimator (in the one-dimensional case, and a geometric median-unbiased estimator for the multi-dimensional case). The squared loss has the disadvantage that it has the tendency to be dominated by outliers—when summing over a set of a {\displaystyle a} 's (as in ∑ i = 1 n L ( a i ) {\textstyle \sum _{i=1}^{n}L(a_{i})} ), the sample mean is influenced too much by a few particularly large a {\displaystyle a} -values when the distribution is heavy tailed: in terms of estimation theory, the asymptotic relative efficiency of the mean is poor for heavy-tailed distributions. As defined above, the Huber loss function is strongly convex in a uniform neighborhood of its minimum a = 0 {\displaystyle a=0} ; at the boundary of this uniform neighborhood, the Huber loss function has a differentiable extension to an affine function at points a = − δ {\displaystyle a=-\delta } and a = δ {\displaystyle a=\delta } . These properties allow it to combine much of the sensitivity of the mean-unbiased, minimum-variance estimator of the mean (using the quadratic loss function) and the robustness of the median-unbiased estimator (using the absolute value function). == Pseudo-Huber loss function == The Pseudo-Huber loss function can be used as a smooth approximation of the Huber loss function. It combines the best properties of L2 squared loss and L1 absolute loss by being strongly convex when close to the target/minimum and less steep for extreme values. The scale at which the Pseudo-Huber loss function transitions from L2 loss for values close to the minimum to L1 loss for extreme values and the steepness at extreme values can be controlled by the δ {\displaystyle \delta } value. The Pseudo-Huber loss function ensures that derivatives are continuous for all degrees. It is defined as L δ ( a ) = δ 2 ( 1 + ( a / δ ) 2 − 1 ) . {\displaystyle L_{\delta }(a)=\delta ^{2}\left({\sqrt {1+(a/\delta )^{2}}}-1\right).} As such, this function approximates a 2 / 2 {\displaystyle a^{2}/2} for small values of a {\displaystyle a} , and approximates a straight line with slope δ {\displaystyle \delta } for large values of a {\displaystyle a} . While the above is the most common form, other smooth approximations of the Huber loss function also exist. == Variant for classification == For classification purposes, a variant of the Huber loss called modified Huber is sometimes used. Given a prediction f ( x ) {\displaystyle f(x)} (a real-valued classifier score) and a true binary class label y ∈ { + 1 , − 1 } {\displaystyle y\in \{+1,-1\}} , the modified Huber loss is defined as L ( y , f ( x ) ) = { max ( 0 , 1 − y f ( x ) ) 2 for y f ( x ) > − 1 , − 4 y f ( x ) otherwise. {\displaystyle L(y,f(x))={\begin{cases}\max(0,1-y\,f(x))^{2}&{\text{for }}\,\,y\,f(x)>-1,\\[4pt]-4y\,f(x)&{\text{otherwise.}}\end{cases}}} The term max ( 0 , 1 − y f ( x ) ) {\displaystyle \max(0,1-y\,f(x))} is the hinge loss used by support vector machines; the quadratically smoothed hinge loss is a generalization of L {\displaystyle L} . == Applications == The Huber loss function is used in robust statistics, M-estimation and additive modelling.

    Read more →