Biohybrid microswimmer

Biohybrid microswimmer

A biohybrid microswimmer also known as biohybrid nanorobot, can be defined as a microswimmer that consist of both biological and artificial constituents, for instance, one or several living microorganisms attached to one or various synthetic parts. In recent years nanoscopic and mesoscopic objects have been designed to collectively move through direct inspiration from nature or by harnessing its existing tools. Small mesoscopic to nanoscopic systems typically operate at low Reynolds numbers (Re ≪ 1), and understanding their motion becomes challenging. For locomotion to occur, the symmetry of the system must be broken. In addition, collective motion requires a coupling mechanism between the entities that make up the collective. To develop mesoscopic to nanoscopic entities capable of swarming behaviour, it has been hypothesised that the entities are characterised by broken symmetry with a well-defined morphology, and are powered with some material capable of harvesting energy. If the harvested energy results in a field surrounding the object, then this field can couple with the field of a neighbouring object and bring some coordination to the collective behaviour. Such robotic swarms have been categorised by an online expert panel as among the 10 great unresolved group challenges in the area of robotics. Although investigation of their underlying mechanism of action is still in its infancy, various systems have been developed that are capable of undergoing controlled and uncontrolled swarming motion by harvesting energy (e.g., light, thermal, etc.). Over the past decade, biohybrid microrobots, in which living mobile microorganisms are physically integrated with untethered artificial structures, have gained growing interest to enable the active locomotion and cargo delivery to a target destination. In addition to the motility, the intrinsic capabilities of sensing and eliciting an appropriate response to artificial and environmental changes make cell-based biohybrid microrobots appealing for transportation of cargo to the inaccessible cavities of the human body for local active delivery of diagnostic and therapeutic agents. == Background == Biohybrid microswimmers can be defined as microswimmers that consist of both biological and artificial constituents, for instance, one or several living microorganisms attached to one or various synthetic parts. The pioneers of this field, ahead of their time, were Montemagno and Bachand with a 1999 work regarding specific attachment strategies of biological molecules to nanofabricated substrates enabling the preparation of hybrid inorganic/organic nanoelectromechanical systems, so called NEMS. They described the production of large amounts of F1-ATPase from the thermophilic bacteria Bacillus PS3 for the preparation of F1-ATPase biomolecular motors immobilized on a nanoarray pattern of gold, copper or nickel produced by electron beam lithography. These proteins were attached to one micron microspheres tagged with a synthetic peptide. Consequently, they accomplished the preparation of a platform with chemically active sites and the development of biohybrid devices capable of converting energy of biomolecular motors into useful work. One of the most fundamental questions in science is what defines life. Collective motion is one of the hallmarks of life. This is commonly observed in nature at various dimensional levels as energized entities gather, in a concerted effort, into motile aggregated patterns. These motile aggregated events can be noticed, among many others, as dynamic swarms; e.g., unicellular organisms such as bacteria, locust swarms, or the flocking behaviour of birds. Ever since Newton established his equations of motion, the mystery of motion on the microscale has emerged frequently in scientific history, as famously demonstrated by a couple of articles that should be discussed briefly. First, an essential concept, popularized by Osborne Reynolds, is that the relative importance of inertia and viscosity for the motion of a fluid depends on certain details of the system under consideration. The Reynolds number Re, named in his honor, quantifies this comparison as a dimensionless ratio of characteristic inertial and viscous forces: R e = ρ u l μ {\displaystyle \mathrm {Re} ={\frac {\rho ul}{\mu }}} Here, ρ represents the density of the fluid; u is a characteristic velocity of the system (for instance, the velocity of a swimming particle); l is a characteristic length scale (e.g., the swimmer size); and μ is the viscosity of the fluid. Taking the suspending fluid to be water, and using experimentally observed values for u, one can determine that inertia is important for macroscopic swimmers like fish (Re = 100), while viscosity dominates the motion of microscale swimmers like bacteria (Re = 10−4). The overwhelming importance of viscosity for swimming at the micrometer scale has profound implications for swimming strategy. This has been discussed memorably by E. M. Purcell, who invited the reader into the world of microorganisms and theoretically studied the conditions of their motion. In the first place, propulsion strategies of large scale swimmers often involve imparting momentum to the surrounding fluid in periodic discrete events, such as vortex shedding, and coasting between these events through inertia. This cannot be effective for microscale swimmers like bacteria: due to the large viscous damping, the inertial coasting time of a micron-sized object is on the order of 1 μs. The coasting distance of a microorganism moving at a typical speed is about 0.1 angstroms (Å). Purcell concluded that only forces that are exerted in the present moment on a microscale body contribute to its propulsion, so a constant energy conversion method is essential. Microorganisms have optimized their metabolism for continuous energy production, while purely artificial microswimmers (microrobots) must obtain energy from the environment, since their on-board-storage-capacity is very limited. As a further consequence of the continuous dissipation of energy, biological and artificial microswimmers do not obey the laws of equilibrium statistical physics, and need to be described by non-equilibrium dynamics. Mathematically, Purcell explored the implications of low Reynolds number by taking the Navier-Stokes equation and eliminating the inertial terms: μ ∇ 2 u − ∇ p = 0 {\displaystyle {\begin{aligned}\mu \nabla ^{2}\mathbf {u} -{\boldsymbol {\nabla }}p&={\boldsymbol {0}}\\\end{aligned}}} where u {\displaystyle \mathbf {u} } is the velocity of the fluid and ∇ p {\displaystyle {\boldsymbol {\nabla }}p} is the gradient of the pressure. As Purcell noted, the resulting equation — the Stokes equation — contains no explicit time dependence. This has some important consequences for how a suspended body (e.g., a bacterium) can swim through periodic mechanical motions or deformations (e.g., of a flagellum). First, the rate of motion is practically irrelevant for the motion of the microswimmer and of the surrounding fluid: changing the rate of motion will change the scale of the velocities of the fluid and of the microswimmer, but it will not change the pattern of fluid flow. Secondly, reversing the direction of mechanical motion will simply reverse all velocities in the system. These properties of the Stokes equation severely restrict the range of feasible swimming strategies. Recent publications of biohybrid microswimmers include the use of sperm cells, contractive muscle cells, and bacteria as biological components, as they can efficiently convert chemical energy into movement, and additionally are capable of performing complicated motion depending on environmental conditions. In this sense, biohybrid microswimmer systems can be described as the combination of different functional components: cargo and carrier. The cargo is an element of interest to be moved (and possibly released) in a customized way. The carrier is the component responsible for the movement of the biohybrid, transporting the desired cargo, which is linked to its surface. The great majority of these systems rely on biological motile propulsion for the transportation of synthetic cargo for targeted drug delivery/ There are also examples of the opposite case: artificial microswimmers with biological cargo systems. Over the past decade, biohybrid microrobots, in which living mobile microorganisms are physically integrated with untethered artificial structures, have gained growing interest to enable the active locomotion and cargo delivery to a target destination. In addition to the motility, the intrinsic capabilities of sensing and eliciting an appropriate response to artificial and environmental changes make cell-based biohybrid microrobots appealing for transportation of cargo to the inaccessible cavities of the human body for local active delivery of diagnostic and therapeutic agents. Active locomotion, targeting and steering of concentrated therape

Multi-scale approaches

The scale space representation of a signal obtained by Gaussian smoothing satisfies a number of special properties, scale-space axioms, which make it into a special form of multi-scale representation. There are, however, also other types of "multi-scale approaches" in the areas of computer vision, image processing and signal processing, in particular the notion of wavelets. The purpose of this article is to describe a few of these approaches: == Scale-space theory for one-dimensional signals == For one-dimensional signals, there exists quite a well-developed theory for continuous and discrete kernels that guarantee that new local extrema or zero-crossings cannot be created by a convolution operation. For continuous signals, it holds that all scale-space kernels can be decomposed into the following sets of primitive smoothing kernels: the Gaussian kernel : g ( x , t ) = 1 2 π t exp ⁡ ( − x 2 / 2 t ) {\displaystyle g(x,t)={\frac {1}{\sqrt {2\pi t}}}\exp({-x^{2}/2t})} where t > 0 {\displaystyle t>0} , truncated exponential kernels (filters with one real pole in the s-plane): h ( x ) = exp ⁡ ( − a x ) {\displaystyle h(x)=\exp({-ax})} if x ≥ 0 {\displaystyle x\geq 0} and 0 otherwise where a > 0 {\displaystyle a>0} h ( x ) = exp ⁡ ( b x ) {\displaystyle h(x)=\exp({bx})} if x ≤ 0 {\displaystyle x\leq 0} and 0 otherwise where b > 0 {\displaystyle b>0} , translations, rescalings. For discrete signals, we can, up to trivial translations and rescalings, decompose any discrete scale-space kernel into the following primitive operations: the discrete Gaussian kernel T ( n , t ) = I n ( α t ) {\displaystyle T(n,t)=I_{n}(\alpha t)} where α , t > 0 {\displaystyle \alpha ,t>0} where I n {\displaystyle I_{n}} are the modified Bessel functions of integer order, generalized binomial kernels corresponding to linear smoothing of the form f o u t ( x ) = p f i n ( x ) + q f i n ( x − 1 ) {\displaystyle f_{out}(x)=pf_{in}(x)+qf_{in}(x-1)} where p , q > 0 {\displaystyle p,q>0} f o u t ( x ) = p f i n ( x ) + q f i n ( x + 1 ) {\displaystyle f_{out}(x)=pf_{in}(x)+qf_{in}(x+1)} where p , q > 0 {\displaystyle p,q>0} , first-order recursive filters corresponding to linear smoothing of the form f o u t ( x ) = f i n ( x ) + α f o u t ( x − 1 ) {\displaystyle f_{out}(x)=f_{in}(x)+\alpha f_{out}(x-1)} where α > 0 {\displaystyle \alpha >0} f o u t ( x ) = f i n ( x ) + β f o u t ( x + 1 ) {\displaystyle f_{out}(x)=f_{in}(x)+\beta f_{out}(x+1)} where β > 0 {\displaystyle \beta >0} , the one-sided Poisson kernel p ( n , t ) = e − t t n n ! {\displaystyle p(n,t)=e^{-t}{\frac {t^{n}}{n!}}} for n ≥ 0 {\displaystyle n\geq 0} where t ≥ 0 {\displaystyle t\geq 0} p ( n , t ) = e − t t − n ( − n ) ! {\displaystyle p(n,t)=e^{-t}{\frac {t^{-n}}{(-n)!}}} for n ≤ 0 {\displaystyle n\leq 0} where t ≥ 0 {\displaystyle t\geq 0} . From this classification, it is apparent that we require a continuous semi-group structure, there are only three classes of scale-space kernels with a continuous scale parameter; the Gaussian kernel which forms the scale-space of continuous signals, the discrete Gaussian kernel which forms the scale-space of discrete signals and the time-causal Poisson kernel that forms a temporal scale-space over discrete time. If we on the other hand sacrifice the continuous semi-group structure, there are more options: For discrete signals, the use of generalized binomial kernels provides a formal basis for defining the smoothing operation in a pyramid. For temporal data, the one-sided truncated exponential kernels and the first-order recursive filters provide a way to define time-causal scale-spaces that allow for efficient numerical implementation and respect causality over time without access to the future. The first-order recursive filters also provide a framework for defining recursive approximations to the Gaussian kernel that in a weaker sense preserve some of the scale-space properties.

GeWorkbench

geWorkbench (genomics Workbench) is an open-source software platform for integrated genomic data analysis. It is a desktop application written in the programming language Java. geWorkbench uses a component architecture. As of 2016, there are more than 70 plug-ins available, providing for the visualization and analysis of gene expression, sequence, and structure data. geWorkbench is the Bioinformatics platform of MAGNet, the National Center for the Multi-scale Analysis of Genomic and Cellular Networks, one of the 8 National Centers for Biomedical Computing funded through the NIH Roadmap (NIH Common Fund). Many systems and structure biology tools developed by MAGNet investigators are available as geWorkbench plugins. == Features == Computational analysis tools such as t-test, hierarchical clustering, self-organizing maps, regulatory network reconstruction, BLAST searches, pattern-motif discovery, protein structure prediction, structure-based protein annotation, etc. Visualization of gene expression (heatmaps, volcano plot), molecular interaction networks (through Cytoscape), protein sequence and protein structure data (e.g., MarkUs). Integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis. Component integration through platform management of inputs and outputs. Among data that can be shared between components are expression datasets, interaction networks, sample and marker (gene) sets and sequences. Dataset history tracking - complete record of data sets used and input settings. Integration with 3rd party tools such as GenePattern, Cytoscape, and Genomespace. Demonstrations of each feature described can be found at GeWorkbench-web Tutorials. == Versions == geWorkbench is open-source software that can be downloaded and installed locally. A zip file of the released version Java source is also available. Prepackaged installer versions also exist for Windows, Macintosh, and Linux.

Variational autoencoder

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling in 2013. It is part of the families of probabilistic graphical models and variational Bayesian methods. In addition to being seen as an autoencoder neural network architecture, variational autoencoders can also be studied within the mathematical formulation of variational Bayesian methods, connecting a neural encoder network to its decoder through a probabilistic latent space (for example, as a multivariate Gaussian distribution) that corresponds to the parameters of a variational distribution. Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution within the latent space, rather than to a single point in that space. The decoder has the opposite function, which is to map from the latent space to the input space, again according to a distribution (although in practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of a single point, the network can avoid overfitting the training data. Both networks are typically trained together with the usage of the reparameterization trick, although the variance of the noise model can be learned separately. Although this type of model was initially designed for unsupervised learning, its effectiveness has been proven for semi-supervised learning and supervised learning. == Overview of architecture and operation == A variational autoencoder is a generative model with a prior and noise distribution respectively. Usually such models are trained using the expectation-maximization meta-algorithm (e.g. probabilistic PCA, (spike & slab) sparse coding). Such a scheme optimizes a lower bound of the data likelihood, which is usually computationally intractable, and in doing so requires the discovery of q-distributions, or variational posteriors. These q-distributions are normally parameterized for each individual data point in a separate optimization process. However, variational autoencoders use a neural network as an amortized approach to jointly optimize across data points. In that way, the same parameters are reused for multiple data points, which can result in massive memory savings. The first neural network takes as input the data points themselves, and outputs parameters for the variational distribution. As it maps from a known input space to the low-dimensional latent space, it is called the encoder. The decoder is the second neural network of this model. It is a function that maps from the latent space to the input space, e.g. as the means of the noise distribution. It is possible to use another neural network that maps to the variance, however this can be omitted for simplicity. In such a case, the variance can be optimized with gradient descent. To optimize this model, one needs to know two terms: the "reconstruction error", and the Kullback–Leibler divergence (KL-D). Both terms are derived from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data, here referred to as p-distribution. For example, a standard VAE task such as IMAGENET is typically assumed to have a gaussianly distributed noise; however, tasks such as binarized MNIST require a Bernoulli noise. The KL-D from the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and requires a sampling approximation to compute its expectation value. More recent approaches replace Kullback–Leibler divergence (KL-D) with various statistical distances, see "Statistical distance VAE variants" below. == Formulation == From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data x {\displaystyle x} by their chosen parameterized probability distribution p θ ( x ) = p ( x | θ ) {\displaystyle p_{\theta }(x)=p(x|\theta )} . This distribution is usually chosen to be a Gaussian N ( x | μ , σ ) {\displaystyle N(x|\mu ,\sigma )} which is parameterized by μ {\displaystyle \mu } and σ {\displaystyle \sigma } respectively, and as a member of the exponential family it is easy to work with as a noise distribution. Simple distributions are easy enough to maximize, however distributions where a prior is assumed over the latents z {\displaystyle z} results in intractable integrals. Let us find p θ ( x ) {\displaystyle p_{\theta }(x)} via marginalizing over z {\displaystyle z} . p θ ( x ) = ∫ z p θ ( x , z ) d z , {\displaystyle p_{\theta }(x)=\int _{z}p_{\theta }({x,z})\,dz,} where p θ ( x , z ) {\displaystyle p_{\theta }({x,z})} represents the joint distribution under p θ {\displaystyle p_{\theta }} of the observable data x {\displaystyle x} and its latent representation or encoding z {\displaystyle z} . According to the chain rule, the equation can be rewritten as p θ ( x ) = ∫ z p θ ( x | z ) p θ ( z ) d z {\displaystyle p_{\theta }(x)=\int _{z}p_{\theta }({x|z})p_{\theta }(z)\,dz} In the vanilla variational autoencoder, z {\displaystyle z} is usually taken to be a finite-dimensional vector of real numbers, and p θ ( x | z ) {\displaystyle p_{\theta }({x|z})} to be a Gaussian distribution. Then p θ ( x ) {\displaystyle p_{\theta }(x)} is a mixture of Gaussian distributions. It is now possible to define the set of the relationships between the input data and its latent representation as Prior p θ ( z ) {\displaystyle p_{\theta }(z)} Likelihood p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} Posterior p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} Unfortunately, the computation of p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} is expensive and in most cases intractable. To speed up the calculus to make it feasible, it is necessary to introduce a further function to approximate the posterior distribution as q ϕ ( z | x ) ≈ p θ ( z | x ) {\displaystyle q_{\phi }({z|x})\approx p_{\theta }({z|x})} with ϕ {\displaystyle \phi } defined as the set of real values that parametrize q {\displaystyle q} . This is sometimes called amortized inference, since by "investing" in finding a good q ϕ {\displaystyle q_{\phi }} , one can later infer z {\displaystyle z} from x {\displaystyle x} quickly without doing any integrals. In this way, the problem is to find a good probabilistic autoencoder, in which the conditional likelihood distribution p θ ( x | z ) {\displaystyle p_{\theta }(x|z)} is computed by the probabilistic decoder, and the approximated posterior distribution q ϕ ( z | x ) {\displaystyle q_{\phi }(z|x)} is computed by the probabilistic encoder. Parametrize the encoder as E ϕ {\displaystyle E_{\phi }} , and the decoder as D θ {\displaystyle D_{\theta }} . == Evidence lower bound (ELBO) == Like many deep learning approaches that use gradient-based optimization, VAEs require a differentiable loss function to update the network weights through backpropagation. For variational autoencoders, the idea is to jointly optimize the generative model parameters θ {\displaystyle \theta } to reduce the reconstruction error between the input and the output, and ϕ {\displaystyle \phi } to make q ϕ ( z | x ) {\displaystyle q_{\phi }({z|x})} as close as possible to p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} . As reconstruction loss, mean squared error and cross entropy are often used. The Kullback–Leibler divergence D K L ( q ϕ ( z | x ) ∥ p θ ( z | x ) ) {\displaystyle D_{KL}(q_{\phi }({z|x})\parallel p_{\theta }({z|x}))} can be used as a loss function to squeeze q ϕ ( z | x ) {\displaystyle q_{\phi }({z|x})} under p θ ( z | x ) {\displaystyle p_{\theta }(z|x)} . This divergence loss expands to D K L ( q ϕ ( z | x ) ∥ p θ ( z | x ) ) = E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( z | x ) ] = E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( x ) p θ ( x , z ) ] = ln ⁡ p θ ( x ) + E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ q ϕ ( z | x ) p θ ( x , z ) ] . {\displaystyle {\begin{aligned}D_{KL}(q_{\phi }({z|x})\parallel p_{\theta }({z|x}))&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }(z|x)}{p_{\theta }(z|x)}}\right]\\&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }({z|x})p_{\theta }(x)}{p_{\theta }(x,z)}}\right]\\&=\ln p_{\theta }(x)+\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }({z|x})}{p_{\theta }(x,z)}}\right].\end{aligned}}} Now, define the evidence lower bound (ELBO): L θ , ϕ ( x ) := E z ∼ q ϕ ( ⋅ | x ) [ ln ⁡ p θ ( x , z ) q ϕ ( z | x ) ] = ln ⁡ p θ ( x ) − D K L ( q ϕ ( ⋅ | x ) ∥ p θ ( ⋅ | x ) ) {\displaystyle L_{\theta ,\phi }(x):=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }({z|x})}}\right]=\ln p_{\theta }(x)-D_{KL}(q_{\phi }({\cdot |x})\parallel p_{\theta }({\cdot |x}))} Maximizing the ELBO θ ∗ , ϕ ∗ = argmax θ , ϕ L θ , ϕ ( x ) {\dis

Winner-take-all (computing)

Winner-take-all is a computational principle applied in computational models of neural networks by which neurons compete with each other for activation. In the classical form, only the neuron with the highest activation stays active while all other neurons shut down; however, other variations allow more than one neuron to be active, for example the soft winner take-all, by which a power function is applied to the neurons. == Neural networks == In the theory of artificial neural networks, winner-take-all networks are a case of competitive learning in recurrent neural networks. Output nodes in the network mutually inhibit each other, while simultaneously activating themselves through reflexive connections. After some time, only one node in the output layer will be active, namely the one corresponding to the strongest input. Thus the network uses nonlinear inhibition to pick out the largest of a set of inputs. Winner-take-all is a general computational primitive that can be implemented using different types of neural network models, including both continuous-time and spiking networks. Winner-take-all networks are commonly used in computational models of the brain, particularly for distributed decision-making or action selection in the cortex. Important examples include hierarchical models of vision, and models of selective attention and recognition. They are also common in artificial neural networks and neuromorphic analog VLSI circuits. It has been formally proven that the winner-take-all operation is computationally powerful compared to other nonlinear operations, such as thresholding. In many practical cases, there is not only one single neuron which becomes active but there are exactly k neurons which become active for a fixed number k. This principle is referred to as k-winners-take-all. === Example algorithm === Consider a single linear neuron, with inputs x 1 , … , x n {\displaystyle x_{1},\dots ,x_{n}} . Each input has weight w i {\displaystyle w_{i}} , and the output of the neuron is ∑ i w i x i {\displaystyle \sum _{i}w_{i}x_{i}} . In the Instar learning rule, on each input vector, the weight vectors are modified according to Δ w i = η ( x i − w i ) {\displaystyle \Delta w_{i}=\eta (x_{i}-w_{i})} where η {\displaystyle \eta } is the learning rate. This rule is unsupervised, since we need just the input vector, not a reference output. Now, consider multiple linear neurons y 1 , … , y m {\displaystyle y_{1},\dots ,y_{m}} . The output of each satisfies y i = ∑ j w i j x j {\displaystyle y_{i}=\sum _{j}w_{ij}x_{j}} . In the winner-take-all algorithm, the weights are modified as follows. Given an input vector x {\displaystyle x} , each output is computed. The neuron with the largest output is selected, and the weights going into that neuron are modified according to the Instar learning rule. All other weights remain unchanged. The k-winners-take-all rule is similar, except that the Instar learning rule is applied to the weights going into the k neurons with the largest outputs. == Circuit example == A simple, but popular CMOS winner-take-all circuit is shown on the right. This circuit was originally proposed by Lazzaro et al. (1989) using MOS transistors biased to operate in the weak-inversion or subthreshold regime. In the particular case shown there are only two inputs (IIN,1 and IIN,2), but the circuit can be easily extended to multiple inputs in a straightforward way. It operates on continuous-time input signals (currents) in parallel, using only two transistors per input. In addition, the bias current IBIAS is set by a single global transistor that is common to all the inputs. The largest of the input currents sets the common potential VC. As a result, the corresponding output carries almost all the bias current, while the other outputs have currents that are close to zero. Thus, the circuit selects the larger of the two input currents, i.e., if IIN,1 > IIN,2, we get IOUT,1 = IBIAS and IOUT,2 = 0. Similarly, if IIN,2 > IIN,1, we get IOUT,1 = 0 and IOUT,2 = IBIAS. A SPICE-based DC simulation of the CMOS winner-take-all circuit in the two-input case is shown on the right. As shown in the top subplot, the input IIN,1 was fixed at 6nA, while IIN,2 was linearly increased from 0 to 10nA. The bottom subplot shows the two output currents. As expected, the output corresponding to the larger of the two inputs carries the entire bias current (10nA in this case), forcing the other output current nearly to zero. == Other uses == In stereo matching algorithms, following the taxonomy proposed by Scharstein and Szelliski, winner-take-all is a local method for disparity computation. Adopting a winner-take-all strategy, the disparity associated with the minimum or maximum cost value is selected at each pixel. It is axiomatic that in the electronic commerce market, early dominant players such as AOL or Yahoo! get most of the rewards. By 1998, one study found the top 5% of all web sites garnered more than 74% of all traffic. The winner-take-all hypothesis in economics suggests that once a technology or a firm gets ahead, it will do better and better over time, whereas lagging technology and firms will fall further behind. See First-mover advantage.

Grammar systems theory

Grammar systems theory is a field of theoretical computer science that studies systems of finite collections of formal grammars generating a formal language. Each grammar works on a string, a so-called sequential form that represents an environment. Grammar systems can thus be used as a formalization of decentralized or distributed systems of agents in artificial intelligence. Let A {\displaystyle \mathbb {A} } be a simple reactive agent moving on the table and trying not to fall down from the table with two reactions, t for turning and ƒ for moving forward. The set of possible behaviors of A {\displaystyle \mathbb {A} } can then be described as formal language L A = { ( f m t n f r ) + : 1 ≤ m ≤ k ; 1 ≤ n ≤ ℓ ; 1 ≤ r ≤ k } , {\displaystyle \mathbb {L_{A}} =\{(f^{m}t^{n}f^{r})^{+}:1\leq m\leq k;1\leq n\leq \ell ;1\leq r\leq k\},} where ƒ can be done maximally k times and t can be done maximally ℓ times considering the dimensions of the table. Let G A {\displaystyle \mathbb {G_{A}} } be a formal grammar which generates language L A {\displaystyle \mathbb {L_{A}} } . The behavior of A {\displaystyle \mathbb {A} } is then described by this grammar. Suppose the A {\displaystyle \mathbb {A} } has a subsumption architecture; each component of this architecture can be then represented as a formal grammar, too, and the final behavior of the agent is then described by this system of grammars. The schema on the right describes such a system of grammars which shares a common string representing an environment. The shared sequential form is sequentially rewritten by each grammar, which can represent either a component or generally an agent. If grammars communicate together and work on a shared sequential form, it is called a Cooperating Distributed (DC) grammar system. Shared sequential form is a similar concept to the blackboard approach in AI, which is inspired by an idea of experts solving some problem together while they share their proposals and ideas on a shared blackboard. Each grammar in a grammar system can also work on its own string and communicate with other grammars in a system by sending their sequential forms on request. Such a grammar system is then called a Parallel Communicating (PC) grammar system. PC and DC are inspired by distributed AI. If there is no communication between grammars, the system is close to the decentralized approaches in AI. These kinds of grammar systems are sometimes called colonies or Eco-Grammar systems, depending (besides others) on whether the environment is changing on its own (Eco-Grammar system) or not (colonies).

Latent Dirichlet allocation

In natural language processing, latent Dirichlet allocation (LDA) is a generative statistical model that explains how a collection of text documents can be described by a set of unobserved "topics." For example, given a set of news articles, LDA might discover that one topic is characterized by words like "president", "government", and "election", while another is characterized by "team", "game", and "score". It is one of the most common topic models. The LDA model was first presented as a graphical model for population genetics by J. K. Pritchard, M. Stephens and P. Donnelly in 2000. The model was subsequently applied to machine learning by David Blei, Andrew Ng, and Michael I. Jordan in 2003. Although its most frequent application is in modeling text corpora, it has also been used for other problems, such as in clinical psychology, social science, and computational musicology. The core assumption of LDA is that documents are represented as a random mixture of latent topics, and each topic is characterized by a probability distribution over words. The model is a generalization of probabilistic latent semantic analysis (pLSA), differing primarily in that LDA treats the topic mixture as a Dirichlet prior, leading to more reasonable mixtures and less susceptibility to overfitting. Learning the latent topics and their associated probabilities from a corpus is typically done using Bayesian inference, often with methods like Gibbs sampling or variational Bayes. == History == In the context of population genetics, LDA was proposed by J. K. Pritchard, M. Stephens and P. Donnelly in 2000. LDA was applied in machine learning by David Blei, Andrew Ng and Michael I. Jordan in 2003. == Overview == === Population genetics === In population genetics, the model is used to detect the presence of structured genetic variation in a group of individuals. The model assumes that alleles carried by individuals under study have origin in various extant or past populations. The model and various inference algorithms allow scientists to estimate the allele frequencies in those source populations and the origin of alleles carried by individuals under study. The source populations can be interpreted ex-post in terms of various evolutionary scenarios. In association studies, detecting the presence of genetic structure is considered a necessary preliminary step to avoid confounding. === Clinical psychology, mental health, and social science === In clinical psychology research, LDA has been used to identify common themes of self-images experienced by young people in social situations. Other social scientists have used LDA to examine large sets of topical data from discussions on social media (e.g., tweets about prescription drugs). Additionally, supervised Latent Dirichlet Allocation with covariates (SLDAX) has been specifically developed to combine latent topics identified in texts with other manifest variables. This approach allows for the integration of text data as predictors in statistical regression analyses, improving the accuracy of mental health predictions. One of the main advantages of SLDAX over traditional two-stage approaches is its ability to avoid biased estimates and incorrect standard errors, allowing for a more accurate analysis of psychological texts. In the field of social sciences, LDA has proven to be useful for analyzing large datasets, such as social media discussions. For instance, researchers have used LDA to investigate tweets discussing socially relevant topics, like the use of prescription drugs and cultural differences in China. By analyzing these large text corpora, it is possible to uncover patterns and themes that might otherwise go unnoticed, offering valuable insights into public discourse and perception in real time. === Musicology === In the context of computational musicology, LDA has been used to discover tonal structures in different corpora. === Machine learning === One application of LDA in machine learning – specifically, topic discovery, a subproblem in natural language processing – is to discover topics in a collection of documents, and then automatically classify any individual document within the collection in terms of how "relevant" it is to each of the discovered topics. A topic is considered to be a set of terms (i.e., individual words or phrases) that, taken together, suggest a shared theme. For example, in a document collection related to pet animals, the terms dog, spaniel, beagle, golden retriever, puppy, bark, and woof would suggest a DOG_related theme, while the terms cat, siamese, Maine coon, tabby, manx, meow, purr, and kitten would suggest a CAT_related theme. There may be many more topics in the collection – e.g., related to diet, grooming, healthcare, behavior, etc. that we do not discuss for simplicity's sake. (Very common, so called stop words in a language – e.g., "the", "an", "that", "are", "is", etc., – would not discriminate between topics and are usually filtered out by pre-processing before LDA is performed. Pre-processing also converts terms to their "root" lexical forms – e.g., "barks", "barking", and "barked" would be converted to "bark".) If the document collection is sufficiently large, LDA will discover such sets of terms (i.e., topics) based upon the co-occurrence of individual terms, though the task of assigning a meaningful label to an individual topic (i.e., that all the terms are DOG_related) is up to the user, and often requires specialized knowledge (e.g., for collection of technical documents). The LDA approach assumes that: The semantic content of a document is composed by combining one or more terms from one or more topics. Certain terms are ambiguous, belonging to more than one topic, with different probability. (For example, the term training can apply to both dogs and cats, but are more likely to refer to dogs, which are used as work animals or participate in obedience or skill competitions.) However, in a document, the accompanying presence of specific neighboring terms (which belong to only one topic) will disambiguate their usage. Most documents will contain only a relatively small number of topics. In the collection, e.g., individual topics will occur with differing frequencies. That is, they have a probability distribution, so that a given document is more likely to contain some topics than others. Within a topic, certain terms will be used much more frequently than others. In other words, the terms within a topic will also have their own probability distribution. When LDA machine learning is employed, both sets of probabilities are computed during the training phase, using Bayesian methods and an expectation–maximization algorithm. LDA is a generalization of older approach of probabilistic latent semantic analysis (pLSA), The pLSA model is equivalent to LDA under a uniform Dirichlet prior distribution. pLSA relies on only the first two assumptions above and does not care about the remainder. While both methods are similar in principle and require the user to specify the number of topics to be discovered before the start of training (as with k-means clustering) LDA has the following advantages over pLSA: LDA yields better disambiguation of words and a more precise assignment of documents to topics. Computing probabilities allows a "generative" process by which a collection of new "synthetic documents" can be generated that would closely reflect the statistical characteristics of the original collection. Unlike LDA, pLSA is vulnerable to overfitting especially when the size of corpus increases. The LDA algorithm is more readily amenable to scaling up for large data sets using the MapReduce approach on a computing cluster. == Model == With plate notation, which is often used to represent probabilistic graphical models (PGMs), the dependencies among the many variables can be captured concisely. The boxes are "plates" representing replicates, which are repeated entities. The outer plate represents documents, while the inner plate represents the repeated word positions in a given document; each position is associated with a choice of topic and word. The variable names are defined as follows: M denotes the number of documents N is number of words in a given document (document i has N i {\displaystyle N_{i}} words) α is the parameter of the Dirichlet prior on the per-document topic distributions β is the parameter of the Dirichlet prior on the per-topic word distribution θ i {\displaystyle \theta _{i}} is the topic distribution for document i φ k {\displaystyle \varphi _{k}} is the word distribution for topic k z i j {\displaystyle z_{ij}} is the topic for the j-th word in document i w i j {\displaystyle w_{ij}} is the specific word. The fact that W is grayed out means that words w i j {\displaystyle w_{ij}} are the only observable variables, and the other variables are latent variables. As proposed in the original paper, a sparse Dirichlet prior can be used to model the to