AI Content On Youtube

AI Content On Youtube — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • MetaMask

    MetaMask

    MetaMask is a software cryptocurrency wallet developed by ConsenSys for interacting with the Ethereum blockchain and other EVM-compatible networks. It enables users to manage Ethereum accounts and connect to decentralized applications (dApps) via a browser extension or mobile app. As of early 2026, MetaMask reports over 100 million users worldwide. == Overview == MetaMask allows users to store and manage private keys, send and receive Ethereum-based cryptocurrencies and tokens (including ERC-20 and ERC-721 standards), broadcast transactions, and interact with dApps. dApps connect to the wallet via JavaScript interfaces, prompting users to approve signatures or transactions. The wallet features MetaMask Swaps, an in-app token swap aggregator sourcing liquidity from multiple decentralized exchanges (DEXs), with a service fee of 0.875%. In 2025, MetaMask introduced the MetaMask Rewards program (initially mobile-only), where users earn points for activities such as swaps, bridging, and referrals. Season 1 (October 2025 – January 2026) distributed over $30 million in Linea tokens and other perks to participants. == History == MetaMask launched in 2016 as open-source software under the MIT license. It initially supported browser extensions for Chrome and Firefox. Mobile versions were in closed beta from 2019 and publicly released for iOS and Android in September 2020. In August 2020, the license changed to a custom proprietary one. MetaMask Swaps launched on desktop in October 2020 and on mobile in March 2021. The Rewards program launched in late 2025 with Linea integration. == Criticism == MetaMask has faced criticism over privacy, including default analytics settings that share some user data (which can be disabled). Its reliance on Infura (acquired by ConsenSys in 2019) has raised concerns about centralization in Ethereum infrastructure. The wallet regularly issues warnings about phishing scams and fake airdrops impersonating MetaMask.

    Read more →
  • Confirmatory blockmodeling

    Confirmatory blockmodeling

    Confirmatory blockmodeling is a deductive approach in blockmodeling, where a blockmodel (or part of it) is prespecify before the analysis, and then the analysis is fit to this model. When only a part of analysis is prespecify (like individual cluster(s) or location of the block types), it is called partially confirmatory blockmodeling. This is so-called indirect approach, where the blockmodeling is done on the blockmodel fitting (e.g., a priori hypothesized blockmodel). Opposite approach to the confirmatory blockmodeling is an inductive exploratory blockmodeling.

    Read more →
  • LogitBoost

    LogitBoost

    In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost function of logistic regression, one can derive the LogitBoost algorithm. == Minimizing the LogitBoost cost function == LogitBoost can be seen as a convex optimization. Specifically, given that we seek an additive model of the form f = ∑ t α t h t {\displaystyle f=\sum _{t}\alpha _{t}h_{t}} the LogitBoost algorithm minimizes the logistic loss: ∑ i log ⁡ ( 1 + e − y i f ( x i ) ) {\displaystyle \sum _{i}\log \left(1+e^{-y_{i}f(x_{i})}\right)}

    Read more →
  • Common Voice

    Common Voice

    Common Voice is a crowdsourcing project started by Mozilla to create a free and open speech corpus. The project is supported by volunteers who record sample sentences with a microphone and review recordings of other users. The transcribed sentences are collected in a voice database available under the public domain license CC0. This license ensures that developers can use the database for voice-to-text and text-to-voice applications without restrictions or costs. == Aims == Common Voice aims to provide diverse voice samples. According to Mozilla's Katharina Borchert, many existing projects took datasets from public radio or otherwise had datasets that underrepresented both women and people with pronounced accents. == Voice database == The first dataset was released in November 2017. More than 20,000 users worldwide had recorded 500 hours of English sentences. In February 2019, the first batch of languages was released for use. This included 18 languages such as English, French, German and Mandarin Chinese, but also less prevalent languages like Welsh and Kabyle. In total, this included almost 1,400 hours of recorded voice data from more than 42,000 contributors. By July 2020 the database had amassed 7,226 hours of voice recordings in 54 languages, 5,591 hours of which had been verified by volunteers. In May 2021, following the work to add Kinyarwanda, the project received a grant to add Kiswahili. At the beginning of 2022, Bengali.AI partnered with Common Voice to launch the "Bangla Speech Recognition" project that aims to make machines understand the Bangla language. 2000 hours of voice was collected. In September 2022, it was announced that the Twi language of Ghana was the 100th language to be added to the database. As of December 2025, Mozilla Common Voice collects voice data for over 250 languages, with the most hours having been collected in English, Catalan, Kinyarwanda, Belarusian and Esperanto.

    Read more →
  • System integrity

    System integrity

    In telecommunications, the term system integrity has the following meanings: That condition of a system wherein its mandated operational and technical parameters are within the prescribed limits. The quality of an AIS when it performs its intended function in an unimpaired manner, free from deliberate or inadvertent unauthorized manipulation of the system. The state that exists when there is complete assurance that under all conditions an IT system is based on the logical correctness and reliability of the operating system, the logical completeness of the hardware and software that implement the protection mechanisms, and data integrity.

    Read more →
  • Inverted pendulum

    Inverted pendulum

    An inverted pendulum is a pendulum that has its center of mass above its pivot point. It is unstable and falls over without additional help. It can be suspended stably in this inverted position by using a control system to monitor the angle of the pole and move the pivot point horizontally back under the center of mass when it starts to fall over, keeping it balanced. The inverted pendulum is a classic problem in dynamics and control theory and is used as a benchmark for testing control strategies. It is often implemented with the pivot point mounted on a cart that can move horizontally under control of an electronic servo system as shown in the photo; this is called a cart and pole apparatus. Most applications limit the pendulum to 1 degree of freedom by affixing the pole to an axis of rotation. Whereas a normal pendulum is stable when hanging downward, an inverted pendulum is inherently unstable, and must be actively balanced in order to remain upright; this can be done either by applying a torque at the pivot point, by moving the pivot point horizontally as part of a feedback system, changing the rate of rotation of a mass mounted on the pendulum on an axis parallel to the pivot axis and thereby generating a net torque on the pendulum, or by oscillating the pivot point vertically. A simple demonstration of moving the pivot point in a feedback system is achieved by balancing an upturned broomstick on the end of one's finger. A second type of inverted pendulum is a tiltmeter for tall structures, which consists of a wire anchored to the bottom of the foundation and attached to a float in a pool of oil at the top of the structure that has devices for measuring movement of the neutral position of the float away from its original position. == Overview == A pendulum with its bob hanging directly below the support pivot is at a stable equilibrium point, where it remains motionless because there is no torque on the pendulum. If displaced from this position, it experiences a restoring torque that returns it toward the equilibrium position. A pendulum with its bob in an inverted position, supported on a rigid rod directly above the pivot, 180° from its stable equilibrium position, is at an unstable equilibrium point. At this point again there is no torque on the pendulum, but the slightest displacement away from this position causes a gravitation torque on the pendulum that accelerates it away from equilibrium, causing it to fall over. In order to stabilize a pendulum in this inverted position, a feedback control system can be used, which monitors the pendulum's angle and moves the position of the pivot point sideways when the pendulum starts to fall over, to keep it balanced. The inverted pendulum is a classic problem in dynamics and control theory and is widely used as a benchmark for testing control algorithms (PID controllers, state-space representation, neural networks, fuzzy control, genetic algorithms, etc.). Variations on this problem include multiple links, allowing the motion of the cart to be commanded while maintaining the pendulum, and balancing the cart-pendulum system on a see-saw. The inverted pendulum is related to rocket or missile guidance, where the center of gravity is located behind the center of drag causing aerodynamic instability. The understanding of a similar problem can be shown by simple robotics in the form of a balancing cart. Balancing an upturned broomstick on the end of one's finger is a simple demonstration, and the problem is solved by self-balancing personal transporters such as the Segway PT, the self-balancing hoverboard and the self-balancing unicycle. Another way that an inverted pendulum may be stabilized, without any feedback or control mechanism, is by oscillating the pivot rapidly up and down. This is called Kapitza's pendulum. If the oscillation is sufficiently strong (in terms of its acceleration and amplitude) then the inverted pendulum can recover from perturbations in a strikingly counterintuitive manner. If the driving point moves in simple harmonic motion, the pendulum's motion is described by the Mathieu equation. == Equations of motion == The equations of motion of inverted pendulums are dependent on what constraints are placed on the motion of the pendulum. Inverted pendulums can be created in various configurations resulting in a number of Equations of Motion describing the behavior of the pendulum. === Stationary pivot point === In a configuration where the pivot point of the pendulum is fixed in space, the equation of motion is similar to that for an uninverted pendulum. The equation of motion below assumes no friction or any other resistance to movement, a rigid massless rod, and the restriction to 2-dimensional movement. θ ¨ − g ℓ sin ⁡ θ = 0 {\displaystyle {\ddot {\theta }}-{g \over \ell }\sin \theta =0} Where θ ¨ {\displaystyle {\ddot {\theta }}} is the angular acceleration of the pendulum, g {\displaystyle g} is the standard gravity on the surface of the Earth, ℓ {\displaystyle \ell } is the length of the pendulum, and θ {\displaystyle \theta } is the angular displacement measured from the equilibrium position. When θ ¨ {\displaystyle {\ddot {\theta }}} added to both sides, it has the same sign as the angular acceleration term: θ ¨ = g ℓ sin ⁡ θ {\displaystyle {\ddot {\theta }}={g \over \ell }\sin \theta } Thus, the inverted pendulum accelerates away from the vertical unstable equilibrium in the direction initially displaced, and the acceleration is inversely proportional to the length. Tall pendulums fall more slowly than short ones. Derivation using torque and moment of inertia: The pendulum is assumed to consist of a point mass, of mass m {\displaystyle m} , affixed to the end of a massless rigid rod, of length ℓ {\displaystyle \ell } , attached to a pivot point at the end opposite the point mass. The net torque of the system must equal the moment of inertia times the angular acceleration: τ n e t = I θ ¨ {\displaystyle {\boldsymbol {\tau }}_{\mathrm {net} }=I{\ddot {\theta }}} The torque due to gravity providing the net torque: τ n e t = m g ℓ sin ⁡ θ {\displaystyle {\boldsymbol {\tau }}_{\mathrm {net} }=mg\ell \sin \theta \,\!} Where θ {\displaystyle \theta \ } is the angle measured from the inverted equilibrium position. The resulting equation: I θ ¨ = m g ℓ sin ⁡ θ {\displaystyle I{\ddot {\theta }}=mg\ell \sin \theta \,\!} The moment of inertia for a point mass: I = m R 2 {\displaystyle I=mR^{2}} In the case of the inverted pendulum the radius is the length of the rod, ℓ {\displaystyle \ell } . Substituting in I = m ℓ 2 {\displaystyle I=m\ell ^{2}} m ℓ 2 θ ¨ = m g ℓ sin ⁡ θ {\displaystyle m\ell ^{2}{\ddot {\theta }}=mg\ell \sin \theta \,\!} Mass and ℓ 2 {\displaystyle \ell ^{2}} is divided from each side resulting in: θ ¨ = g ℓ sin ⁡ θ {\displaystyle {\ddot {\theta }}={g \over \ell }\sin \theta } === Inverted pendulum on a cart === An inverted pendulum on a cart consists of a mass m {\displaystyle m} at the top of a pole of length ℓ {\displaystyle \ell } pivoted on a horizontally moving base as shown in the adjacent image. The cart is restricted to linear motion and is subject to forces resulting in or hindering motion. === Essentials of stabilization === The essentials of stabilizing the inverted pendulum can be summarized qualitatively in three steps. 1. If the tilt angle θ {\displaystyle \theta } is to the right, the cart must accelerate to the right and vice versa. 2. The position of the cart x {\displaystyle x} relative to track center is stabilized by slightly modulating the null angle (the angle error that the control system tries to null) by the position of the cart, that is, null angle = θ + k x {\displaystyle =\theta +kx} where k {\displaystyle k} is small. This makes the pole want to lean slightly toward track center and stabilize at track center where the tilt angle is exactly vertical. Any offset in the tilt sensor or track slope that would otherwise cause instability translates into a stable position offset. A further added offset gives position control. 3. A normal pendulum subject to a moving pivot point such as a load lifted by a crane, has a peaked response at the pendulum radian frequency of ω p = g / ℓ {\displaystyle \omega _{p}={\sqrt {g/\ell }}} . To prevent uncontrolled swinging, the frequency spectrum of the pivot motion should be suppressed near ω p {\displaystyle \omega _{p}} . The inverted pendulum requires the same suppression filter to achieve stability. As a consequence of the null angle modulation strategy, the position feedback is positive, that is, a sudden command to move right produces an initial cart motion to the left followed by a move right to rebalance the pendulum. The interaction of the pendulum instability and the positive position feedback instability to produce a stable system is a feature that makes the mathematical analysis an interesting and challenging problem. === From Lagrange's equations === The equations of motion c

    Read more →
  • Diffusion model

    Diffusion model

    In machine learning, diffusion models, also known as diffusion-based generative models or score-based generative models, are a class of latent variable generative models. A diffusion model consists of two major components: the forward diffusion process, and the reverse sampling process. The goal of diffusion models is to learn a diffusion process for a given dataset, such that the process can generate new elements that are distributed similarly as the original dataset. A diffusion model models data as generated by a diffusion process, whereby a new datum performs a random walk with drift through the space of all possible data. A trained diffusion model can be sampled in many ways, with different efficiency and quality. There are various equivalent formalisms, including Markov chains, denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. They are typically trained using variational inference. The model responsible for denoising is typically called its "backbone". The backbone may be of any kind, but they are typically U-nets or transformers. As of 2024, diffusion models are mainly used for computer vision tasks, including image denoising, inpainting, super-resolution, image generation, and video generation. These typically involve training a neural network to sequentially denoise images blurred with Gaussian noise. The model is trained to reverse the process of adding noise to an image. After training to convergence, it can be used for image generation by starting with an image composed of random noise, and applying the network iteratively to denoise the image. Diffusion-based image generators have seen widespread commercial interest, such as Stable Diffusion and DALL-E. These models typically combine diffusion models with other models, such as text-encoders and cross-attention modules to allow text-conditioned generation. Other than computer vision, diffusion models have also found applications in natural language processing such as text generation and summarization, sound generation, and reinforcement learning. == Denoising diffusion model == === Non-equilibrium thermodynamics === Diffusion models were introduced in 2015 as a method to train a model that can sample from a highly complex probability distribution. They used techniques from non-equilibrium thermodynamics, especially diffusion. Consider, for example, how one might model the distribution of all naturally occurring photos. Each image is a point in the space of all images, and the distribution of naturally occurring photos is a "cloud" in space, which, by repeatedly adding noise to the images, diffuses out to the rest of the image space, until the cloud becomes all but indistinguishable from a Gaussian distribution N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . A model that can approximately undo the diffusion can then be used to sample from the original distribution. This is studied in "non-equilibrium" thermodynamics, as the starting distribution is not in equilibrium, unlike the final distribution. The equilibrium distribution is the Gaussian distribution N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} , with pdf ρ ( x ) ∝ e − 1 2 ‖ x ‖ 2 {\displaystyle \rho (x)\propto e^{-{\frac {1}{2}}\|x\|^{2}}} . This is just the Maxwell–Boltzmann distribution of particles in a potential well V ( x ) = 1 2 ‖ x ‖ 2 {\displaystyle V(x)={\frac {1}{2}}\|x\|^{2}} at temperature 1. The initial distribution, being very much out of equilibrium, would diffuse towards the equilibrium distribution, making biased random steps that are a sum of pure randomness (like a Brownian walker) and gradient descent down the potential well. The randomness is necessary: if the particles were to undergo only gradient descent, then they will all fall to the origin, collapsing the distribution. === Denoising Diffusion Probabilistic Model (DDPM) === The 2020 paper proposed the Denoising Diffusion Probabilistic Model (DDPM), which improves upon the previous method by variational inference. ==== Forward diffusion ==== To present the model, some notation is required. β 1 , . . . , β T ∈ ( 0 , 1 ) {\displaystyle \beta _{1},...,\beta _{T}\in (0,1)} are fixed constants. α t := 1 − β t {\displaystyle \alpha _{t}:=1-\beta _{t}} α ¯ t := α 1 ⋯ α t {\displaystyle {\bar {\alpha }}_{t}:=\alpha _{1}\cdots \alpha _{t}} σ t := 1 − α ¯ t {\displaystyle \sigma _{t}:={\sqrt {1-{\bar {\alpha }}_{t}}}} σ ~ t := σ t − 1 σ t β t {\displaystyle {\tilde {\sigma }}_{t}:={\frac {\sigma _{t-1}}{\sigma _{t}}}{\sqrt {\beta _{t}}}} μ ~ t ( x t , x 0 ) := α t ( 1 − α ¯ t − 1 ) x t + α ¯ t − 1 ( 1 − α t ) x 0 σ t 2 {\displaystyle {\tilde {\mu }}_{t}(x_{t},x_{0}):={\frac {{\sqrt {\alpha _{t}}}(1-{\bar {\alpha }}_{t-1})x_{t}+{\sqrt {{\bar {\alpha }}_{t-1}}}(1-\alpha _{t})x_{0}}{\sigma _{t}^{2}}}} N ( μ , Σ ) {\displaystyle {\mathcal {N}}(\mu ,\Sigma )} is the normal distribution with mean μ {\displaystyle \mu } and variance Σ {\displaystyle \Sigma } , and N ( x | μ , Σ ) {\displaystyle {\mathcal {N}}(x|\mu ,\Sigma )} is the probability density at x {\displaystyle x} . A vertical bar denotes conditioning. A forward diffusion process starts at some starting point x 0 ∼ q {\displaystyle x_{0}\sim q} , where q {\displaystyle q} is the probability distribution to be learned, then repeatedly adds noise to it by x t = 1 − β t x t − 1 + β t z t {\displaystyle x_{t}={\sqrt {1-\beta _{t}}}x_{t-1}+{\sqrt {\beta _{t}}}z_{t}} where z 1 , . . . , z T {\displaystyle z_{1},...,z_{T}} are IID (Independent and identically distributed random variables) samples from N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . The coefficients 1 − β t {\displaystyle {\sqrt {1-\beta _{t}}}} and β t {\displaystyle {\sqrt {\beta _{t}}}} ensure that Var ( X t ) = I {\displaystyle {\mbox{Var}}(X_{t})=I} assuming that Var ( X 0 ) = I {\displaystyle {\mbox{Var}}(X_{0})=I} . The values of β t {\displaystyle \beta _{t}} are chosen such that for any starting distribution of x 0 {\displaystyle x_{0}} , if it has finite second moment, then lim t → ∞ x t | x 0 {\displaystyle \lim _{t\to \infty }x_{t}|x_{0}} converges to N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . The entire diffusion process then satisfies q ( x 0 : T ) = q ( x 0 ) q ( x 1 | x 0 ) ⋯ q ( x T | x T − 1 ) = q ( x 0 ) N ( x 1 | α 1 x 0 , β 1 I ) ⋯ N ( x T | α T x T − 1 , β T I ) {\displaystyle q(x_{0:T})=q(x_{0})q(x_{1}|x_{0})\cdots q(x_{T}|x_{T-1})=q(x_{0}){\mathcal {N}}(x_{1}|{\sqrt {\alpha _{1}}}x_{0},\beta _{1}I)\cdots {\mathcal {N}}(x_{T}|{\sqrt {\alpha _{T}}}x_{T-1},\beta _{T}I)} or ln ⁡ q ( x 0 : T ) = ln ⁡ q ( x 0 ) − ∑ t = 1 T 1 2 β t ‖ x t − 1 − β t x t − 1 ‖ 2 + C {\displaystyle \ln q(x_{0:T})=\ln q(x_{0})-\sum _{t=1}^{T}{\frac {1}{2\beta _{t}}}\|x_{t}-{\sqrt {1-\beta _{t}}}x_{t-1}\|^{2}+C} where C {\displaystyle C} is a normalization constant and often omitted. In particular, we note that x 1 : T | x 0 {\displaystyle x_{1:T}|x_{0}} is a Gaussian process, which affords us considerable freedom in reparameterization. For example, by standard manipulation with Gaussian process, x t | x 0 ∼ N ( α ¯ t x 0 , σ t 2 I ) {\displaystyle x_{t}|x_{0}\sim N\left({\sqrt {{\bar {\alpha }}_{t}}}x_{0},\sigma _{t}^{2}I\right)} x t − 1 | x t , x 0 ∼ N ( μ ~ t ( x t , x 0 ) , σ ~ t 2 I ) {\displaystyle x_{t-1}|x_{t},x_{0}\sim {\mathcal {N}}({\tilde {\mu }}_{t}(x_{t},x_{0}),{\tilde {\sigma }}_{t}^{2}I)} In particular, notice that for large t {\displaystyle t} , the variable x t | x 0 ∼ N ( α ¯ t x 0 , σ t 2 I ) {\displaystyle x_{t}|x_{0}\sim N\left({\sqrt {{\bar {\alpha }}_{t}}}x_{0},\sigma _{t}^{2}I\right)} converges to N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} . That is, after a long enough diffusion process, we end up with some x T {\displaystyle x_{T}} that is very close to N ( 0 , I ) {\displaystyle {\mathcal {N}}(0,I)} , with all traces of the original x 0 ∼ q {\displaystyle x_{0}\sim q} gone. For example, since x t | x 0 ∼ N ( α ¯ t x 0 , σ t 2 I ) {\displaystyle x_{t}|x_{0}\sim N\left({\sqrt {{\bar {\alpha }}_{t}}}x_{0},\sigma _{t}^{2}I\right)} we can sample x t | x 0 {\displaystyle x_{t}|x_{0}} directly "in one step", instead of going through all the intermediate steps x 1 , x 2 , . . . , x t − 1 {\displaystyle x_{1},x_{2},...,x_{t-1}} . ==== Backward diffusion ==== The key idea of DDPM is to use a neural network parametrized by θ {\displaystyle \theta } . The network takes in two arguments x t , t {\displaystyle x_{t},t} , and outputs a vector μ θ ( x t , t ) {\displaystyle \mu _{\theta }(x_{t},t)} and a matrix Σ θ ( x t , t ) {\displaystyle \Sigma _{\theta }(x_{t},t)} , such that each step in the forward diffusion process can be approximately undone by x t − 1 ∼ N ( μ θ ( x t , t ) , Σ θ ( x t , t ) ) {\displaystyle x_{t-1}\sim {\mathcal {N}}(\mu _{\theta }(x_{t},t),\Sigma _{\theta }(x_{t},t))} . This then gives us a backward diffusion process p θ {\displaystyle p_{\theta }} defined by p θ ( x T ) = N ( x T | 0 , I ) {\displaystyle p_{\theta }(x

    Read more →
  • Influence diagram

    Influence diagram

    An influence diagram (ID) (also called a relevance diagram, decision diagram or a decision network) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following the maximum expected utility criterion) can be modeled and solved. ID was first developed in the mid-1970s by decision analysts with an intuitive semantic that is easy to understand. It is now adopted widely and becoming an alternative to the decision tree which typically suffers from exponential growth in number of branches with each variable modeled. ID is directly applicable in team decision analysis, since it allows incomplete sharing of information among team members to be modeled and solved explicitly. Extensions of ID also find their use in game theory as an alternative representation of the game tree. == Semantics == An ID is a directed acyclic graph with three types (plus one subtype) of node and three types of arc (or arrow) between nodes. Nodes: Decision node (corresponding to each decision to be made) is drawn as a rectangle. Uncertainty node (corresponding to each uncertainty to be modeled) is drawn as an oval. Deterministic node (corresponding to special kind of uncertainty that its outcome is deterministically known whenever the outcome of some other uncertainties are also known) is drawn as a double oval. Value node (corresponding to each component of additively separable Von Neumann-Morgenstern utility function) is drawn as an octagon (or diamond). Arcs: Functional arcs (ending in value node) indicate that one of the components of additively separable utility function is a function of all the nodes at their tails. Conditional arcs (ending in uncertainty node) indicate that the uncertainty at their heads is probabilistically conditioned on all the nodes at their tails. Conditional arcs (ending in deterministic node) indicate that the uncertainty at their heads is deterministically conditioned on all the nodes at their tails. Informational arcs (ending in decision node) indicate that the decision at their heads is made with the outcome of all the nodes at their tails known beforehand. Given a properly structured ID: Decision nodes and incoming information arcs collectively state the alternatives (what can be done when the outcome of certain decisions and/or uncertainties are known beforehand) Uncertainty/deterministic nodes and incoming conditional arcs collectively model the information (what are known and their probabilistic/deterministic relationships) Value nodes and incoming functional arcs collectively quantify the preference (how things are preferred over one another). Alternative, information, and preference are termed decision basis in decision analysis, they represent three required components of any valid decision situation. Formally, the semantic of influence diagram is based on sequential construction of nodes and arcs, which implies a specification of all conditional independencies in the diagram. The specification is defined by the d {\displaystyle d} -separation criterion of Bayesian network. According to this semantic, every node is probabilistically independent on its non-successor nodes given the outcome of its immediate predecessor nodes. Likewise, a missing arc between non-value node X {\displaystyle X} and non-value node Y {\displaystyle Y} implies that there exists a set of non-value nodes Z {\displaystyle Z} , e.g., the parents of Y {\displaystyle Y} , that renders Y {\displaystyle Y} independent of X {\displaystyle X} given the outcome of the nodes in Z {\displaystyle Z} . == Example == Consider the simple influence diagram representing a situation where a decision-maker is planning their vacation. There is 1 decision node (Vacation Activity), 2 uncertainty nodes (Weather Condition, Weather Forecast), and 1 value node (Satisfaction). There are 2 functional arcs (ending in Satisfaction), 1 conditional arc (ending in Weather Forecast), and 1 informational arc (ending in Vacation Activity). Functional arcs ending in Satisfaction indicate that Satisfaction is a utility function of Weather Condition and Vacation Activity. In other words, their satisfaction can be quantified if they know what the weather is like and what their choice of activity is. (Note that they do not value Weather Forecast directly) Conditional arc ending in Weather Forecast indicates their belief that Weather Forecast and Weather Condition can be dependent. Informational arc ending in Vacation Activity indicates that they will only know Weather Forecast, not Weather Condition, when making their choice. In other words, actual weather will be known after they make their choice, and only forecast is what they can count on at this stage. It also follows semantically, for example, that Vacation Activity is independent on (irrelevant to) Weather Condition given Weather Forecast is known. == Applicability to value of information == The above example highlights the power of the influence diagram in representing an extremely important concept in decision analysis known as the value of information. Consider the following three scenarios; Scenario 1: The decision-maker could make their Vacation Activity decision while knowing what Weather Condition will be like. This corresponds to adding extra informational arc from Weather Condition to Vacation Activity in the above influence diagram. Scenario 2: The original influence diagram as shown above. Scenario 3: The decision-maker makes their decision without even knowing the Weather Forecast. This corresponds to removing informational arc from Weather Forecast to Vacation Activity in the above influence diagram. Scenario 1 is the best possible scenario for this decision situation since there is no longer any uncertainty on what they care about (Weather Condition) when making their decision. Scenario 3, however, is the worst possible scenario for this decision situation since they need to make their decision without any hint (Weather Forecast) on what they care about (Weather Condition) will turn out to be. The decision-maker is usually better off (definitely no worse off, on average) to move from scenario 3 to scenario 2 through the acquisition of new information. The most they should be willing to pay for such move is called the value of information on Weather Forecast, which is essentially the value of imperfect information on Weather Condition. The applicability of this simple ID and the value of information concept is tremendous, especially in medical decision making when most decisions have to be made with imperfect information about their patients, diseases, etc. == Related concepts == Influence diagrams are hierarchical and can be defined either in terms of their structure or in greater detail in terms of the functional and numerical relation between diagram elements. An ID that is consistently defined at all levels—structure, function, and number—is a well-defined mathematical representation and is referred to as a well-formed influence diagram (WFID). WFIDs can be evaluated using reversal and removal operations to yield answers to a large class of probabilistic, inferential, and decision questions. More recent techniques have been developed by artificial intelligence researchers concerning Bayesian network inference (belief propagation). An influence diagram having only uncertainty nodes (i.e., a Bayesian network) is also called a relevance diagram. An arc connecting node A to B implies not only that "A is relevant to B", but also that "B is relevant to A" (i.e., relevance is a symmetric relationship).

    Read more →
  • The Business Cloud

    The Business Cloud

    The Business Cloud is an API enabled self-service platform, developed by Domo, that provides an array of services like data connection and data visualization. == History == Domo, Inc. was founded in 2010 by Josh James who also co-founded the web analytics software company Omniture in 1996, which he took public in 2006. Domo launched the Domo Appstore, with 1,000 apps with social and mobile capabilities, in 2016. This appstore creates a network of business apps and an ecosystem of companies into a single, integrated business cloud. This decision came after Domo announced a $131 million round of funding from BlackRock. According to the company, the concept behind The Business Cloud is to connect smaller clouds relating to apps or other functional areas of a business into a single business cloud that allows self-service and other social features to customers. == Services == The Business Cloud is offered as a free service, claimed to be the world's first business cloud with Domo appstore as one of its core services. This free package includes all of the Domo's features and functionality including Domo platform, Domo Apps, visualizations, alerts, company directories, org charts, profiles, tasks and Domo Mobile. The Business Cloud allows customers to leverage their preferred cloud as well as on-premises software and monitor all aspects of their business in routine. The company is supported by a $500 million fund from investors all over the world.

    Read more →
  • Sufficient dimension reduction

    Sufficient dimension reduction

    In statistics, sufficient dimension reduction (SDR) is a paradigm for analyzing data that combines the ideas of dimension reduction with the concept of sufficiency. Dimension reduction has long been a primary goal of regression analysis. Given a response variable y and a p-dimensional predictor vector x {\displaystyle {\textbf {x}}} , regression analysis aims to study the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} , the conditional distribution of y {\displaystyle y} given x {\displaystyle {\textbf {x}}} . A dimension reduction is a function R ( x ) {\displaystyle R({\textbf {x}})} that maps x {\displaystyle {\textbf {x}}} to a subset of R k {\displaystyle \mathbb {R} ^{k}} , k < p, thereby reducing the dimension of x {\displaystyle {\textbf {x}}} . For example, R ( x ) {\displaystyle R({\textbf {x}})} may be one or more linear combinations of x {\displaystyle {\textbf {x}}} . A dimension reduction R ( x ) {\displaystyle R({\textbf {x}})} is said to be sufficient if the distribution of y ∣ R ( x ) {\displaystyle y\mid R({\textbf {x}})} is the same as that of y ∣ x {\displaystyle y\mid {\textbf {x}}} . In other words, no information about the regression is lost in reducing the dimension of x {\displaystyle {\textbf {x}}} if the reduction is sufficient. == Graphical motivation == In a regression setting, it is often useful to summarize the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} graphically. For instance, one may consider a scatterplot of y {\displaystyle y} versus one or more of the predictors or a linear combination of the predictors. A scatterplot that contains all available regression information is called a sufficient summary plot. When x {\displaystyle {\textbf {x}}} is high-dimensional, particularly when p ≥ 3 {\displaystyle p\geq 3} , it becomes increasingly challenging to construct and visually interpret sufficiency summary plots without reducing the data. Even three-dimensional scatter plots must be viewed via a computer program, and the third dimension can only be visualized by rotating the coordinate axes. However, if there exists a sufficient dimension reduction R ( x ) {\displaystyle R({\textbf {x}})} with small enough dimension, a sufficient summary plot of y {\displaystyle y} versus R ( x ) {\displaystyle R({\textbf {x}})} may be constructed and visually interpreted with relative ease. Hence sufficient dimension reduction allows for graphical intuition about the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} , which might not have otherwise been available for high-dimensional data. Most graphical methodology focuses primarily on dimension reduction involving linear combinations of x {\displaystyle {\textbf {x}}} . The rest of this article deals only with such reductions. == Dimension reduction subspace == Suppose R ( x ) = A T x {\displaystyle R({\textbf {x}})=A^{T}{\textbf {x}}} is a sufficient dimension reduction, where A {\displaystyle A} is a p × k {\displaystyle p\times k} matrix with rank k ≤ p {\displaystyle k\leq p} . Then the regression information for y ∣ x {\displaystyle y\mid {\textbf {x}}} can be inferred by studying the distribution of y ∣ A T x {\displaystyle y\mid A^{T}{\textbf {x}}} , and the plot of y {\displaystyle y} versus A T x {\displaystyle A^{T}{\textbf {x}}} is a sufficient summary plot. Without loss of generality, only the space spanned by the columns of A {\displaystyle A} need be considered. Let η {\displaystyle \eta } be a basis for the column space of A {\displaystyle A} , and let the space spanned by η {\displaystyle \eta } be denoted by S ( η ) {\displaystyle {\mathcal {S}}(\eta )} . It follows from the definition of a sufficient dimension reduction that F y ∣ x = F y ∣ η T x , {\displaystyle F_{y\mid x}=F_{y\mid \eta ^{T}x},} where F {\displaystyle F} denotes the appropriate distribution function. Another way to express this property is y ⊥ ⊥ x ∣ η T x , {\displaystyle y\perp \!\!\!\perp {\textbf {x}}\mid \eta ^{T}{\textbf {x}},} or y {\displaystyle y} is conditionally independent of x {\displaystyle {\textbf {x}}} , given η T x {\displaystyle \eta ^{T}{\textbf {x}}} . Then the subspace S ( η ) {\displaystyle {\mathcal {S}}(\eta )} is defined to be a dimension reduction subspace (DRS). === Structural dimensionality === For a regression y ∣ x {\displaystyle y\mid {\textbf {x}}} , the structural dimension, d {\displaystyle d} , is the smallest number of distinct linear combinations of x {\displaystyle {\textbf {x}}} necessary to preserve the conditional distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} . In other words, the smallest dimension reduction that is still sufficient maps x {\displaystyle {\textbf {x}}} to a subset of R d {\displaystyle \mathbb {R} ^{d}} . The corresponding DRS will be d-dimensional. === Minimum dimension reduction subspace === A subspace S {\displaystyle {\mathcal {S}}} is said to be a minimum DRS for y ∣ x {\displaystyle y\mid {\textbf {x}}} if it is a DRS and its dimension is less than or equal to that of all other DRSs for y ∣ x {\displaystyle y\mid {\textbf {x}}} . A minimum DRS S {\displaystyle {\mathcal {S}}} is not necessarily unique, but its dimension is equal to the structural dimension d {\displaystyle d} of y ∣ x {\displaystyle y\mid {\textbf {x}}} , by definition. If S {\displaystyle {\mathcal {S}}} has basis η {\displaystyle \eta } and is a minimum DRS, then a plot of y versus η T x {\displaystyle \eta ^{T}{\textbf {x}}} is a minimal sufficient summary plot, and it is (d + 1)-dimensional. == Central subspace == If a subspace S {\displaystyle {\mathcal {S}}} is a DRS for y ∣ x {\displaystyle y\mid {\textbf {x}}} , and if S ⊂ S drs {\displaystyle {\mathcal {S}}\subset {\mathcal {S}}_{\text{drs}}} for all other DRSs S drs {\displaystyle {\mathcal {S}}_{\text{drs}}} , then it is a central dimension reduction subspace, or simply a central subspace, and it is denoted by S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} . In other words, a central subspace for y ∣ x {\displaystyle y\mid {\textbf {x}}} exists if and only if the intersection ⋂ S drs {\textstyle \bigcap {\mathcal {S}}_{\text{drs}}} of all dimension reduction subspaces is also a dimension reduction subspace, and that intersection is the central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} . The central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} does not necessarily exist because the intersection ⋂ S drs {\textstyle \bigcap {\mathcal {S}}_{\text{drs}}} is not necessarily a DRS. However, if S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} does exist, then it is also the unique minimum dimension reduction subspace. === Existence of the central subspace === While the existence of the central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} is not guaranteed in every regression situation, there are some rather broad conditions under which its existence follows directly. For example, consider the following proposition from Cook (1998): Let S 1 {\displaystyle {\mathcal {S}}_{1}} and S 2 {\displaystyle {\mathcal {S}}_{2}} be dimension reduction subspaces for y ∣ x {\displaystyle y\mid {\textbf {x}}} . If x {\displaystyle {\textbf {x}}} has density f ( a ) > 0 {\displaystyle f(a)>0} for all a ∈ Ω x {\displaystyle a\in \Omega _{x}} and f ( a ) = 0 {\displaystyle f(a)=0} everywhere else, where Ω x {\displaystyle \Omega _{x}} is convex, then the intersection S 1 ∩ S 2 {\displaystyle {\mathcal {S}}_{1}\cap {\mathcal {S}}_{2}} is also a dimension reduction subspace. It follows from this proposition that the central subspace S y ∣ x {\displaystyle {\mathcal {S}}_{y\mid x}} exists for such x {\displaystyle {\textbf {x}}} . == Methods for dimension reduction == There are many existing methods for dimension reduction, both graphical and numeric. For example, sliced inverse regression (SIR) and sliced average variance estimation (SAVE) were introduced in the 1990s and continue to be widely used. Although SIR was originally designed to estimate an effective dimension reducing subspace, it is now understood that it estimates only the central subspace, which is generally different. More recent methods for dimension reduction include likelihood-based sufficient dimension reduction, estimating the central subspace based on the inverse third moment (or kth moment), estimating the central solution space, graphical regression, envelope model, and the principal support vector machine. For more details on these and other methods, consult the statistical literature. Principal components analysis (PCA) and similar methods for dimension reduction are not based on the sufficiency principle. === Example: linear regression === Consider the regression model y = α + β T x + ε , where ε ⊥ ⊥ x . {\displaystyle y=\alpha +\beta ^{T}{\textbf {x}}+\varepsilon ,{\text{ where }}\varepsilon \perp \!\!\!\perp {\textbf {x}}.} Note that the distribution of y ∣ x {\displaystyle y\mid {\textbf {x}}} is the same as the distribution of y ∣ β T x {\displ

    Read more →
  • Random forest

    Random forest

    Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that works by creating a multitude of decision trees during training. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the output is the average of the predictions of the trees. Random forests correct for decision trees' habit of overfitting to their training set. The first algorithm for random decision forests was created in 1995 by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg. An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman in order to construct a collection of decision trees with controlled variance. == History == The general method of random decision forests was first proposed by Salzberg and Heath in 1993, with a method that used a randomized decision tree algorithm to create multiple trees and then combine them using majority voting. This idea was developed further by Ho in 1995. Ho established that forests of trees splitting with oblique hyperplanes can gain accuracy as they grow without suffering from overtraining, as long as the forests are randomly restricted to be sensitive to only selected feature dimensions. A subsequent work along the same lines concluded that other splitting methods behave similarly, as long as they are randomly forced to be insensitive to some feature dimensions. This observation that a more complex classifier (a larger forest) gets more accurate nearly monotonically is in sharp contrast to the common belief that the complexity of a classifier can only grow to a certain level of accuracy before being hurt by overfitting. The explanation of the forest method's resistance to overtraining can be found in Kleinberg's theory of stochastic discrimination. The early development of Breiman's notion of random forests was influenced by the work of Amit and Geman who introduced the idea of searching over a random subset of the available decisions when splitting a node, in the context of growing a single tree. The idea of random subspace selection from Ho was also influential in the design of random forests. This method grows a forest of trees, and introduces variation among the trees by projecting the training data into a randomly chosen subspace before fitting each tree or each node. Finally, the idea of randomized node optimization, where the decision at each node is selected by a randomized procedure, rather than a deterministic optimization was first introduced by Thomas G. Dietterich. The proper introduction of random forests was made in a paper by Leo Breiman, that has become one of the world's most cited papers. This paper describes a method of building a forest of uncorrelated trees using a CART like procedure, combined with randomized node optimization and bagging. In addition, this paper combines several ingredients, some previously known and some novel, which form the basis of the modern practice of random forests, in particular: Using out-of-bag error as an estimate of the generalization error. Measuring variable importance through permutation. The report also offers the first theoretical result for random forests in the form of a bound on the generalization error which depends on the strength of the trees in the forest and their correlation. == Algorithm == === Preliminaries: decision tree learning === Decision trees are a popular method for various machine learning tasks. Tree learning is almost "an off-the-shelf procedure for data mining", say Hastie et al., "because it is invariant under scaling and various other transformations of feature values, is robust to inclusion of irrelevant features, and produces inspectable models. However, they are seldom accurate". In particular, trees that are grown very deep tend to learn highly irregular patterns: they overfit their training sets, i.e. have low bias, but very high variance. Random forests are a way of averaging multiple deep decision trees, trained on different parts of the same training set, with the goal of reducing the variance. This comes at the expense of a small increase in the bias and some loss of interpretability, but generally greatly boosts the performance in the final model. === Bagging === The training algorithm for random forests applies the general technique of bootstrap aggregating, or bagging, to tree learners. Given a training set X = x1, ..., xn with responses Y = y1, ..., yn, bagging repeatedly (B times) selects a random sample with replacement of the training set and fits trees to these samples: After training, predictions for unseen samples x' can be made by averaging the predictions from all the individual regression trees on x': f ^ = 1 B ∑ b = 1 B f b ( x ′ ) {\displaystyle {\hat {f}}={\frac {1}{B}}\sum _{b=1}^{B}f_{b}(x')} or by taking the plurality vote in the case of classification trees. This bootstrapping procedure leads to better model performance because it decreases the variance of the model, without increasing the bias. This means that while the predictions of a single tree are highly sensitive to noise in its training set, the average of many trees is not, as long as the trees are not correlated. Simply training many trees on a single training set would give strongly correlated trees (or even the same tree many times, if the training algorithm is deterministic); bootstrap sampling is a way of de-correlating the trees by showing them different training sets. Additionally, an estimate of the uncertainty of the prediction can be made as the standard deviation of the predictions from all the individual regression trees on x′: σ = ∑ b = 1 B ( f b ( x ′ ) − f ^ ) 2 B − 1 . {\displaystyle \sigma ={\sqrt {\frac {\sum _{b=1}^{B}(f_{b}(x')-{\hat {f}})^{2}}{B-1}}}.} The number B of samples (equivalently, of trees) is a free parameter. Typically, a few hundred to several thousand trees are used, depending on the size and nature of the training set. B can be optimized using cross-validation, or by observing the out-of-bag error: the mean prediction error on each training sample xi, using only the trees that did not have xi in their bootstrap sample. The training and test error tend to level off after some number of trees have been fit. === From bagging to random forests === The above procedure describes the original bagging algorithm for trees. Random forests also include another type of bagging scheme: they use a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. This process is sometimes called "feature bagging". The reason for doing this is the correlation of the trees in an ordinary bootstrap sample: if one or a few features are very strong predictors for the response variable (target output), these features will be selected in many of the B trees, causing them to become correlated. An analysis of how bagging and random subspace projection contribute to accuracy gains under different conditions is given by Ho. Typically, for a classification problem with p {\displaystyle p} features, p {\displaystyle {\sqrt {p}}} (rounded down) features are used in each split. For regression problems the inventors recommend p / 3 {\displaystyle p/3} (rounded down) with a minimum node size of 5 as the default. In practice, the best values for these parameters should be tuned on a case-to-case basis for every problem. === ExtraTrees === Adding one further step of randomization yields extremely randomized trees, or ExtraTrees. As with ordinary random forests, they are an ensemble of individual trees, but there are two main differences: (1) each tree is trained using the whole learning sample (rather than a bootstrap sample), and (2) the top-down splitting is randomized: for each feature under consideration, a number of random cut-points are selected, instead of computing the locally optimal cut-point (based on, e.g., information gain or the Gini impurity). The values are chosen from a uniform distribution within the feature's empirical range (in the tree's training set). Then, of all the randomly chosen splits, the split that yields the highest score is chosen to split the node. Similar to ordinary random forests, the number of randomly selected features to be considered at each node can be specified. Default values for this parameter are p {\displaystyle {\sqrt {p}}} for classification and p {\displaystyle p} for regression, where p {\displaystyle p} is the number of features in the model. === Random forests for high-dimensional data === The basic random forest procedure may

    Read more →
  • Multiple discriminant analysis

    Multiple discriminant analysis

    Multiple Discriminant Analysis (MDA) is a multivariate dimensionality reduction technique. It has been used to predict signals as diverse as neural memory traces and corporate failure. MDA is not directly used to perform classification. It merely supports classification by yielding a compressed signal amenable to classification. The method described in Duda et al. (2001) §3.8.3 projects the multivariate signal down to an M−1 dimensional space where M is the number of categories. MDA is useful because most classifiers are strongly affected by the curse of dimensionality. In other words, when signals are represented in very-high-dimensional spaces, the classifier's performance is catastrophically impaired by the overfitting problem. This problem is reduced by compressing the signal down to a lower-dimensional space as MDA does. MDA has been used to reveal neural codes.

    Read more →
  • Meesho

    Meesho

    Meesho Limited (short for Meri shop, transl. My shop) is an Indian e-commerce company, headquartered in Bengaluru. Founded by Vidit Aatrey and Sanjeev Barnwal in December 2015, Meesho is an online marketplace in categories such as fashion, home and kitchen, beauty and personal care, electronics accessories, and daily use products. == History == Meesho Private Limited, formerly Fashnear Technologies Private Limited, was established by IIT Delhi graduates Vidit Aatrey and Sanjeev Barnwal in December, 2015 In 2016, the founders came up with the idea of re-establishing the platform as Meesho, one that would enable country-wide shipping for resellers with the use of social media sites as tools for marketing. In February 2019, the platform reported having around 209,000 users and about 1.2 million monthly orders, and in March 2020, it reported approximately 563,000 users and 3.1 million monthly orders. In 2021, the Meesho mobile application was ranked among the most downloaded shopping apps globally. In 2022, Meesho had about 120 million monthly users and about 910 million orders were made through the platform, with a gross merchandise value (GMV) of about $5 billion. According to report as of August 2023 Meesho delisted 42 lakh counterfeit listings and 10 lakh restricted products under its initiative Project Suraksha. During the same period, the platform blocked access for over 12,000 user accounts flagged for policy violations. The Court granted injunctive relief by directing domain registrars to suspend the infringing websites. Additionally, the Court ordered law enforcement authorities to initiate criminal investigations, freeze associated financial accounts against the identified offenders. In 2023, Meesho became the fastest shopping app to cross over 500 million downloads. In 2024, Meesho introduced Valmo, a logistics marketplace, to provide shipment services to sellers by aggregating multiple logistics providers. Meesho employs over 3,000 small businesses and 10-12 large firms for warehousing and sorting operations within its logistics framework. According to media reports, Valmo operating in approximately 15,000 pincodes in India with around 6,000 partners. It is reported to handle over 50% of Meesho's daily orders. In November 2024, Meesho introduced a generative AI-powered voice bot for customer support, managing approximately 60,000 calls daily in English and Hindi. According to media reports, the system resolves the majority of queries without human assistance, with only a small fraction of calls requiring manual intervention. According to media reports, in 2024, Meesho prevented over 22 million suspicious or potentially fraudulent transactions on its platform. The company initiated legal proceedings, resulting in the filing of twelve cases, including nine specifically targeting over forty individuals in the cities of Kolkata and Ranchi. The company filed a suit in the Delhi High Court for a permanent injunction against parties operating deceptive websites misappropriating its brand identity. Meesha went public through an initial public offering in December 2025, raising $603 million. It is listed on both the BSE and NSE. == Recognition == In 2023, Meesho was named one of the most influential companies of the year by Time (magazine).

    Read more →
  • Vladimir Batagelj

    Vladimir Batagelj

    Vladimir Batagelj (born June 14, 1948 in Idrija, Yugoslavia) is a Slovenian mathematician and an emeritus professor of mathematics at the University of Ljubljana. He is known for his work in discrete mathematics and combinatorial optimization, particularly analysis of social networks and other large networks (blockmodeling). == Education and career == Vladimir Batagelj completed his Ph.D. at the University of Ljubljana in 1986 under the direction of Tomaž Pisanski. He stayed at the University of Ljubljana as a professor until his retirement, where he was a professor of sociology and statistics, while also being a chair of the Department of Sociology of the Faculty of Social Sciences. As visiting professor, he was taught at the University of Pittsburgh (1990-91) and at the University of Konstanz (2002). He was also a member of editorial boards of two journals: Informatica and Journal of Social Structure. His work has been cited over 11000 times. His book Exploratory Social Network Analysis with Pajek on blockmodeling, coauthored with Wouter de Nooy and Andrej Mrvar, is Batagelj's most cited work and has over 3300 citations. The book was translated into Chinese and Japanese. The revised and expanded third edition has been published by Cambridge University Press. In 1975, 11 years before completing his PhD, Batagelj published a solo paper in Communications of the ACM. Batagelj authored more than 20 textbooks in Slovenian, covering topics like TeX, combinatorics and discrete mathematics. He has also written extensively in the Slovenian popular science journal Presek. Batagelj has advised 9 Ph.D. students. == Pajek == Batagelj is particularly known for his work on Pajek, a freely available software for analysis and visualization of large networks. He began work on Pajek in 1996 with Andrej Mrvar, who was then his PhD student. == Awards and honors == First prizes for contributions (with Andrej Mrvar) to Graph Drawing Contests in years: 1995, 1996, 1997, 1998, 1999, 2000 and 2005 / Graph Drawing Hall of Fame. In 2007 the book Generalized blockmodeling was awarded the Harrison White Outstanding Book Award by the Mathematical Sociology Section of American Sociological Association In 2007 he was awarded (together with Anuška Ferligoj) the Simmel Award by INSNA. In 2013, Vladimir Batagelj and Andrej Mrvar received the INSNA's William D. Richards Software award for their work on Pajek. == Selected bibliography == Vladimir Batagelj, Social Network Analysis, Large-Scale [1]. in R.A. Meyers, ed., Encyclopedia of Complexity and Systems Science, Springer 2009: 8245–8265. Vladimir Batagelj, Complex Networks, Visualization of [2]. in R.A. Meyers, ed., Encyclopedia of Complexity and Systems Science, Springer 2009: 1253–1268. Wouter de Nooy, Andrej Mrvar, Vladimir Batagelj, Mark Granovetter (Series Editor), Exploratory Social Network Analysis with Pajek (Structural Analysis in the Social Sciences), Cambridge University Press 2005 (ISBN 0-521-60262-9). ESNA in Japanese, TDU, 2010. Patrick Doreian, Vladimir Batagelj, Anuška Ferligoj, Mark Granovetter (Series Editor), Generalized Blockmodeling (Structural Analysis in the Social Sciences), Cambridge University Press 2004 (ISBN 0-521-84085-6)

    Read more →
  • Chromosome (evolutionary algorithm)

    Chromosome (evolutionary algorithm)

    A chromosome or genotype in evolutionary algorithms (EA) is a set of parameters which define a proposed solution of the problem that the evolutionary algorithm is trying to solve. The set of all solutions, also called individuals according to the biological model, is known as the population. The genome of an individual consists of one, more rarely of several, chromosomes and corresponds to the genetic representation of the task to be solved. A chromosome is composed of a set of genes, where a gene consists of one or more semantically connected parameters, which are often also called decision variables. They determine one or more phenotypic characteristics of the individual or at least have an influence on them. In the basic form of genetic algorithms, the chromosome is represented as a binary string, while in later variants and in EAs in general, a wide variety of other data structures are used. == Chromosome design == When creating the genetic representation of a task, it is determined which decision variables and other degrees of freedom of the task should be improved by the EA and possible additional heuristics and how the genotype-phenotype mapping should look like. The design of a chromosome translates these considerations into concrete data structures for which an EA then has to be selected, configured, extended, or, in the worst case, created. Finding a suitable representation of the problem domain for a chromosome is an important consideration, as a good representation will make the search easier by limiting the search space; similarly, a poorer representation will allow a larger search space. In this context, suitable mutation and crossover operators must also be found or newly defined to fit the chosen chromosome design. An important requirement for these operators is that they not only allow all points in the search space to be reached in principle, but also make this as easy as possible. The following requirements must be met by a well-suited chromosome: It must allow the accessibility of all admissible points in the search space. Design of the chromosome in such a way that it covers only the search space and no additional areas. so that there is no redundancy or only as little redundancy as possible. Observance of strong causality: small changes in the chromosome should only lead to small changes in the phenotype. This is also called locality of the relationship between search and problem space. Designing the chromosome in such a way that it excludes prohibited regions in the search space completely or as much as possible. While the first requirement is indispensable, depending on the application and the EA used, one usually only has to be satisfied with fulfilling the remaining requirements as far as possible. The evolutionary search is supported and possibly considerably accelerated by a fulfillment as complete as possible. == Examples of chromosomes == === Chromosomes for binary codings === In their classical form, GAs use bit strings and map the decision variables to be optimized onto them. An example for one Boolean and three integer decision variables with the value ranges 0 ≤ D 1 ≤ 60 {\displaystyle 0\leq D_{1}\leq 60} , 28 ≤ D 2 ≤ 30 {\displaystyle 28\leq D_{2}\leq 30} and − 12 ≤ D 3 ≤ 14 {\displaystyle -12\leq D_{3}\leq 14} may illustrate this: Note that the negative number here is given in two's complement. This straight forward representation uses five bits to represent the three values of D 2 {\displaystyle D_{2}} , although two bits would suffice. This is a significant redundancy. An improved alternative, where 28 is to be added for the genotype-phenotype mapping, could look like this: with D 2 = 28 + D 2 ′ = 29 {\displaystyle D_{2}=28+D'_{2}=29} . === Chromosomes with real-valued or integer genes === For the processing of tasks with real-valued or mixed-integer decision variables, EAs such as the evolution strategy or the real-coded GAs are suited. In the case of mixed-integer values, rounding is often used, but this represents some violation of the redundancy requirement. If the necessary precisions of the real values can be reasonably narrowed down, this violation can be remedied by using integer-coded GAs. For this purpose, the valid digits of real values are mapped to integers by multiplication with a suitable factor. For example, 12.380 becomes the integer 12380 by multiplying by 1000. This must of course be taken into account in genotype-phenotype mapping for evaluation and result presentation. A common form is a chromosome consisting of a list or an array of integer or real values. === Chromosomes for permutations === Combinatorial problems are mainly concerned with finding an optimal sequence of a set of elementary items. As an example, consider the problem of the traveling salesman who wants to visit a given number of cities exactly once on the shortest possible tour. The simplest and most obvious mapping onto a chromosome is to number the cities consecutively, to interpret a resulting sequence as permutation and to store it directly in a chromosome, where one gene corresponds to the ordinal number of a city. Then, however, the variation operators may only change the gene order and not remove or duplicate any genes. The chromosome thus contains the path of a possible tour to the cities. As an example the sequence 3 , 5 , 7 , 1 , 4 , 2 , 9 , 6 , 8 {\displaystyle 3,5,7,1,4,2,9,6,8} of nine cities may serve, to which the following chromosome corresponds: In addition to this encoding frequently called path representation, there are several other ways of representing a permutation, for example the ordinal representation or the matrix representation. === Chromosomes for co-evolution === When a genetic representation contains, in addition to the decision variables, additional information that influences evolution and/or the mapping of the genotype to the phenotype and is itself subject to evolution, this is referred to as co-evolution. A typical example is the evolution strategy (ES), which includes one or more mutation step sizes as strategy parameters in each chromosome. Another example is an additional gene to control a selection heuristic for resource allocation in a scheduling tasks. This approach is based on the assumption that good solutions are based on an appropriate selection of strategy parameters or on control gene(s) that influences genotype-phenotype mapping. The success of the ES gives evidence to this assumption. === Chromosomes for complex representations === The chromosomes presented above are well suited for processing tasks of continuous, mixed-integer, pure-integer or combinatorial optimization. For a combination of these optimization areas, on the other hand, it becomes increasingly difficult to map them to simple strings of values, depending on the task. The following extension of the gene concept is proposed by the EA GLEAM (General Learning Evolutionary Algorithm and Method) for this purpose: A gene is considered to be the description of an element or elementary trait of the phenotype, which may have multiple parameters. For this purpose, gene types are defined that contain as many parameters of the appropriate data type as are required to describe the particular element of the phenotype. A chromosome now consists of genes as data objects of the gene types, whereby, depending on the application, each gene type occurs exactly once as a gene or can be contained in the chromosome any number of times. The latter leads to chromosomes of dynamic length, as they are required for some problems. The gene type definitions also contain information on the permissible value ranges of the gene parameters, which are observed during chromosome generation and by corresponding mutations, so they cannot lead to lethal mutations. For tasks with a combinatorial part, there are suitable genetic operators that can move or reposition genes as a whole, i.e. with their parameters. A scheduling task is used as an illustration, in which workflows are to be scheduled that require different numbers of heterogeneous resources. A workflow specifies which work steps can be processed in parallel and which have to be executed one after the other. In this context, heterogeneous resources mean different processing times at different costs in addition to different processing capabilities. Each scheduling operation therefore requires one or more parameters that determine the resource selection, where the value ranges of the parameters depend on the number of alternative resources available for each work step. A suitable chromosome provides one gene type per work step and in this case one corresponding gene, which has one parameter for each required resource. The order of genes determines the order of scheduling operations and, therefore, the precedence in case of allocation conflicts. The exemplary gene type definition of work step 15 with two resources, for which there are four and seven alternatives respectively

    Read more →