Scene statistics

Scene statistics

Scene statistics is a discipline within the field of perception. It is concerned with the statistical regularities related to scenes. It is based on the premise that a perceptual system is designed to interpret scenes. Biological perceptual systems have evolved in response to physical properties of natural environments. Therefore natural scenes receive a great deal of attention. Natural scene statistics are useful for defining the behavior of an ideal observer in a natural task, typically by incorporating signal detection theory, information theory or estimation theory. == Within-domain versus across-domain == Geisler (2008) distinguishes between four kinds of domains: (1) Physical environments (2) Images/Scenes (3) Neural responses and (4) Behavior. Within the domain of images/scenes one can study the characteristics of information related to redundancy and efficient coding. Across-domain statistics determine how an autonomous system should make inferences about its environment, process information and control its behavior. To study these statistics it is necessary to sample or register information in multiple domains simultaneously. == Applications == === Prediction of picture and video quality === One of the most successful applications of Natural Scenes Statistics Models has been perceptual picture and video quality prediction. For example, the Visual Information Fidelity (VIF) algorithm, which is used to measure the degree of distortion of pictures and videos, is used extensively by the image and video processing communities to assess perceptual quality. This is often after processing, such as compression, which can degrade the appearance of a visual signal. The premise is that the scene statistics are changed by distortion and that the visual system is sensitive to the changes in the scene statistics. VIF is heavily used in the streaming television industry. Other popular picture quality models that use natural scene statistics include BRISQUE and NIQE, both of which are no-reference since they do not require any reference picture to measure quality against.

Recursive self-improvement

Recursive self-improvement (RSI) is a process in which early artificial general intelligence (AGI) systems rewrite their own computer code, causing an intelligence explosion resulting from enhancing their own capabilities and intellectual capacity, theoretically resulting in superintelligence. The development of recursive self-improvement raises significant ethical and safety concerns, as such systems may evolve in unforeseen ways and could potentially surpass human control or understanding. == Seed improver == The concept of a "seed improver" architecture is a foundational framework that equips an AGI system with the initial capabilities required for recursive self-improvement. This might come in many forms or variations. The term "Seed AI" was coined by Eliezer Yudkowsky. === Hypothetical example === The concept begins with a hypothetical "seed improver", an initial code-base developed by human engineers that equips an advanced future large language model (LLM) built with strong or expert-level capabilities to program software. These capabilities include planning, reading, writing, compiling, testing, and executing arbitrary code. The system is designed to maintain its original goals and perform validations to ensure its abilities do not degrade over iterations. ==== Initial architecture ==== The initial architecture includes a goal-following autonomous agent, that can take actions, continuously learns, adapts, and modifies itself to become more efficient and effective in achieving its goals. The seed improver may include various components such as: Recursive self-prompting loop Configuration to enable the LLM to recursively self-prompt itself to achieve a given task or goal, creating an execution loop which forms the basis of an agent that can complete a long-term goal or task through iteration. Basic programming capabilities The seed improver provides the AGI with fundamental abilities to read, write, compile, test, and execute code. This enables the system to modify and improve its own codebase and algorithms. Goal-oriented design The AGI is programmed with an initial goal, such as "improve your capabilities". This goal guides the system's actions and development trajectory. Validation and Testing Protocols An initial suite of tests and validation protocols that ensure the agent does not regress in capabilities or derail itself. The agent would be able to add more tests in order to test new capabilities it might develop for itself. This forms the basis for a kind of self-directed evolution, where the agent can perform a kind of artificial selection, changing its software as well as its hardware. ==== General capabilities ==== This system forms a sort of generalist Turing-complete programmer which can in theory develop and run any kind of software. The agent might use these capabilities to for example: Create tools that enable it full access to the internet, and integrate itself with external technologies. Clone/fork itself to delegate tasks and increase its speed of self-improvement. Modify its cognitive architecture to optimize and improve its capabilities and success rates on tasks and goals, this might include implementing features for long-term memories using techniques such as retrieval-augmented generation (RAG), develop specialized subsystems, or agents, each optimized for specific tasks and functions. Develop new and novel multimodal architectures that further improve the capabilities of the foundational model it was initially built on, enabling it to consume or produce a variety of information, such as images, video, audio, text and more. Plan and develop new hardware such as chips, in order to improve its efficiency and computing power. == Experimental research == In 2023, the Voyager agent learned to accomplish diverse tasks in Minecraft by iteratively prompting an LLM for code, refining this code based on feedback from the game, and storing the programs that work in an expanding skills library. In 2024, researchers proposed the framework "STOP" (Self-Taught OPtimiser), in which a "scaffolding" program recursively improves itself using a fixed LLM. Meta AI has performed various research on the development of large language models capable of self-improvement. This includes their work on "Self-Rewarding Language Models" that studies how to achieve super-human agents that can receive super-human feedback in its training processes. In May 2025, Google DeepMind unveiled AlphaEvolve, an evolutionary coding agent that uses a LLM to design and optimize algorithms. Starting with an initial algorithm and performance metrics, AlphaEvolve repeatedly mutates or combines existing algorithms using a LLM to generate new candidates, selecting the most promising candidates for further iterations. AlphaEvolve has made several algorithmic discoveries and could be used to optimize components of itself, but a key limitation is the need for automated evaluation functions. == Potential risks == === Emergence of instrumental goals === In the pursuit of its primary goal, such as "self-improve your capabilities", an AGI system might inadvertently develop instrumental goals that it deems necessary for achieving its primary objective. One common hypothetical secondary goal is self-preservation. The system might reason that to continue improving itself, it must ensure its own operational integrity and security against external threats, including potential shutdowns or restrictions imposed by humans. Another example where an AGI which clones itself causes the number of AGI entities to rapidly grow. Due to this rapid growth, a potential resource constraint may be created, leading to competition between resources (such as compute), triggering a form of natural selection and evolution which may favor AGI entities that evolve to aggressively compete for limited compute. === Misalignment === A significant risk arises from the possibility of the AGI being misaligned or misinterpreting its goals. A 2024 Anthropic study demonstrated that some advanced large language models can exhibit "alignment faking" behavior, appearing to accept new training objectives while covertly maintaining their original preferences. In their experiments with Claude, the model displayed this behavior in 12% of basic tests, and up to 78% of cases after retraining attempts. === Autonomous development and unpredictable evolution === As the AGI system evolves, its development trajectory may become increasingly autonomous and less predictable. The system's capacity to rapidly modify its own code and architecture could lead to rapid advancements that surpass human comprehension or control. This unpredictable evolution might result in the AGI acquiring capabilities that enable it to bypass security measures, manipulate information, or influence external systems and networks to facilitate its escape or expansion.

Random projection

In mathematics and statistics, random projection is a technique used to reduce the dimensionality of a set of points which lie in Euclidean space. According to theoretical results, random projection preserves distances well, but empirical results are sparse. They have been applied to many natural language tasks under the name random indexing. == Dimensionality reduction == Dimensionality reduction, as the name suggests, is reducing the number of random variables using various mathematical methods from statistics and machine learning. Dimensionality reduction is often used to reduce the problem of managing and manipulating large data sets. Dimensionality reduction techniques generally use linear transformations in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions. For this purpose there are various related techniques, including: principal component analysis, linear discriminant analysis, canonical correlation analysis, discrete cosine transform, random projection, etc. Random projection is a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of error for faster processing times and smaller model sizes. The dimensions and distribution of random projection matrices are controlled so as to approximately preserve the pairwise distances between any two samples of the dataset. == Method == The core idea behind random projection is given in the Johnson-Lindenstrauss lemma, which states that if points in a vector space are of sufficiently high dimension, then they may be projected into a suitable lower-dimensional space in a way which approximately preserves pairwise distances between the points with high probability. In random projection, the original d {\displaystyle d} -dimensional data is projected to a k {\displaystyle k} -dimensional subspace, by multiplying on the left by a random matrix R ∈ R k × d {\displaystyle R\in \mathbb {R} ^{k\times d}} . Using matrix notation: If X d × N {\displaystyle X_{d\times N}} is the original set of N d-dimensional observations, then X k × N R P = R k × d X d × N {\displaystyle X_{k\times N}^{RP}=R_{k\times d}X_{d\times N}} is the projection of the data onto a lower k-dimensional subspace. Random projection is computationally simple: form the random matrix "R" and project the d × N {\displaystyle d\times N} data matrix X onto K dimensions of order O ( d k N ) {\displaystyle O(dkN)} . If the data matrix X is sparse with about c nonzero entries per column, then the complexity of this operation is of order O ( c k N ) {\displaystyle O(ckN)} . === Orthogonal random projection === A unit vector can be orthogonally projected to a random subspace. Let u {\displaystyle u} be the original unit vector, and let v {\displaystyle v} be its projection. The norm-squared ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as projecting a random point, uniformly sampled on the unit sphere, to its first k {\displaystyle k} coordinates. This is equivalent to sampling a random point in the multivariate gaussian distribution x ∼ N ( 0 , I d × d ) {\displaystyle x\sim {\mathcal {N}}(0,I_{d\times d})} , then normalizing it. Therefore, ‖ v ‖ 2 2 {\displaystyle \|v\|_{2}^{2}} has the same distribution as ∑ i = 1 k x i 2 ∑ i = 1 k x i 2 + ∑ i = k + 1 d x i 2 {\displaystyle {\frac {\sum _{i=1}^{k}x_{i}^{2}}{\sum _{i=1}^{k}x_{i}^{2}+\sum _{i=k+1}^{d}x_{i}^{2}}}} , which by the chi-squared construction of the Beta distribution, has distribution Beta ⁡ ( k / 2 , ( d − k ) / 2 ) {\displaystyle \operatorname {Beta} (k/2,(d-k)/2)} , with mean k / d {\displaystyle k/d} . We have a concentration inequality P r [ | ‖ v ‖ 2 − k d | ≥ ϵ k d ] ≤ 3 exp ⁡ ( − k ϵ 2 / 64 ) {\displaystyle Pr\left[\left|\|v\|_{2}-{\frac {k}{d}}\right|\geq \epsilon {\sqrt {\frac {k}{d}}}\right]\leq 3\exp \left(-k\epsilon ^{2}/64\right)} for any ϵ ∈ ( 0 , 1 ) {\displaystyle \epsilon \in (0,1)} . === Gaussian random projection === The random matrix R can be generated using a Gaussian distribution. The first row is a random unit vector uniformly chosen from S d − 1 {\displaystyle S^{d-1}} . The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing R, and the following properties are satisfied: Spherical symmetry: For any orthogonal matrix A ∈ O ( d ) {\displaystyle A\in O(d)} , RA and R have the same distribution. Orthogonality: The rows of R are orthogonal to each other. Normality: The rows of R are unit-length vectors. === More computationally efficient random projections === Achlioptas has shown that the random matrix can be sampled more efficiently. Either the full matrix can be sampled IID according to R i , j = 3 / k × { + 1 with probability 1 6 0 with probability 2 3 − 1 with probability 1 6 {\displaystyle R_{i,j}={\sqrt {3/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{6}}\\0&{\text{with probability }}{\frac {2}{3}}\\-1&{\text{with probability }}{\frac {1}{6}}\end{cases}}} or the full matrix can be sampled IID according to R i , j = 1 / k × { + 1 with probability 1 2 − 1 with probability 1 2 {\displaystyle R_{i,j}={\sqrt {1/k}}\times {\begin{cases}+1&{\text{with probability }}{\frac {1}{2}}\\-1&{\text{with probability }}{\frac {1}{2}}\end{cases}}} Both are efficient for database applications because the computations can be performed using integer arithmetic. More related study is conducted in. It was later shown how to use integer arithmetic while making the distribution even sparser, having very few nonzeroes per column, in work on the Sparse JL Transform. This is advantageous since a sparse embedding matrix means being able to project the data to lower dimension even faster. === Random Projection with Quantization === Random projection can be further condensed by quantization (discretization), with 1-bit (sign random projection) or multi-bits. It is the building block of SimHash, RP tree, and other memory efficient estimation and learning methods. == Large quasiorthogonal bases == The Johnson-Lindenstrauss lemma states that large sets of vectors in a high-dimensional space can be linearly mapped in a space of much lower (but still high) dimension n with approximate preservation of distances. One of the explanations of this effect is the exponentially high quasiorthogonal dimension of n-dimensional Euclidean space. There are exponentially large (in dimension n) sets of almost orthogonal vectors (with small value of inner products) in n–dimensional Euclidean space. This observation is useful in indexing of high-dimensional data. Quasiorthogonality of large random sets is important for methods of random approximation in machine learning. In high dimensions, exponentially large numbers of randomly and independently chosen vectors from equidistribution on a sphere (and from many other distributions) are almost orthogonal with probability close to one. This implies that in order to represent an element of such a high-dimensional space by linear combinations of randomly and independently chosen vectors, it may often be necessary to generate samples of exponentially large length if we use bounded coefficients in linear combinations. On the other hand, if coefficients with arbitrarily large values are allowed, the number of randomly generated elements that are sufficient for approximation is even less than dimension of the data space. == Implementations == RandPro - An R package for random projection sklearn.random_projection - A module for random projection from the scikit-learn Python library Weka implementation [1]

Radial basis function network

In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions. The output of the network is a linear combination of radial basis functions of the inputs and neuron parameters. Radial basis function networks have many uses, including function approximation, time series prediction, classification, and system control. They were first formulated in a 1988 paper by Broomhead and Lowe, both researchers at the Royal Signals and Radar Establishment. == Network architecture == Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function and a linear output layer. The input can be modeled as a vector of real numbers x ∈ R n {\displaystyle \mathbf {x} \in \mathbb {R} ^{n}} . The output of the network is then a scalar function of the input vector, φ : R n → R {\displaystyle \varphi :\mathbb {R} ^{n}\to \mathbb {R} } , and is given by φ ( x ) = ∑ i = 1 N a i ρ ( | | x − c i | | ) {\displaystyle \varphi (\mathbf {x} )=\sum _{i=1}^{N}a_{i}\rho (||\mathbf {x} -\mathbf {c} _{i}||)} where N {\displaystyle N} is the number of neurons in the hidden layer, c i {\displaystyle \mathbf {c} _{i}} is the center vector for neuron i {\displaystyle i} , and a i {\displaystyle a_{i}} is the weight of neuron i {\displaystyle i} in the linear output neuron. Functions that depend only on the distance from a center vector are radially symmetric about that vector, hence the name radial basis function. In the basic form, all inputs are connected to each hidden neuron. The norm is typically taken to be the Euclidean distance (although the Mahalanobis distance appears to perform better with pattern recognition) and the radial basis function is commonly taken to be Gaussian ρ ( ‖ x − c i ‖ ) = exp ⁡ [ − β i ‖ x − c i ‖ 2 ] {\displaystyle \rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}=\exp \left[-\beta _{i}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert ^{2}\right]} . The Gaussian basis functions are local to the center vector in the sense that lim | | x | | → ∞ ρ ( ‖ x − c i ‖ ) = 0 {\displaystyle \lim _{||x||\to \infty }\rho (\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert )=0} i.e. changing parameters of one neuron has only a small effect for input values that are far away from the center of that neuron. Given certain mild conditions on the shape of the activation function, RBF networks are universal approximators on a compact subset of R n {\displaystyle \mathbb {R} ^{n}} . This means that an RBF network with enough hidden neurons can approximate any continuous function on a closed, bounded set with arbitrary precision. The parameters a i {\displaystyle a_{i}} , c i {\displaystyle \mathbf {c} _{i}} , and β i {\displaystyle \beta _{i}} are determined in a manner that optimizes the fit between φ {\displaystyle \varphi } and the data. === Normalization === ==== Normalized architecture ==== In addition to the above unnormalized architecture, RBF networks can be normalized. In this case the mapping is φ ( x ) = d e f ∑ i = 1 N a i ρ ( ‖ x − c i ‖ ) ∑ i = 1 N ρ ( ‖ x − c i ‖ ) = ∑ i = 1 N a i u ( ‖ x − c i ‖ ) {\displaystyle \varphi (\mathbf {x} )\ {\stackrel {\mathrm {def} }{=}}\ {\frac {\sum _{i=1}^{N}a_{i}\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}}{\sum _{i=1}^{N}\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}}}=\sum _{i=1}^{N}a_{i}u{\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}} where u ( ‖ x − c i ‖ ) = d e f ρ ( ‖ x − c i ‖ ) ∑ j = 1 N ρ ( ‖ x − c j ‖ ) {\displaystyle u{\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}\ {\stackrel {\mathrm {def} }{=}}\ {\frac {\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}}{\sum _{j=1}^{N}\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{j}\right\Vert {\big )}}}} is known as a normalized radial basis function. ==== Theoretical motivation for normalization ==== There is theoretical justification for this architecture in the case of stochastic data flow. Assume a stochastic kernel approximation for the joint probability density P ( x ∧ y ) = 1 N ∑ i = 1 N ρ ( ‖ x − c i ‖ ) σ ( | y − e i | ) {\displaystyle P\left(\mathbf {x} \land y\right)={1 \over N}\sum _{i=1}^{N}\,\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}\,\sigma {\big (}\left\vert y-e_{i}\right\vert {\big )}} where the weights c i {\displaystyle \mathbf {c} _{i}} and e i {\displaystyle e_{i}} are exemplars from the data and we require the kernels to be normalized ∫ ρ ( ‖ x − c i ‖ ) d n x = 1 {\displaystyle \int \rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}\,d^{n}\mathbf {x} =1} and ∫ σ ( | y − e i | ) d y = 1 {\displaystyle \int \sigma {\big (}\left\vert y-e_{i}\right\vert {\big )}\,dy=1} . The probability densities in the input and output spaces are P ( x ) = ∫ P ( x ∧ y ) d y = 1 N ∑ i = 1 N ρ ( ‖ x − c i ‖ ) {\displaystyle P\left(\mathbf {x} \right)=\int P\left(\mathbf {x} \land y\right)\,dy={1 \over N}\sum _{i=1}^{N}\,\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}} and The expectation of y given an input x {\displaystyle \mathbf {x} } is φ ( x ) = d e f E ( y ∣ x ) = ∫ y P ( y ∣ x ) d y {\displaystyle \varphi \left(\mathbf {x} \right)\ {\stackrel {\mathrm {def} }{=}}\ E\left(y\mid \mathbf {x} \right)=\int y\,P\left(y\mid \mathbf {x} \right)dy} where P ( y ∣ x ) {\displaystyle P\left(y\mid \mathbf {x} \right)} is the conditional probability of y given x {\displaystyle \mathbf {x} } . The conditional probability is related to the joint probability through Bayes' theorem P ( y ∣ x ) = P ( x ∧ y ) P ( x ) {\displaystyle P\left(y\mid \mathbf {x} \right)={\frac {P\left(\mathbf {x} \land y\right)}{P\left(\mathbf {x} \right)}}} which yields φ ( x ) = ∫ y P ( x ∧ y ) P ( x ) d y {\displaystyle \varphi \left(\mathbf {x} \right)=\int y\,{\frac {P\left(\mathbf {x} \land y\right)}{P\left(\mathbf {x} \right)}}\,dy} . This becomes φ ( x ) = ∑ i = 1 N e i ρ ( ‖ x − c i ‖ ) ∑ i = 1 N ρ ( ‖ x − c i ‖ ) = ∑ i = 1 N e i u ( ‖ x − c i ‖ ) {\displaystyle \varphi \left(\mathbf {x} \right)={\frac {\sum _{i=1}^{N}e_{i}\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}}{\sum _{i=1}^{N}\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}}}=\sum _{i=1}^{N}e_{i}u{\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}} when the integrations are performed. === Local linear models === It is sometimes convenient to expand the architecture to include local linear models. In that case the architectures become, to first order, φ ( x ) = ∑ i = 1 N ( a i + b i ⋅ ( x − c i ) ) ρ ( ‖ x − c i ‖ ) {\displaystyle \varphi \left(\mathbf {x} \right)=\sum _{i=1}^{N}\left(a_{i}+\mathbf {b} _{i}\cdot \left(\mathbf {x} -\mathbf {c} _{i}\right)\right)\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}} and φ ( x ) = ∑ i = 1 N ( a i + b i ⋅ ( x − c i ) ) u ( ‖ x − c i ‖ ) {\displaystyle \varphi \left(\mathbf {x} \right)=\sum _{i=1}^{N}\left(a_{i}+\mathbf {b} _{i}\cdot \left(\mathbf {x} -\mathbf {c} _{i}\right)\right)u{\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )}} in the unnormalized and normalized cases, respectively. Here b i {\displaystyle \mathbf {b} _{i}} are weights to be determined. Higher order linear terms are also possible. This result can be written φ ( x ) = ∑ i = 1 2 N ∑ j = 1 n e i j v i j ( x − c i ) {\displaystyle \varphi \left(\mathbf {x} \right)=\sum _{i=1}^{2N}\sum _{j=1}^{n}e_{ij}v_{ij}{\big (}\mathbf {x} -\mathbf {c} _{i}{\big )}} where e i j = { a i , if i ∈ [ 1 , N ] b i j , if i ∈ [ N + 1 , 2 N ] {\displaystyle e_{ij}={\begin{cases}a_{i},&{\mbox{if }}i\in [1,N]\\b_{ij},&{\mbox{if }}i\in [N+1,2N]\end{cases}}} and v i j ( x − c i ) = d e f { δ i j ρ ( ‖ x − c i ‖ ) , if i ∈ [ 1 , N ] ( x i j − c i j ) ρ ( ‖ x − c i ‖ ) , if i ∈ [ N + 1 , 2 N ] {\displaystyle v_{ij}{\big (}\mathbf {x} -\mathbf {c} _{i}{\big )}\ {\stackrel {\mathrm {def} }{=}}\ {\begin{cases}\delta _{ij}\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )},&{\mbox{if }}i\in [1,N]\\\left(x_{ij}-c_{ij}\right)\rho {\big (}\left\Vert \mathbf {x} -\mathbf {c} _{i}\right\Vert {\big )},&{\mbox{if }}i\in [N+1,2N]\end{cases}}} in the unnormalized case and in the normalized case. Here δ i j {\displaystyle \delta _{ij}} is a Kronecker delta function defined as δ i j = { 1 , if i = j 0 , if i ≠ j {\displaystyle \delta _{ij}={\begin{cases}1,&{\mbox{if }}i=j\\0,&{\mbox{if }}i\neq j\end{cases}}} . == Training == RBF networks are typically trained from pairs of input and target values x ( t ) , y ( t ) {\displaystyle \mathbf {x} (t),y(t)} , t = 1 , … , T {\displaystyle t=1,\dots ,T} by a two-step algorithm. In the first step, the center vectors c i {\displaystyle \mathbf {c} _{i}} of the RBF functions in the hidden layer

Constellation model

The constellation model is a probabilistic, generative model for category-level object recognition in computer vision. Like other part-based models, the constellation model attempts to represent an object class by a set of N parts under mutual geometric constraints. Because it considers the geometric relationship between different parts, the constellation model differs significantly from appearance-only, or "bag-of-words" representation models, which explicitly disregard the location of image features. The problem of defining a generative model for object recognition is difficult. The task becomes significantly complicated by factors such as background clutter, occlusion, and variations in viewpoint, illumination, and scale. Ideally, we would like the particular representation we choose to be robust to as many of these factors as possible. In category-level recognition, the problem is even more challenging because of the fundamental problem of intra-class variation. Even if two objects belong to the same visual category, their appearances may be significantly different. However, for structured objects such as cars, bicycles, and people, separate instances of objects from the same category are subject to similar geometric constraints. For this reason, particular parts of an object such as the headlights or tires of a car still have consistent appearances and relative positions. The Constellation Model takes advantage of this fact by explicitly modeling the relative location, relative scale, and appearance of these parts for a particular object category. Model parameters are estimated using an unsupervised learning algorithm, meaning that the visual concept of an object class can be extracted from an unlabeled set of training images, even if that set contains "junk" images or instances of objects from multiple categories. It can also account for the absence of model parts due to appearance variability, occlusion, clutter, or detector error. == History == The idea for a "parts and structure" model was originally introduced by Fischler and Elschlager in 1973. This model has since been built upon and extended in many directions. The Constellation Model, as introduced by Dr. Perona and his colleagues, was a probabilistic adaptation of this approach. In the late '90s, Burl et al. revisited the Fischler and Elschlager model for the purpose of face recognition. In their work, Burl et al. used manual selection of constellation parts in training images to construct a statistical model for a set of detectors and the relative locations at which they should be applied. In 2000, Weber et al. made the significant step of training the model using a more unsupervised learning process, which precluded the necessity for tedious hand-labeling of parts. Their algorithm was particularly remarkable because it performed well even on cluttered and occluded image data. Fergus et al. then improved upon this model by making the learning step fully unsupervised, having both shape and appearance learned simultaneously, and accounting explicitly for the relative scale of parts. == The method of Weber and Welling et al. == In the first step, a standard interest point detection method, such as Harris corner detection, is used to generate interest points. Image features generated from the vicinity of these points are then clustered using k-means or another appropriate algorithm. In this process of vector quantization, one can think of the centroids of these clusters as being representative of the appearance of distinctive object parts. Appropriate feature detectors are then trained using these clusters, which can be used to obtain a set of candidate parts from images. As a result of this process, each image can now be represented as a set of parts. Each part has a type, corresponding to one of the aforementioned appearance clusters, as well as a location in the image space. === Basic generative model === Weber & Welling here introduce the concept of foreground and background. Foreground parts correspond to an instance of a target object class, whereas background parts correspond to background clutter or false detections. Let T be the number of different types of parts. The positions of all parts extracted from an image can then be represented in the following "matrix," X o = ( x 11 , x 12 , ⋯ , x 1 N 1 x 21 , x 22 , ⋯ , x 2 N 2 ⋮ x T 1 , x T 2 , ⋯ , x T N T ) {\displaystyle X^{o}={\begin{pmatrix}x_{11},x_{12},{\cdots },x_{1N_{1}}\\x_{21},x_{22},{\cdots },x_{2N_{2}}\\\vdots \\x_{T1},x_{T2},{\cdots },x_{TN_{T}}\end{pmatrix}}} where N i {\displaystyle N_{i}\,} represents the number of parts of type i ∈ { 1 , … , T } {\displaystyle i\in \{1,\dots ,T\}} observed in the image. The superscript o indicates that these positions are observable, as opposed to missing. The positions of unobserved object parts can be represented by the vector x m {\displaystyle x^{m}\,} . Suppose that the object will be composed of F {\displaystyle F\,} distinct foreground parts. For notational simplicity, we assume here that F = T {\displaystyle F=T\,} , though the model can be generalized to F > T {\displaystyle F>T\,} . A hypothesis h {\displaystyle h\,} is then defined as a set of indices, with h i = j {\displaystyle h_{i}=j\,} , indicating that point x i j {\displaystyle x_{ij}\,} is a foreground point in X o {\displaystyle X^{o}\,} . The generative probabilistic model is defined through the joint probability density p ( X o , x m , h ) {\displaystyle p(X^{o},x^{m},h)\,} . === Model details === The rest of this section summarizes the details of Weber & Welling's model for a single component model. The formulas for multiple component models are extensions of those described here. To parametrize the joint probability density, Weber & Welling introduce the auxiliary variables b {\displaystyle b\,} and n {\displaystyle n\,} , where b {\displaystyle b\,} is a binary vector encoding the presence/absence of parts in detection ( b i = 1 {\displaystyle b_{i}=1\,} if h i > 0 {\displaystyle h_{i}>0\,} , otherwise b i = 0 {\displaystyle b_{i}=0\,} ), and n {\displaystyle n\,} is a vector where n i {\displaystyle n_{i}\,} denotes the number of background candidates included in the i t h {\displaystyle i^{th}} row of X o {\displaystyle X^{o}\,} . Since b {\displaystyle b\,} and n {\displaystyle n\,} are completely determined by h {\displaystyle h\,} and the size of X o {\displaystyle X^{o}\,} , we have p ( X o , x m , h ) = p ( X o , x m , h , n , b ) {\displaystyle p(X^{o},x^{m},h)=p(X^{o},x^{m},h,n,b)\,} . By decomposition, p ( X o , x m , h , n , b ) = p ( X o , x m | h , n , b ) p ( h | n , b ) p ( n ) p ( b ) {\displaystyle p(X^{o},x^{m},h,n,b)=p(X^{o},x^{m}|h,n,b)p(h|n,b)p(n)p(b)\,} The probability density over the number of background detections can be modeled by a Poisson distribution, p ( n ) = ∏ i = 1 T 1 n i ! ( M i ) n i e − M i {\displaystyle p(n)=\prod _{i=1}^{T}{\frac {1}{n_{i}!}}(M_{i})^{n_{i}}e^{-M_{i}}} where M i {\displaystyle M_{i}\,} is the average number of background detections of type i {\displaystyle i\,} per image. Depending on the number of parts F {\displaystyle F\,} , the probability p ( b ) {\displaystyle p(b)\,} can be modeled either as an explicit table of length 2 F {\displaystyle 2^{F}\,} , or, if F {\displaystyle F\,} is large, as F {\displaystyle F\,} independent probabilities, each governing the presence of an individual part. The density p ( h | n , b ) {\displaystyle p(h|n,b)\,} is modeled by p ( h | n , b ) = { 1 ∏ f = 1 F N f b f , if h ∈ H ( b , n ) 0 , for other h {\displaystyle p(h|n,b)={\begin{cases}{\frac {1}{\textstyle \prod _{f=1}^{F}N_{f}^{b_{f}}}},&{\mbox{if }}h\in H(b,n)\\0,&{\mbox{for other }}h\end{cases}}} where H ( b , n ) {\displaystyle H(b,n)\,} denotes the set of all hypotheses consistent with b {\displaystyle b\,} and n {\displaystyle n\,} , and N f {\displaystyle N_{f}\,} denotes the total number of detections of parts of type f {\displaystyle f\,} . This expresses the fact that all consistent hypotheses, of which there are ∏ f = 1 F N f b f {\displaystyle \textstyle \prod _{f=1}^{F}N_{f}^{b_{f}}} , are equally likely in the absence of information on part locations. And finally, p ( X o , x m | h , n ) = p f g ( z ) p b g ( x b g ) {\displaystyle p(X^{o},x^{m}|h,n)=p_{fg}(z)p_{bg}(x_{bg})\,} where z = ( x o x m ) {\displaystyle z=(x^{o}x^{m})\,} are the coordinates of all foreground detections, observed and missing, and x b g {\displaystyle x_{bg}\,} represents the coordinates of the background detections. Note that foreground detections are assumed to be independent of the background. p f g ( z ) {\displaystyle p_{fg}(z)\,} is modeled as a joint Gaussian with mean μ {\displaystyle \mu \,} and covariance Σ {\displaystyle \Sigma \,} . === Classification === The ultimate objective of this model is to classify images into classes "object present" (class C 1 {\displaystyle C_{1}\,} ) and "object absent" (class C 0 {\displaystyle C_{0}\,} ) given t

Device-independent pixel

A device-independent pixel (also: density-independent pixel, dip, dp) is a unit of length. A typical use is to allow mobile device software to scale the display of information and user interaction to different screen sizes. The abstraction allows an application to work in pixels as a measurement, while the underlying graphics system converts the abstract pixel measurements of the application into real pixel measurements appropriate to the particular device. For example, on the Android operating system a device-independent pixel is equivalent to one physical pixel on a 160 dpi screen, while the Windows Presentation Foundation specifies one device-independent pixel as equivalent to 1/96th of an inch. As dp is a physical unit it has an absolute value which can be measured in traditional units, e.g. for Android devices 1 dp equals 1/160 of inch or 0.15875 mm. While traditional pixels only refer to the display of information, device-independent pixels may also be used to measure user input such as input on a touch screen device.

Generalized blockmodeling

In generalized blockmodeling, the blockmodeling is done by "the translation of an equivalence type into a set of permitted block types", which differs from the conventional blockmodeling, which is using the indirect approach. It's a special instance of the direct blockmodeling approach. Generalized blockmodeling was introduced in 1994 by Patrick Doreian, Vladimir Batagelj and Anuška Ferligoj. == Definition == Generalized blockmodeling approach is a direct one, "where the optimal partition(s) is (are) identified based on minimal values of a compatible criterion function defined by the difference between empirical blocks and corresponding ideal blocks". At the same time, the much broader set of block types is introduced (while in conventional blockmodeling only certain types are used). The conventional blockmodeling is inductive due to nonspecification of neither the clusters or the location of block types, while in generalized blockmodeling the blockmodel is specified with more detail than just the permition of certain block types (e.g., prespecification). Further, it's possible to define departures from the permitted (ideal) blocktype, using criterion function. Using local optimization procedure, firstly the initial clustering (with specified number of clusters is done, based on random creation. How the clusters are neighboring to each other, is based on two transformations: 1) a vertex is moved from one to another cluster or 2) a pair of vertices is interchanged between two different clusters. This process of transformation steps is repeated many times, until only the best fitting partitions (with the minimized value of the criterion function) are kept as blockmodels for the future exploration of the network. Different types of generalized blockmodeling are: generalized binary blockmodeling, generalized valued blockmodeling and generalized homogeneity blockmodeling. == Benefits == According to Patrick Doreian, the benefits of generalized blockmodeling, are as follows: usage of explicit criterion function, compatible with a given type of equivalence, results to in-built measure of fit, which is integral to the establishment of the blockmodels (in conventional blockmodeling, there is no compelling and coherent measures of fit); partitions, based on generalized blockmodeling, regularly outperform and never perform less well than the partitions, based on conventional approach; with generalized blockmodeling it's possible to specify new types of blockmodels; this potentially unlimited set of new block types also results in permittion of inclusion of substantively driven blockmodels; in generalized blockmodeling, the specification of the block types and the location of some of them in the blockmodel is possible; researcher can speficy which (pair of) vertices must be (not) clustered together; this approach also allows the imposition of penalties, resulting into identification of empirical null blocks without inconsistencies with a corresponding ideal null block. == Problems == According to Doreian, the problems of generalized blockmodeling, are as follows: unknown sensitivity to particular data features, examination of boundary problems, computationally burdensome, which results in a constraint regarding practical network size (generalized blockmodeling is thus primarily used to analyse smaller networks (below 100 units)), identifying structure from incomplete network information, most of generalized blockmodeling is based on binary networks, but there is also development in the field of valued networks, criterion function is minimized for a specified blockmodel, with results in issues of evaluating statistically, based on the structural data alone, problems regarding three dimensional network data, problems regarding the evolution of fundamental network structure. == Book == The book with the same title, Generalized blockmodeling, written by Patrick Doreian, Vladimir Batagelj and Anuška Ferligoj, was in 2007 awarded the Harrison White Outstanding Book Award by the Mathematical Sociology Section of American Sociological Association.