Vapnik–Chervonenkis theory

Vapnik–Chervonenkis theory

Vapnik–Chervonenkis theory (also known as VC theory) was developed during 1960–1990 by Vladimir Vapnik and Alexey Chervonenkis. The theory is a form of computational learning theory, which attempts to explain the learning process from a statistical point of view. == Introduction == VC theory covers at least four parts (as explained in The Nature of Statistical Learning Theory): Theory of consistency of learning processes What are (necessary and sufficient) conditions for consistency of a learning process based on the empirical risk minimization principle? Nonasymptotic theory of the rate of convergence of learning processes How fast is the rate of convergence of the learning process? Theory of controlling the generalization ability of learning processes How can one control the rate of convergence (the generalization ability) of the learning process? Theory of constructing learning machines How can one construct algorithms that can control the generalization ability? VC Theory is a major subbranch of statistical learning theory. One of its main applications in statistical learning theory is to provide generalization conditions for learning algorithms. From this point of view, VC theory is related to stability, which is an alternative approach for characterizing generalization. In addition, VC theory and VC dimension are instrumental in the theory of empirical processes, in the case of processes indexed by VC classes. Arguably these are the most important applications of the VC theory, and are employed in proving generalization. Several techniques will be introduced that are widely used in the empirical process and VC theory. The discussion is mainly based on the book Weak Convergence and Empirical Processes: With Applications to Statistics. == Overview of VC theory in empirical processes == === Background on empirical processes === Let ( X , A ) {\displaystyle ({\mathcal {X}},{\mathcal {A}})} be a measurable space. For any measure Q {\displaystyle Q} on ( X , A ) {\displaystyle ({\mathcal {X}},{\mathcal {A}})} , and any measurable functions f : X → R {\displaystyle f:{\mathcal {X}}\to \mathbf {R} } , define Q f = ∫ f d Q {\displaystyle Qf=\int fdQ} Measurability issues will be ignored here, for more technical detail see. Let F {\displaystyle {\mathcal {F}}} be a class of measurable functions f : X → R {\displaystyle f:{\mathcal {X}}\to \mathbf {R} } and define: ‖ Q ‖ F = sup { | Q f | : f ∈ F } . {\displaystyle \|Q\|_{\mathcal {F}}=\sup\{\vert Qf\vert \ :\ f\in {\mathcal {F}}\}.} Let X 1 , … , X n {\displaystyle X_{1},\ldots ,X_{n}} be independent, identically distributed random elements of ( X , A ) {\displaystyle ({\mathcal {X}},{\mathcal {A}})} . Then define the empirical measure P n = n − 1 ∑ i = 1 n δ X i , {\displaystyle \mathbb {P} _{n}=n^{-1}\sum _{i=1}^{n}\delta _{X_{i}},} where δ here stands for the Dirac measure. The empirical measure induces a map F → R {\displaystyle {\mathcal {F}}\to \mathbf {R} } given by: f ↦ P n f = 1 n ( f ( X 1 ) + . . . + f ( X n ) ) {\displaystyle f\mapsto \mathbb {P} _{n}f={\frac {1}{n}}(f(X_{1})+...+f(X_{n}))} Now suppose P is the underlying true distribution of the data, which is unknown. Empirical Processes theory aims at identifying classes F {\displaystyle {\mathcal {F}}} for which statements such as the following hold: uniform law of large numbers: ‖ P n − P ‖ F → n 0 , {\displaystyle \|\mathbb {P} _{n}-P\|_{\mathcal {F}}{\underset {n}{\to }}0,} That is, as n → ∞ {\displaystyle n\to \infty } , | 1 n ( f ( X 1 ) + . . . + f ( X n ) ) − ∫ f d P | → 0 {\displaystyle \left|{\frac {1}{n}}(f(X_{1})+...+f(X_{n}))-\int fdP\right|\to 0} uniformly for all f ∈ F {\displaystyle f\in {\mathcal {F}}} . uniform central limit theorem: G n = n ( P n − P ) ⇝ G , in ℓ ∞ ( F ) {\displaystyle \mathbb {G} _{n}={\sqrt {n}}(\mathbb {P} _{n}-P)\rightsquigarrow \mathbb {G} ,\quad {\text{in }}\ell ^{\infty }({\mathcal {F}})} In the former case F {\displaystyle {\mathcal {F}}} is called Glivenko–Cantelli class, and in the latter case (under the assumption ∀ x , sup f ∈ F | f ( x ) − P f | < ∞ {\displaystyle \forall x,\sup \nolimits _{f\in {\mathcal {F}}}\vert f(x)-Pf\vert <\infty } ) the class F {\displaystyle {\mathcal {F}}} is called Donsker or P-Donsker. A Donsker class is Glivenko–Cantelli in probability by an application of Slutsky's theorem. These statements are true for a single f {\displaystyle f} , by standard LLN, CLT arguments under regularity conditions, and the difficulty in the Empirical Processes comes in because joint statements are being made for all f ∈ F {\displaystyle f\in {\mathcal {F}}} . Intuitively then, the set F {\displaystyle {\mathcal {F}}} cannot be too large, and as it turns out that the geometry of F {\displaystyle {\mathcal {F}}} plays a very important role. One way of measuring how big the function set F {\displaystyle {\mathcal {F}}} is to use the so-called covering numbers. The covering number N ( ε , F , ‖ ⋅ ‖ ) {\displaystyle N(\varepsilon ,{\mathcal {F}},\|\cdot \|)} is the minimal number of balls { g : ‖ g − f ‖ < ε } {\displaystyle \{g:\|g-f\|<\varepsilon \}} needed to cover the set F {\displaystyle {\mathcal {F}}} (here it is obviously assumed that there is an underlying norm on F {\displaystyle {\mathcal {F}}} ). The entropy is the logarithm of the covering number. Two sufficient conditions are provided below, under which it can be proved that the set F {\displaystyle {\mathcal {F}}} is Glivenko–Cantelli or Donsker. A class F {\displaystyle {\mathcal {F}}} is P-Glivenko–Cantelli if it is P-measurable with envelope F such that P ∗ F < ∞ {\displaystyle P^{\ast }F<\infty } and satisfies: ∀ ε > 0 sup Q N ( ε ‖ F ‖ Q , F , L 1 ( Q ) ) < ∞ . {\displaystyle \forall \varepsilon >0\quad \sup \nolimits _{Q}N(\varepsilon \|F\|_{Q},{\mathcal {F}},L_{1}(Q))<\infty .} The next condition is a version of Dudley's theorem. If F {\displaystyle {\mathcal {F}}} is a class of functions such that ∫ 0 ∞ sup Q log ⁡ N ( ε ‖ F ‖ Q , 2 , F , L 2 ( Q ) ) d ε < ∞ {\displaystyle \int _{0}^{\infty }\sup \nolimits _{Q}{\sqrt {\log N\left(\varepsilon \|F\|_{Q,2},{\mathcal {F}},L_{2}(Q)\right)}}d\varepsilon <\infty } then F {\displaystyle {\mathcal {F}}} is P-Donsker for every probability measure P such that P ∗ F 2 < ∞ {\displaystyle P^{\ast }F^{2}<\infty } . In the last integral, the notation means ‖ f ‖ Q , 2 = ( ∫ | f | 2 d Q ) 1 2 {\displaystyle \|f\|_{Q,2}=\left(\int |f|^{2}dQ\right)^{\frac {1}{2}}} . === Symmetrization === The majority of the arguments about how to bound the empirical process rely on symmetrization, maximal and concentration inequalities, and chaining. Symmetrization is usually the first step of the proofs, and since it is used in many machine learning proofs on bounding empirical loss functions (including the proof of the VC inequality which is discussed in the next section). It is presented here: Consider the empirical process: f ↦ ( P n − P ) f = 1 n ∑ i = 1 n ( f ( X i ) − P f ) {\displaystyle f\mapsto (\mathbb {P} _{n}-P)f={\dfrac {1}{n}}\sum _{i=1}^{n}(f(X_{i})-Pf)} Turns out that there is a connection between the empirical and the following symmetrized process: f ↦ P n 0 f = 1 n ∑ i = 1 n ε i f ( X i ) {\displaystyle f\mapsto \mathbb {P} _{n}^{0}f={\dfrac {1}{n}}\sum _{i=1}^{n}\varepsilon _{i}f(X_{i})} The symmetrized process is a Rademacher process, conditionally on the data X i {\displaystyle X_{i}} . Therefore, it is a sub-Gaussian process by Hoeffding's inequality. Lemma (Symmetrization). For every nondecreasing, convex Φ: R → R and class of measurable functions F {\displaystyle {\mathcal {F}}} , E Φ ( ‖ P n − P ‖ F ) ≤ E Φ ( 2 ‖ P n 0 ‖ F ) {\displaystyle \mathbb {E} \Phi (\|\mathbb {P} _{n}-P\|_{\mathcal {F}})\leq \mathbb {E} \Phi \left(2\left\|\mathbb {P} _{n}^{0}\right\|_{\mathcal {F}}\right)} The proof of the Symmetrization lemma relies on introducing independent copies of the original variables X i {\displaystyle X_{i}} (sometimes referred to as a ghost sample) and replacing the inner expectation of the LHS by these copies. After an application of Jensen's inequality different signs could be introduced (hence the name symmetrization) without changing the expectation. The proof can be found below because of its instructive nature. The same proof method can be used to prove the Glivenko–Cantelli theorem. A typical way of proving empirical CLTs, first uses symmetrization to pass the empirical process to P n 0 {\displaystyle \mathbb {P} _{n}^{0}} and then argue conditionally on the data, using the fact that Rademacher processes are simple processes with nice properties. === VC Connection === It turns out that there is a fascinating connection between certain combinatorial properties of the set F {\displaystyle {\mathcal {F}}} and the entropy numbers. Uniform covering numbers can be controlled by the notion of Vapnik–Chervonenkis classes of sets – or shortly VC sets. Consider a collection C {\displaystyle {\mathcal {C}}} of subsets of the sample space X {\displaystyle

Gorn address

A Gorn address (Gorn, 1967) is a method of identifying and addressing any node within a tree data structure. This notation is often used for identifying nodes in a parse tree defined by phrase structure rules. The Gorn address is a sequence of zero or more integers conventionally separated by dots, e.g., 0 or 1.0.1. The root which Gorn calls can be regarded as the empty sequence. And the j {\displaystyle j} -th child of the i {\displaystyle i} -th child has an address i . j {\displaystyle i.j} , counting from 0. It is named after American computer scientist Saul Gorn.

Best AI Writing Assistants in 2026

In search of the best AI writing assistant? An AI writing assistant is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI writing assistant slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

AI Video Editors Reviews: What Actually Works in 2026

Curious about the best AI video editor? An AI video editor is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI video editor slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

Optical Character Recognition (Unicode block)

Optical Character Recognition is a Unicode block containing signal characters for OCR and MICR standards. == Block == == Subheadings == The Optical Character Recognition block has three informal subheadings (groupings) within its character collection: OCR-A, MICR, and OCR. === OCR-A === The OCR-A subheading contains six characters taken from the OCR-A font described in the ISO 1073-1:1976 standard: U+2440 ⑀ OCR HOOK, U+2441 ⑁ OCR CHAIR, U+2442 ⑂ OCR FORK, U+2443 ⑃ OCR INVERTED FORK, U+2444 ⑄ OCR BELT BUCKLE, and U+2445 ⑅ OCR BOW TIE. The OCR bow tie is given the informative alias "unique asterisk". The hook, chair and fork, in addition to a long vertical bar, are included in the most basic "numeric" implementation level of OCR-A, which includes digits but excludes letters and conventional punctuation. By contrast, the most basic implementation level of OCR-B instead includes the digits, plus sign, less-than sign, greater-than sign, long vertical bar and seven of the capital letters; as such, there are no characters specific to OCR-B in the Optical Character Recognition block. === MICR === The MICR subheading contains four punctuation characters for bank cheque identifiers, taken from the magnetic ink character recognition E-13B font (codified in the ISO 1004:1995 standard): U+2446 ⑆ OCR BRANCH BANK IDENTIFICATION, U+2447 ⑇ OCR AMOUNT OF CHECK, U+2448 ⑈ OCR DASH, and U+2449 ⑉ OCR CUSTOMER ACCOUNT NUMBER. The latter two characters are misnamed: their names were inadvertently switched when they were named in the 1993 (first) edition of ISO/IEC 10646, a mistake which had been present since Unicode 1.0.0. Although their formal names remain unchanged due to the Unicode stability policy, they both have corrected normative aliases: U+2448 ⑈ is MICR ON US SYMBOL, and U+2449 ⑉ is MICR DASH SYMBOL (the standard notes that "the Unicode character names include several misnomers"). These symbols had previously been encoded by the ISO-IR-98 encoding defined by ISO 2033:1983, in which they were simply named SYMBOL ONE through SYMBOL FOUR. All four characters have informative aliases in the Unicode charts: "transit", "amount", "on us", and "dash" respectively. === OCR === The OCR subheading consists of a single character: U+244A ⑊ OCR DOUBLE BACKSLASH. == History == The following Unicode-related documents record the purpose and process of defining specific characters in the Optical Character Recognition block:

Deep Learning Indaba

The Deep Learning Indaba is an annual conference and educational event that aims to strengthen machine learning and artificial intelligence (AI) capacity across Africa. Launched in 2017, it brings together students, researchers, industry practitioners, and policymakers from across the African continent. == History == The Deep Learning Indaba began in 2017 at the University of the Witwatersrand with over 300 participants from 23 African countries, offering tutorials in advanced AI topics and featuring notable speakers like Nando de Freitas. In 2018, it expanded to 650 delegates at Stellenbosch University, introducing parallel sessions to encourage collaboration. The 2019 edition in Nairobi, Kenya, reflected further growth, with increasing sponsorship and support from major tech companies like Google and Microsoft. === Deep Learning IndabaX ===

Krohn–Rhodes theory

In mathematics and computer science, the Krohn–Rhodes theory (or algebraic automata theory) is an approach to the study of finite semigroups and automata that seeks to decompose them in terms of elementary components. These components correspond to finite aperiodic semigroups and finite simple groups that are combined in a feedback-free manner (called a "wreath product" or "cascade"). Krohn and Rhodes found a general decomposition for finite automata. The authors discovered and proved an unexpected major result in finite semigroup theory, revealing a deep connection between finite automata and semigroups. Decidability of Krohn-Rhodes complexity long motivated much work in semigroup theory. In June 2024, Stuart Margolis, John Rhodes, and Anne Schilling announced a proof that the complexity is decidable. == Definitions and description of the Krohn–Rhodes theorem == Let T {\displaystyle T} be a semigroup. A semigroup S {\displaystyle S} that is a homomorphic image of a subsemigroup of T {\displaystyle T} is said to be a divisor of T {\displaystyle T} . The Krohn–Rhodes theorem for finite semigroups states that every finite semigroup S {\displaystyle S} is a divisor of a finite alternating wreath product of finite simple groups, each a divisor of S {\displaystyle S} , and finite aperiodic semigroups (which contain no nontrivial subgroups). In the automata formulation, the Krohn–Rhodes theorem for finite automata states that given a finite automaton A {\displaystyle A} with states Q {\displaystyle Q} and input alphabet I {\displaystyle I} , output alphabet U {\displaystyle U} , then one can expand the states to Q ′ {\displaystyle Q'} such that the new automaton A ′ {\displaystyle A'} embeds into a cascade of "simple", irreducible automata: In particular, A {\displaystyle A} is emulated by a feed-forward cascade of (1) automata whose transformation semigroups are finite simple groups and (2) automata that are banks of flip-flops running in parallel. The new automaton A ′ {\displaystyle A'} has the same input and output symbols as A {\displaystyle A} . Here, both the states and inputs of the cascaded automata have a very special hierarchical coordinate form. Moreover, each simple group (prime) or non-group irreducible semigroup (subsemigroup of the flip-flop monoid) that divides the transformation semigroup of A {\displaystyle A} must divide the transformation semigroup of some component of the cascade, and only the primes that must occur as divisors of the components are those that divide A {\displaystyle A} 's transformation semigroup. == Group complexity == The Krohn–Rhodes complexity (also called group complexity or just complexity) of a finite semigroup S is the least number of groups in a wreath product of finite groups and finite aperiodic semigroups of which S is a divisor. All finite aperiodic semigroups have complexity 0, while non-trivial finite groups have complexity 1. In fact, there are semigroups of every non-negative integer complexity. For example, for any n greater than 1, the multiplicative semigroup of all (n+1) × (n+1) upper-triangular matrices over any fixed finite field has complexity n (Kambites, 2007). A major open problem in finite semigroup theory is the decidability of complexity: is there an algorithm that will compute the Krohn–Rhodes complexity of a finite semigroup, given its multiplication table? Upper bounds and ever more precise lower bounds on complexity have been obtained (see, e.g. Rhodes & Steinberg, 2009). Rhodes has conjectured that the problem is decidable. In June 2024, Stuart Margolis, John Rhodes, and Anne Schilling announced a proof in the affirmative of the conjecture, though as of 2025 the result has yet to be confirmed. == History and applications == At a conference in 1962, Kenneth Krohn and John Rhodes announced a method for decomposing a (deterministic) finite automaton into "simple" components that are themselves finite automata. This joint work, which has implications for philosophy, comprised both Krohn's doctoral thesis at Harvard University and Rhodes' doctoral thesis at MIT. Simpler proofs, and generalizations of the theorem to infinite structures, have been published since then (see Chapter 4 of Rhodes and Steinberg's 2009 book The q-Theory of Finite Semigroups for an overview). In the 1965 paper by Krohn and Rhodes, the proof of the theorem on the decomposition of finite automata (or, equivalently sequential machines) made extensive use of the algebraic semigroup structure. Later proofs contained major simplifications using finite wreath products of finite transformation semigroups. The theorem generalizes the Jordan–Hölder decomposition for finite groups (in which the primes are the finite simple groups), to all finite transformation semigroups (for which the primes are again the finite simple groups plus all subsemigroups of the "flip-flop" (see above)). Both the group and more general finite automata decomposition require expanding the state-set of the general, but allow for the same number of input symbols. In the general case, these are embedded in a larger structure with a hierarchical "coordinate system". One must be careful in understanding the notion of "prime" as Krohn and Rhodes explicitly refer to their theorem as a "prime decomposition theorem" for automata. The components in the decomposition, however, are not prime automata (with prime defined in a naïve way); rather, the notion of prime is more sophisticated and algebraic: the semigroups and groups associated to the constituent automata of the decomposition are prime (or irreducible) in a strict and natural algebraic sense with respect to the wreath product (Eilenberg, 1976). Also, unlike earlier decomposition theorems, the Krohn–Rhodes decompositions usually require expansion of the state-set, so that the expanded automaton covers (emulates) the one being decomposed. These facts have made the theorem difficult to understand and challenging to apply in a practical way—until recently, when computational implementations became available (Egri-Nagy & Nehaniv 2005, 2008). H.P. Zeiger (1967) proved an important variant called the holonomy decomposition (Eilenberg 1976). The holonomy method appears to be relatively efficient and has been implemented computationally by A. Egri-Nagy (Egri-Nagy & Nehaniv 2005). Meyer and Thompson (1969) give a version of Krohn–Rhodes decomposition for finite automata that is equivalent to the decomposition previously developed by Hartmanis and Stearns, but for useful decompositions, the notion of expanding the state-set of the original automaton is essential (for the non-permutation automata case). Many proofs and constructions now exist of Krohn–Rhodes decompositions (e.g., [Krohn, Rhodes & Tilson 1968], [Ésik 2000], [Diekert et al. 2012]), with the holonomy method the most popular and efficient in general (although not in all cases). [Zimmermann 2010] gives an elementary proof of the theorem. Owing to the close relation between monoids and categories, a version of the Krohn–Rhodes theorem is applicable to category theory. This observation and a proof of an analogous result were offered by Wells (1980). The Krohn–Rhodes theorem for semigroups/monoids is an analogue of the Jordan–Hölder theorem for finite groups (for semigroups/monoids rather than groups). As such, the theorem is a deep and important result in semigroup/monoid theory. The theorem was also surprising to many mathematicians and computer scientists since it had previously been widely believed that the semigroup/monoid axioms were too weak to admit a structure theorem of any strength, and prior work (Hartmanis & Stearns) was only able to show much more rigid and less general decomposition results for finite automata. Work by Egri-Nagy and Nehaniv (2005, 2008–) continues to further automate the holonomy version of the Krohn–Rhodes decomposition extended with the related decomposition for finite groups (so-called Frobenius–Lagrange coordinates) using the computer algebra system GAP. Applications outside of the semigroup and monoid theories are now computationally feasible. They include computations in biology and biochemical systems (e.g. Egri-Nagy & Nehaniv 2008), artificial intelligence, finite-state physics, psychology, and game theory (see, for example, Rhodes 2009).