Information gain (decision tree)

Information gain (decision tree)

In the context of decision trees in information theory and machine learning, information gain refers to the conditional expected value of the Kullback–Leibler divergence of the univariate probability distribution of one variable from the conditional distribution of this variable given the other one. (In broader contexts, information gain can also be used as a synonym for either Kullback–Leibler divergence or mutual information, but the focus of this article is on the more narrow meaning below.) Explicitly, the information gain of a random variable X {\displaystyle X} obtained from an observation of a random variable A {\displaystyle A} taking value a {\displaystyle a} is defined as: I G ( X , a ) = D KL ( P X ∣ a ∥ P X ) {\displaystyle {\mathit {IG}}(X,a)=D_{\text{KL}}{\bigl (}P_{X\mid a}\parallel P_{X}{\bigr )}} In other words, it is the Kullback–Leibler divergence of P X ( x ) {\displaystyle P_{X}(x)} (the prior distribution for X {\displaystyle X} ) from P X ∣ a ( x ) {\displaystyle P_{X\mid a}(x)} (the posterior distribution for X {\displaystyle X} given A = a {\displaystyle A=a} ). The expected value of the information gain is the mutual information I ( X ; A ) {\displaystyle I(X;A)} : E A ⁡ [ I G ( X , A ) ] = I ( X ; A ) {\displaystyle \operatorname {E} _{A}[{\mathit {IG}}(X,A)]=I(X;A)} i.e. the reduction in the entropy of X {\displaystyle X} achieved by learning the state of the random variable A {\displaystyle A} . In machine learning, this concept can be used to define a preferred sequence of attributes to investigate to most rapidly narrow down the state of X. Such a sequence (which depends on the outcome of the investigation of previous attributes at each stage) is called a decision tree, and when applied in the area of machine learning is known as decision tree learning. Usually an attribute with high mutual information should be preferred to other attributes. == General definition == In general terms, the expected information gain is the reduction in information entropy Η from a prior state to a state that takes some information as given: I G ( T , a ) = H ( T ) − H ( T | a ) , {\displaystyle IG(T,a)=\mathrm {H} {(T)}-\mathrm {H} {(T|a)},} where H ( T | a ) {\displaystyle \mathrm {H} {(T|a)}} is the conditional entropy of T {\displaystyle T} given the value of attribute a {\displaystyle a} . This is intuitively plausible when interpreting entropy Η as a measure of uncertainty of a random variable T {\displaystyle T} : by learning (or assuming) a {\displaystyle a} about T {\displaystyle T} , our uncertainty about T {\displaystyle T} is reduced (i.e. I G ( T , a ) {\displaystyle IG(T,a)} is positive), unless of course T {\displaystyle T} is independent of a {\displaystyle a} , in which case H ( T | a ) = H ( T ) {\displaystyle \mathrm {H} (T|a)=\mathrm {H} (T)} , meaning I G ( T , a ) = 0 {\displaystyle IG(T,a)=0} . == Formal definition == Let T denote a set of training examples, each of the form ( x , y ) = ( x 1 , x 2 , x 3 , . . . , x k , y ) {\displaystyle ({\textbf {x}},y)=(x_{1},x_{2},x_{3},...,x_{k},y)} where x a ∈ v a l s ( a ) {\displaystyle x_{a}\in \mathrm {vals} (a)} is the value of the a th {\displaystyle a^{\text{th}}} attribute or feature of example x {\displaystyle {\textbf {x}}} and y is the corresponding class label. The information gain for an attribute a is defined in terms of Shannon entropy H ( − ) {\displaystyle \mathrm {H} (-)} as follows. For a value v taken by attribute a, let S a ( v ) = { x ∈ T | x a = v } {\displaystyle S_{a}{(v)}=\{{\textbf {x}}\in T|x_{a}=v\}} be defined as the set of training inputs of T for which attribute a is equal to v. Then the information gain of T for attribute a is the difference between the a priori Shannon entropy H ( T ) {\displaystyle \mathrm {H} (T)} of the training set and the conditional entropy H ( T | a ) {\displaystyle \mathrm {H} {(T|a)}} . H ( T | a ) = ∑ v ∈ v a l s ( a ) | S a ( v ) | | T | ⋅ H ( S a ( v ) ) . {\displaystyle \mathrm {H} (T|a)=\sum _{v\in \mathrm {vals} (a)}{{\frac {|S_{a}{(v)}|}{|T|}}\cdot \mathrm {H} \left(S_{a}{\left(v\right)}\right)}.} I G ( T , a ) = H ( T ) − H ( T | a ) {\displaystyle IG(T,a)=\mathrm {H} (T)-\mathrm {H} (T|a)} The mutual information is equal to the total entropy for an attribute if for each of the attribute values a unique classification can be made for the result attribute. In this case, the relative entropies subtracted from the total entropy are 0. In particular, the values v ∈ v a l s ( a ) {\displaystyle v\in vals(a)} defines a partition of the training set data T into mutually exclusive and all-inclusive subsets, inducing a categorical probability distribution P a ( v ) {\textstyle P_{a}{(v)}} on the values v ∈ v a l s ( a ) {\textstyle v\in vals(a)} of attribute a. The distribution is given P a ( v ) := | S a ( v ) | | T | {\textstyle P_{a}{(v)}:={\frac {|S_{a}{(v)}|}{|T|}}} . In this representation, the information gain of T given a can be defined as the difference between the unconditional Shannon entropy of T and the expected entropy of T conditioned on a, where the expectation value is taken with respect to the induced distribution on the values of a. I G ( T , a ) = H ( T ) − ∑ v ∈ v a l s ( a ) P a ( v ) H ( S a ( v ) ) = H ( T ) − E P a [ H ( S a ( v ) ) ] = H ( T ) − H ( T | a ) . {\displaystyle {\begin{alignedat}{2}IG(T,a)&=\mathrm {H} (T)-\sum _{v\in \mathrm {vals} (a)}{P_{a}{(v)}\mathrm {H} \left(S_{a}{(v)}\right)}\\&=\mathrm {H} (T)-\mathbb {E} _{P_{a}}{\left[\mathrm {H} {(S_{a}{(v)})}\right]}\\&=\mathrm {H} (T)-\mathrm {H} {(T|a)}.\end{alignedat}}} == Example == In engineering applications, information is analogous to signal, and entropy is analogous to noise. It determines how a decision tree chooses to split data. The leftmost figure below is very impure and has high entropy corresponding to higher disorder and lower information value. As we go to the right, the entropy decreases, and the information value increases. Now, it is clear that information gain is the measure of how much information a feature provides about a class. Let's visualize information gain in a decision tree as shown in the right: The node t is the parent node, and the sub-nodes tL and tR are child nodes. In this case, the parent node t has a collection of cancer and non-cancer samples denoted as C and NC respectively. We can use information gain to determine how good the splitting of nodes is in a decision tree. In terms of entropy, information gain is defined as: To understand this idea, let's start by an example in which we create a simple dataset and want to see if gene mutations could be related to patients with cancer. Given four different gene mutations, as well as seven samples, the training set for a decision can be created as follows: In this dataset, a 1 means the sample has the mutation (True), while a 0 means the sample does not (False). A sample with C denotes that it has been confirmed to be cancerous, while NC means it is non-cancerous. Using this data, a decision tree can be created with information gain used to determine the candidate splits for each node. For the next step, the entropy at parent node t of the above simple decision tree is computed as:H(t) = −[pC,t log2(pC,t) + pNC,t log2(pNC,t)] where, probability of selecting a class ‘C’ sample at node t, pC,t = n(t, C) / n(t), probability of selecting a class ‘NC’ sample at node t, pNC,t = n(t, NC) / n(t), n(t), n(t, C), and n(t, NC) are the number of total samples, ‘C’ samples and ‘NC’ samples at node t respectively.Using this with the example training set, the process for finding information gain beginning with H ( t ) {\displaystyle \mathrm {H} {(t)}} for Mutation 1 is as follows: pC, t = 4/7 pNC, t = 3/7 H ( t ) {\displaystyle \mathrm {H} {(t)}} = −(4/7 × log2(4/7) + 3/7 × log2(3/7)) = 0.985 Note: H ( t ) {\displaystyle \mathrm {H} {(t)}} will be the same for all mutations at the root. The relatively high value of entropy H ( t ) = 0.985 {\displaystyle \mathrm {H} {(t)}=0.985} (1 is the optimal value) suggests that the root node is highly impure and the constituents of the input at the root node would look like the leftmost figure in the above Entropy Diagram. However, such a set of data is good for learning the attributes of the mutations used to split the node. At a certain node, when the homogeneity of the constituents of the input occurs (as shown in the rightmost figure in the above Entropy Diagram), the dataset would no longer be good for learning. Moving on, the entropy at left and right child nodes of the above decision tree is computed using the formulae:H(tL) = −[pC,L log2(pC,L) + pNC,L log2(pNC,L)]H(tR) = −[pC,R log2(pC,R) + pNC,R log2(pNC,R)]where, probability of selecting a class ‘C’ sample at the left child node, pC,L = n(tL, C) / n(tL), probability of selecting a class ‘NC’ sample at the left child node, pNC,L = n(tL, NC) / n(tL), probability of selecting a class ‘C’ sample at the right child node, pC,R = n(tR, C) / n(tR), prob

Microsoft Sway

Microsoft Sway is a presentation program and is part of the Microsoft 365 family of products. Sway was offered for general release by Microsoft in August 2015. It allows users who have a Microsoft account to combine text and media to create a presentable website. Users can pull content locally from the device in use, or from internet sources such as Bing, Facebook, OneDrive, and YouTube. Sway is distinguished from Microsoft FrontPage and Microsoft Expression Web – unrelated web design programs previously developed by Microsoft – in that Sway includes a method for hosting sites. Sway sites are stored on Microsoft's servers and are tied to the user's Microsoft account. They can be viewed and edited from any web browser through Office on the web. There is no offline editing or viewing function, but sites can be accessed using the app for Windows, and formerly iOS. == History == Sway was developed internally by Microsoft. In late 2014, the company announced an invite-only preview version of Sway and announced that Sway would not require an Office 365 subscription. An iOS app was released as a preview on 31 October 2014, but was discontinued on 17 December 2018 due to low usage. As of July 17, 2021, the Sway iOS app's discontinuance in 2018 was the last piece of news posted in the Sway tech blog. The Sway feature blog has not received an update since April 2017. The Microsoft Office Roadmap did not include any items related to Sway ever since. The iOS application is no longer under active development, and is not available for download. Since 2023, Microsoft has been consolidating the domains of its Microsoft 365 apps and services under cloud.microsoft. By 2025, the vast majority of services, including Sway, have already migrated to the cloud.microsoft domain. == Features == Users are able to add content from various sources into their Sway presentations. Some of the integrated services are owned by Microsoft, including OneNote, Bing, and other Sway sites. The program also provides native integration with other services, including YouTube, Facebook, Twitter, Mixcloud, and Infogram.

Pixorial

Pixorial was a cloud-based consumer photo sharing, video sharing and video editing platform. The company was formed in 2007 in Centennial, Colorado as a media conversion service. In 2013, Pixorial was chosen as one of two video storage companies to partner with the launch of Google Drive. Pixorial allowed users to edit and share videos on social channels by connecting through their Pixorial account. The company closed on July 18, 2014, and its assets were acquired by LifeLogger Technologies Corp in November 2015. == History == The company was founded in 2007 and launched in 2009 by former Netscape employee Andres Espineira. Changing its focus to video editing software in 2009, Pixorial began developing an app that would be launched for iOS and Android devices in 2011. Later developments in the app in 2012 would also included real time filters, which were later removed. With the launch of Google Drive in 2012, Pixorial was chosen as an integrated video partner. This integration with Google Drive allowed users to access videos stored in Google Drive within the web app of Pixorial. After the Google Drive launch, Pixorial developed a crowdsourced, location-based video sharing app, Krowds. The app was cited in July 2012 by PC Magazine as one of "The 8 Best Apps for Making and Sharing Videos on Your iPhone". In late July, Pixorial replaced its original mobile app with the MyPlayer HD app that optimized HD video viewing for large screen viewing including tablets and smart televisions. Pixorial's services terminated on July 18, 2014. == Products == === Krowds App === Pixorial's app was launched in April 2013 for iOS, and in May for Android, as a tool to aggregate event videos through location based collections. The app was launched to generally positive reviews. === Movie Creator === Launched July 12, 2012 Pixorial's Movie Creator allowed users to edit movies in a simple story-telling platform Movie Creator's features include transitions, text boxes, access to free music tracks, credits, and social media sharing capabilities. The Pixorial platform allowed users to view, share, and edit videos without modifying the original. Movie Creator integrated pictures and video to create user movies. == Awards == 2012 Apex Award from the Colorado Technology Association, for Best Technology Project of the Year 2010 Computerworld Laureate for Media, Arts and Entertainment

TIMIT

TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time. TIMIT was designed to further acoustic-phonetic knowledge and automatic speech recognition systems. It was commissioned by DARPA and corpus design was a joint effort between the Massachusetts Institute of Technology, SRI International, and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for publishing by the National Institute of Standards and Technology (NIST). There is also a telephone bandwidth version called NTIMIT (Network TIMIT). TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset. == Data == TIMIT contains ~5 hours of speech, of 10 sentences spoken by each of 630 speakers. The sentences were randomly sampled from a corpus of 2342 sentences. The speakers were native speakers of American English, classified under 8 major dialect regions: New England, Northern, North Midland, South Midland, Southern, New York City, Western, Army Brat (moved around). The speakers were 70% male and 30% female. Recordings were made in a noise-isolated recording booth at Texas Instrument, using a semi-automatic computer system (STEROIDS) to control the presentation of prompts to the speaker and the recording. Two-channel recordings were made using a Sennheiser HMD 414 headset-mounted microphone and a Brüel & Kjær 1/2" far-field pressure microphone (#4165). The speech was digitized at a sample rate of 20 kHz then and downsampled to 16 kHz. == History == The TIMIT telephone corpus was an early attempt to create a database with speech samples. It was published in the year 1988 on CD-ROM and consists of only 10 sentences per speaker. Two 'dialect' sentences were read by each speaker, as well as another 8 sentences selected from a larger set Each sentence averages 3 seconds long and is spoken by 630 different speakers. It was the first notable attempt in creating and distributing a speech corpus and the overall project has produced costs of 1.5 million US$. An update was released in October 1990. It included full 630-speaker corpus; checked and corrected transcriptions; word-alignment transcriptions; NIST SPHERE-headered waveform files and header manipulation software; phonemic dictionary; new test and training subsets balanced for dialectal and phonetic coverage; more extensive documentation. The full name of the project is DARPA-TIMIT Acoustic-Phonetic Continuous Speech Corpus and the acronym TIMIT stands for Texas Instruments/Massachusetts Institute of Technology. The main reason why a corpus of telephone speech was created was to train speech recognition software. In the Blizzard challenge, different software has the obligation to convert audio recordings into textual data and the TIMIT corpus was used as a standardized baseline.

Human–robot collaboration

Human-Robot Collaboration is the study of collaborative processes in human and robot agents work together to achieve shared goals. Many new applications for robots require them to work alongside people as capable members of human-robot teams. These include robots for homes, hospitals, and offices, space exploration and manufacturing. Human-Robot Collaboration (HRC) is an interdisciplinary research area comprising classical robotics, human-computer interaction, artificial intelligence, process design, layout planning, ergonomics, cognitive sciences, and psychology. Industrial applications of human-robot collaboration involve Collaborative Robots, or cobots, that physically interact with humans in a shared workspace to complete tasks such as collaborative manipulation or object handovers. == Collaborative Activity == Collaboration is defined as a special type of coordinated activity, one in which two or more agents work jointly with each other, together performing a task or carrying out the activities needed to satisfy a shared goal. The process typically involves shared plans, shared norms and mutually beneficial interactions. Although collaboration and cooperation are often used interchangeably, collaboration differs from cooperation as it involves a shared goal and joint action where the success of both parties depend on each other. For effective human-robot collaboration, it is imperative that the robot is capable of understanding and interpreting several communication mechanisms similar to the mechanisms involved in human-human interaction. The robot must also communicate its own set of intents and goals to establish and maintain a set of shared beliefs and to coordinate its actions to execute the shared plan. In addition, all team members demonstrate commitment to doing their own part, to the others doing theirs, and to the success of the overall task. == Theories Informing Human-Robot Collaboration == Human-human collaborative activities are studied in depth in order to identify the characteristics that enable humans to successfully work together. These activity models usually aim to understand how people work together in teams, how they form intentions and achieve a joint goal. Theories on collaboration inform human-robot collaboration research to develop efficient and fluent collaborative agents. === Belief Desire Intention Model === The belief-desire-intention (BDI) model is a model of human practical reasoning that was originally developed by Michael Bratman. The approach is used in intelligent agents research to describe and model intelligent agents. The BDI model is characterized by the implementation of an agent's beliefs (the knowledge of the world, state of the world), desires (the objective to accomplish, desired end state) and intentions (the course of actions currently under execution to achieve the desire of the agent) in order to deliberate their decision-making processes. BDI agents are able to deliberate about plans, select plans and execute plans. === Shared Cooperative Activity === Shared Cooperative Activity defines certain prerequisites for an activity to be considered shared and cooperative: mutual responsiveness, commitment to the joint activity and commitment to mutual support. An example case to illustrate these concepts would be a collaborative activity where agents are moving a table out the door, mutual responsiveness ensures that movements of the agents are synchronized; a commitment to the joint activity reassures each team member that the other will not at some point drop his side; and a commitment to mutual support deals with possible breakdowns due to one team member's inability to perform part of the plan. === Joint Intention Theory === Joint Intention Theory proposes that for joint action to emerge, team members must communicate to maintain a set of shared beliefs and to coordinate their actions towards the shared plan. In collaborative work, agents should be able to count on the commitment of other members, therefore each agent should inform the others when they reach the conclusion that a goal is achievable, impossible, or irrelevant. == Approaches to Human-Robot Collaboration == The approaches to human-robot collaboration include human emulation (HE) and human complementary (HC) approaches. Although these approaches have differences, there are research efforts to develop a unified approach stemming from potential convergences such as Collaborative Control. === Human Emulation === The human emulation approach aims to enable computers to act like humans or have human-like abilities in order to collaborate with humans. It focuses on developing formal models of human-human collaboration and applying these models to human-computer collaboration. In this approach, humans are viewed as rational agents who form and execute plans for achieving their goals and infer other people's plans. Agents are required to infer the goals and plans of other agents, and collaborative behavior consists of helping other agents to achieve their goals. === Human Complementary === The human complementary approach seeks to improve human-computer interaction by making the computer a more intelligent partner that complements and collaborates with humans. The premise is that the computer and humans have fundamentally asymmetric abilities. Therefore, researchers invent interaction paradigms that divide responsibility between human users and computer systems by assigning distinct roles that exploit the strengths and overcome the weaknesses of both partners. == Key Aspects == Specialization of Roles: Based on the level of autonomy and intervention, there are several human-robot relationships including master-slave, supervisor–subordinate, partner–partner, teacher–learner and fully autonomous robot. In addition to these roles, homotopy (a weighting function that allows a continuous change between leader and follower behaviors) was introduced as a flexible role distribution. Establishing shared goal(s): Through direct discussion about goals or inference from statements and actions, agents must determine the shared goals they are trying to achieve. Allocation of Responsibility and Coordination: Agents must decide how to achieve their goals, determine what actions will be done by each agent, and how to coordinate the actions of individual agents and integrate their results. Shared context: Agents must be able to track progress toward their goals. They must keep track of what has been achieved and what remains to be done. They must evaluate the effects of actions and determine whether an acceptable solution has been achieved. Communication: Any collaboration requires communication to define goals, negotiate over how to proceed and who will do what, and evaluate progress and results. Adaptation and learning: Collaboration over time require partners to adapt themselves to each other and learn from one's partner both directly or indirectly. Time and space: The time-space taxonomy divides human-robot interaction into four categories based on whether the humans and robots are using computing systems at the same time (synchronous) or different times (asynchronous) and while in the same place (collocated) or in different places (non-collocated). Ergonomics: Human factors and ergonomics are one of the key aspects for a sustainable human-robot collaboration. The robot control system can use biomechanical models and sensors to optimize various ergonomic metrics, such as muscle fatigue.

Deductive language

A deductive language is a computer programming language in which the program is a collection of predicates ('facts') and rules that connect them. Such a language is used to create knowledge based systems or expert systems which can deduce answers to problem sets by applying the rules to the facts they have been given. An example of a deductive language is Prolog, or its database-query cousin, Datalog. == History == As the name implies, deductive languages are rooted in the principles of deductive reasoning; making inferences based upon current knowledge. The first recommendation to use a clausal form of logic for representing computer programs was made by Cordell Green (1969) at Stanford Research Institute (now SRI International). This idea can also be linked back to the battle between procedural and declarative information representation in early artificial intelligence systems. Deductive languages and their use in logic programming can also be dated to the same year when Foster and Elcock introduced Absys, the first deductive/logical programming language. Shortly after, the first Prolog system was introduced in 1972 by Colmerauer through collaboration with Robert Kowalski. == Components == The components of a deductive language are a system of formal logic and a knowledge base upon which the logic is applied. === Formal Logic === Formal logic is the study of inference in regards to formal content. The distinguishing feature between formal and informal logic is that in the former case, the logical rule applied to the content is not specific to a situation. The laws hold regardless of a change in context. Although first-order logic is described in the example below to demonstrate the uses of a deductive language, no formal system is mandated and the use of a specific system is defined within the language rules or grammar. As input, a predicate takes any object(s) in the domain of interest and outputs either one of two Boolean values: true or false. For example, consider the sentences "Barack Obama is the 44th president" and "If it rains today, I will bring an umbrella". The first is a statement with an associated truth value. The second is a conditional statement relying on the value of some other statement. Either of these sentences can be broken down into predicates which can be compared and form the knowledge base of a deductive language. Moreover, variables such as 'Barack Obama' or 'president' can be quantified over. For example, take 'Barack Obama' as variable 'x'. In the sentence "There exists an 'x' such that if 'x' is the president, then 'x' is the commander in chief." This is an example of the existential quantifier in first order logic. Take 'president' to be the variable 'y'. In the sentence "For every 'y', 'y' is the leader of their nation." This is an example of the universal quantifier. === Knowledge Base === A collection of 'facts' or predicates and variables form the knowledge base of a deductive language. Depending on the language, the order of declaration of these predicates within the knowledge base may or may not influence the result of applying logical rules. Upon application of certain 'rules' or inferences, new predicates may be added to a knowledge base. As new facts are established or added, they form the basis for new inferences. As the core of early expert systems, artificial intelligence systems which can make decisions like an expert human, knowledge bases provided more information than databases. They contained structured data, with classes, subclasses, and instances. == Prolog == Prolog is an example of a deductive, declarative language that applies first- order logic to a knowledge base. To run a program in Prolog, a query is posed and based upon the inference engine and the specific facts in the knowledge base, a result is returned. The result can be anything appropriate from a new relation or predicate, to a literal such as a Boolean (true/false), depending on the engine and type system.

Super-resolution imaging

Super-resolution imaging (SR) is a class of techniques that improve the resolution of an imaging system. In optical SR the diffraction limit of systems is transcended, while in geometrical SR the resolution of digital imaging sensors is enhanced. In some radar and sonar imaging applications (e.g. magnetic resonance imaging (MRI), high-resolution computed tomography), subspace decomposition-based methods (e.g. MUSIC) and compressed sensing-based algorithms (e.g., SAMV) are employed to achieve SR over standard periodogram algorithm. Super-resolution imaging techniques are used in general image processing and in super-resolution microscopy. == Super-resolution principles == Several concepts are fundamental to super-resolution imaging: Diffraction limit: the capacity of an optical instrument to reproduce the details of an object in an image has limits that are imposed by laws of physics: the diffraction equations in the wave theory of light, or the uncertainty principle for photons in quantum mechanics. Information transfer can never be increased beyond this boundary, but packets outside the limits can be cleverly swapped for (or multiplexed with) some inside it. Super-resolution microscopy does not so much “break” as “circumvent” the diffraction limit. New procedures probing electro-magnetic disturbances at the molecular level (in the so-called near field) remain fully consistent with Maxwell's equations. Spatial frequency domain: A succinct expression of the diffraction limit is given in the spatial frequency domain. In Fourier optics light distributions are expressed as superpositions of a series of grating light patterns in a range of fringe widths - these widths represent the spatial frequencies. It is generally taught that diffraction theory stipulates an upper limit, the cut-off spatial-frequency, beyond which pattern elements fail to be transferred into the optical image, i.e., are not resolved. But in fact what is set by diffraction theory is the width of the passband, not a fixed upper limit. No laws of physics are broken when a spatial frequency band beyond the cut-off spatial frequency is swapped for one inside it: this has long been implemented in dark-field microscopy. Nor are information-theoretical rules broken when superimposing several bands, disentangling them in the received image needs assumptions of object invariance during multiple exposures, i.e., the substitution of one kind of uncertainty for another. Information: When the term super-resolution is used in techniques based on the inference of object details using a statistical treatment of the image within standard resolution limits (for example, averaging multiple exposures), it involves an exchange of one kind of information (extracting signal from noise) for another (the assumption that the target has remained invariant). Recent breakthroughs incorporate quantum-transformer hybrids into super-resolution, such as QUIET‑SR, a 2025 model that employs shifted quantum window attention within a transformer to enhance image detail while respecting diffraction and information-theory limits Similarly, frequency-integrated transformers (e.g., FIT) enrich super-resolution by explicitly combining spatial and frequency-domain information via FFT-based attention, improving reconstruction across scales Resolution and localization: True resolution involves the distinction of whether a target, e.g. a star or a spectral line, is single or double, ordinarily requiring separable peaks in the image. When a target is known to be single, its location can be determined with higher precision than the image width by finding the centroid (center of gravity) of its image light distribution. The word ultra-resolution had been proposed for this process but it did not catch on, and the high-precision localization procedure is typically referred to as super-resolution. == Techniques == === Optical or diffractive super-resolution === Substituting spatial-frequency bands: Though the bandwidth allowable by diffraction is fixed, it can be positioned anywhere in the spatial-frequency spectrum. Dark-field illumination in microscopy is an example. See also aperture synthesis. ==== Multiplexing spatial-frequency bands ==== An image is formed using the normal passband of the optical device. Then, some known light structure (for example, a set of light fringes) is superimposed on the target. The image now contains components resulting from the combination of the target and the superimposed light structure, e.g. moiré fringes, and carries information about target detail which simple unstructured illumination does not. The “superresolved” components, however, need disentangling to be revealed. For an example, see structured illumination (figure to left). ==== Multiple parameter use within traditional diffraction limit ==== If a target has no special polarization or wavelength properties, two polarization states or non-overlapping wavelength regions can be used to encode target details, one in a spatial-frequency band inside the cut-off limit the other beyond it. Both would use normal passband transmission but are then separately decoded to reconstitute target structure with extended resolution. ==== Probing near-field electromagnetic disturbance ==== Super-resolution microscopy is generally discussed within the realm of conventional optical imagery. However, modern technology allows the probing of electromagnetic disturbance within molecular distances of the source, which has superior resolution properties. See also evanescent waves and the development of the new super lens. === Geometrical or image-processing super-resolution === ==== Multi-exposure image noise reduction ==== When an image is degraded by noise, the resolution may be improved by averaging multiple exposures. See example on the right. ==== Single-frame deblurring ==== Known defects in a given imaging situation, such as defocus or aberrations, can sometimes be mitigated in whole or in part by suitable spatial-frequency filtering of even a single image. Such procedures all stay within the diffraction-mandated passband, and do not extend it. ==== Sub-pixel image localization ==== The location of a single source can be determined by computing the "center of gravity" (centroid) of the light distribution extending over several adjacent pixels (see figure on the left). Provided that there is enough light, this can be achieved with arbitrary precision, very much better than pixel width of the detecting apparatus and the resolution limit for the decision of whether the source is single or double. This technique, which requires the presupposition that all the light comes from a single source, is at the basis of what has become known as super-resolution microscopy, e.g. stochastic optical reconstruction microscopy (STORM), where fluorescent probes attached to molecules give nanoscale distance information. It is also the mechanism underlying visual hyperacuity. ==== Bayesian induction beyond traditional diffraction limit ==== Some object features, though beyond the diffraction limit, may be known to be associated with other object features that are within the limits and hence contained in the image. Then conclusions can be drawn, using statistical methods, from the available image data about the presence of the full object. The classical example is Toraldo di Francia's proposition of judging whether an image is that of a single or double star by determining whether its width exceeds the spread from a single star. This can be achieved at separations well below the classical resolution bounds, and requires the prior limitation to the choice "single or double?" The approach can take the form of extrapolating the image in the frequency domain, by assuming that the object is an analytic function, and that we can exactly know the function values in some interval. This method is severely limited by the ever-present noise in digital imaging systems, but it can work for radar, astronomy, microscopy or magnetic resonance imaging. More recently, a fast single image super-resolution algorithm based on a closed-form solution to ℓ 2 − ℓ 2 {\displaystyle \ell _{2}-\ell _{2}} problems has been proposed and demonstrated to accelerate most of the existing Bayesian super-resolution methods significantly. == Aliasing == Geometrical SR reconstruction algorithms are possible if and only if the input low resolution images have been under-sampled and therefore contain aliasing. Because of this aliasing, the high-frequency content of the desired reconstruction image is embedded in the low-frequency content of each of the observed images. Given a sufficient number of observation images, and if the set of observations vary in their phase (i.e. if the images of the scene are shifted by a sub-pixel amount), then the phase information can be used to separate the aliased high-frequency content from the true low-frequency content, and the full-resolution image can be accurate