AI For Business Hkbu

AI For Business Hkbu — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Purged cross-validation

    Purged cross-validation

    Purged cross-validation is a variant of k-fold cross-validation designed to prevent look-ahead bias in time series and other structured data, developed in 2017 by Marcos López de Prado at Guggenheim Partners and Cornell University. It is primarily used in financial machine learning to ensure the independence of training and testing samples when labels depend on future events. It provides an alternative to conventional cross-validation and walk-forward backtesting methods, which often yield overly optimistic performance estimates due to information leakage and overfitting. == Motivation == Standard cross-validation assumes that observations are independently and identically distributed (IID), which often does not hold in time series or financial datasets. If the label of a test sample overlaps in time with the features or labels in the training set, the result may be data leakage and overfitting. Purged cross-validation addresses this issue by removing overlapping observations and, optionally, adding a temporal buffer ("embargo") around the test set to further reduce the risk of leakage. The figure below illustrates standard 5 Fold Cross-Validation == Purging == Purging removes from the training set any observation whose timestamp falls within the time range of formation of a label in the test set. This can be the case for train set observations before and after the test set. Their removal ensures that the algorithm cannot learn during train time information that will be used to assess the performance of the algorithm. See the figure below for an illustration of purging. == Embargoing == Embargoing addresses a more subtle form of leakage: even if an observation does not directly overlap the test set, it may still be affected by test events due to market reaction lag or downstream dependencies. To guard against this, a percentage-based embargo is imposed after each test fold. For example, with a 5% embargo and 1000 observations, the 50 observations following each test fold are excluded from training. Unlike purging, embargoing can only occur after the test set. The figure below illustrates the application of embargo: == Applications == Purged and embargoed cross-validation has been useful in: Backtesting of trading strategies Validation of classifiers on labeled event-driven returns Any machine learning task with overlapping label horizons == Example == To illustrate the effect of purging and embargoing, consider the figures below. Both diagrams show the structure of 5-fold cross-validation over a 20-day period. In each row, blue squares indicate training samples and red squares denote test samples. Each label is defined based on the value of the next two observations, hence creating an overlap. If this overlap is left untreated, test set information leaks into the train set. The second figure applies the Purged CV procedure. Notice how purging removes overlapping observations from the training set and the embargo widens the gap between test and training data. This approach ensures that the evaluation more closely resembles a true out-of-sample test and reduces the risk of backtest overfitting. == Combinatorial Purged Cross-Validation == Walk-forward backtesting analysis, another common cross-validation technique in finance, preserves temporal order but evaluates the model on a single sequence of test sets. This leads to high variance in performance estimation, as results are contingent on a specific historical path. Combinatorial Purged Cross-Validation (CPCV) addresses this limitation by systematically constructing multiple train-test splits, purging overlapping samples, and enforcing an embargo period to prevent information leakage. The result is a distribution of out-of-sample performance estimates, enabling robust statistical inference and more realistic assessment of a model's predictive power. === Methodology === CPCV divides a time-series dataset into N sequential, non-overlapping groups. These groups preserve the temporal order of observations. Then, all combinations of k groups (where k < N) are selected as test sets, with the remaining N − k groups used for training. For each combination, the model is trained and evaluated under strict controls to prevent leakage. To eliminate potential contamination between training and test sets, CPCV introduces two additional mechanisms: Purging: Any training observations whose label horizon overlaps with the test period are excluded. This ensures that future information does not influence model training. Embargoing: After the end of each test period, a fixed number of observations (typically a small percentage) are removed from the training set. This prevents leakage due to delayed market reactions or auto-correlated features. Each data point appears in multiple test sets across different combinations. Because test groups are drawn combinatorially, this process produces multiple backtest "paths," each of which simulates a plausible market scenario. From these paths, practitioners can compute a distribution of performance statistics such as the Sharpe ratio, drawdown, or classification accuracy. === Formal definition === Let N be the number of sequential groups into which the dataset is divided, and let k be the number of groups selected as the test set for each split. Then: The number of unique train-test combinations is given by the binomial coefficient: ( N k ) {\displaystyle {\binom {N}{k}}} Each observation is used in k {\displaystyle k} test sets and contributes to φ [ N , k ] {\displaystyle \varphi [N,k]} unique backtest paths: φ [ N , k ] = k N ( N k ) {\displaystyle \varphi [N,k]={\frac {k}{N}}{\binom {N}{k}}} This yields a distribution of performance metrics rather than a single point estimate, making it possible to apply Monte Carlo-based or probabilistic techniques to assess model robustness. === Illustrative example === Consider the case where N = 6 and k = 2. The number of possible test set combinations is ( 6 2 ) = 15 {\displaystyle {\binom {6}{2}}=15} . Each of the six groups appears in five test splits. Consequently, five distinct backtest paths can be constructed, each incorporating one appearance from every group. ==== Test group assignment matrix ==== This table shows the 15 test combinations. An "x" indicates that the corresponding group is included in the test set for that split. ==== Backtest path assignment ==== Each group contributes to five different backtest paths. The number in each cell indicates the path to which the group's result is assigned for that split. === Advantages === Combinatorial Purged Cross-Validation offers several key benefits over conventional methods: It produces a distribution of performance metrics, enabling more rigorous statistical inference. The method systematically eliminates lookahead bias through purging and embargoing. By simulating multiple historical scenarios, it reduces the dependence on any single market regime or realization. It supports high-confidence comparisons between competing models or strategies. CPCV is commonly used in quantitative strategy research, especially for evaluating predictive models such as classifiers, regressors, and portfolio optimizers. It has been applied to estimate realistic Sharpe ratios, assess the risk of overfitting, and support the use of statistical tools such as the Deflated Sharpe Ratio (DSR). === Limitations === The main limitation of CPCV stems from its high computational cost. However, this cost can be managed by sampling a finite number of splits from the space of all possible combinations.

    Read more →
  • Slopaganda

    Slopaganda

    Slopaganda is a portmanteau of "AI slop" and "propaganda", referring to AI-generated content designed to manipulate beliefs, emotions, and political decision-making at scale. The term is credited to Michał Klincewicz, an assistant professor in the Department of Computational Cognitive Science at Tilburg University, in 2025. == Definition == Slopaganda is distinguished from traditional propaganda by three features: scale, scope, and speed. Generative AI makes it possible to produce large volumes of content quickly and at low cost, allows for highly personalised and targeted messaging to specific sub-audiences, and leverages the hyper-connectivity of social networks to accelerate dissemination beyond what conventional media could achieve. Unlike traditional propaganda, which delivers a uniform message to all recipients, slopaganda can be micro-targeted — tailored to individuals based on estimated prior beliefs to reinforce political biases or emotional associations. The authors note that it need not aim at literal deception: much slopaganda is expressive rather than truth-apt, designed to create emotional associations rather than false factual beliefs. == Relation to AI slop == Slopaganda is a subset of AI slop — low-quality, mass-produced AI-generated content — distinguished by intent. Where AI slop may be produced indifferently for commercial or engagement-farming purposes, slopaganda is deployed with a deliberate political or ideological goal. == Notable examples == Examples discussed by the term's originators include Donald Trump's prolific use of AI in Truth Social posts and Iranian Lego-themed music videos. AI-generated videos posted by the White House mixing real military footage with clips from films and video games; and deepfake audio imitating political candidates during the 2024 US presidential campaign have also been given the label slopaganda.

    Read more →
  • .ai

    .ai

    .ai is the Internet country code top-level domain (ccTLD) for Anguilla, a British Overseas Territory in the Caribbean. It is administered by the government of Anguilla. It is a popular domain hack with companies and projects related to the artificial intelligence industry (AI). Google's ad targeting treats .ai as a generic top-level domain (gTLD) because "users and website owners frequently see [the domain] as being more generic than country-targeted." In 2021, Google Search analyst Gary Illyes announced that ".ai" had been added to Google’s list of generic country-code top-level domains, meaning that Google would no longer infer Anguilla-specific targeting from the ccTLD. Identity Digital began managing the domain as of January 2025. == Second and third level registrations == Registrations within off.ai, com.ai, net.ai, and org.ai are available worldwide without restriction. From 15 September 2009, second level registrations within .ai are available to everyone worldwide. == Registration == The minimum registration term allowed for .ai domains is 2 through 10 years for registration and renewal, and a 2-year renewal for domain transfer. Identity Digital is the authority in charge of managing this extension. Registrations began on 16 February 1995. The limits on the number of characters used for the domain name are, at a minimum, from 1 to 3, depending on the registrar, and always at most 63 characters. The character set supported for .ai domain names includes A–Z, a–z, 0–9, and hyphen. As of November 2022, .ai domains cannot accommodate IDN characters. There are no requirements for registering a domain, including local and foreign residents. A .ai domain can be suspended or revoked, if the domain is involved in illegal activity such as violating trademarks or copyrights. Usage must not violate the laws of Anguilla. Anguilla uses the UDRP. Filing a UDRP challenge requires using one of the ICANN Approved Dispute Resolution Service Providers. If the domain is with an ICANN accredited registrar, they should work with the arbitrator. Usually this means either doing nothing or transferring a domain. .ai domains are transferable to any desired registrars as the registration of domain is done maintaining EPP. There used to be a whois.ai-based platform of expired domains in which those could be procured and auctioned every ten days through a standard online process. The last auctions of such kind closed there in December 2024; the platform had been scheduled for shutdown on 30 June 2025, but remained online in the months following that date. == Valuation == Domains cost depends on the registrar, with yearly fees ranging from US$140 (the base fee, as established by Anguilla) to $200. As of July 2025, the highest-valued .ai domain is an undisclosed one sold on 8 November 2023, on Escrow.com, for US$1,500,000—months after an initial $300,000 sale to the same buyer. Among the publicly disclosed ones, the most valued, fin.ai, was sold for $1,000,000 in March 2025. On 16 December 2017, the .ai registry started supporting the Extensible Provisioning Protocol (EPP) and migrated all of its domains onto an EPP system. Consequently, many registrars are allowed to sell .ai domains. Since that date, the .ai ccTLD has also been popular with artificial intelligence companies and organisations. Though such trends are primarily seen among new AI based companies or startups, many established AI and Tech companies preferred not to opt for .ai domains. For example, DeepMind has its domain retained at .com; Meta has redirected its facebook.ai domain to ai.meta.com. == Impact on Anguilla's economy == The registration fees earned from the .ai domains go to the treasury of the Government of Anguilla. As per a 2018 New York Times report, the total revenue generated out of selling .ai domains was $2.9 million. In 2023, Anguilla's government made about US$32 million from fees collected for registering .ai domains; that amounted to over 10% of gross domestic product for the territory. "In the years before the real breakthrough of AI, revenue from .ai domains made up less than 1% of our state income, by 2025 it will be around 47%," explained Jose Vanterpool, Minister of Infrastructure and Communications (MICUHITES), in an interview with BBC. The high 90% renewal rate of .ai domains and the 2025 renewal wave of domains registered in 2023 are driving another surge in state revenues, according to Domaintechnik.

    Read more →
  • Coherent extrapolated volition

    Coherent extrapolated volition

    Coherent extrapolated volition (CEV) is a theoretical framework in the field of AI alignment describing an approach by which an artificial superintelligence (ASI) would act on a benevolent supposition of what humans would want if they were more knowledgeable, more rational, had more time to think, and had matured together as a society, as opposed to humanity's current individual or collective preferences. It was proposed by Eliezer Yudkowsky in 2004 as part of his work on friendly AI. == Concept == CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences. In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted. == Debate == Yudkowsky and Nick Bostrom note that CEV has several interesting properties. It is designed to be humane and self-correcting, by capturing the source of human values instead of trying to list them. It avoids the difficulty of laying down an explicit, fixed list of rules. It encapsulates moral growth, preventing flawed current moral beliefs from getting locked in. It limits the influence that a small group of programmers can have on what the ASI would value, thus also reducing the incentives to build ASI first. And it keeps humanity in charge of its destiny. CEV also faces significant theoretical and practical challenges. Bostrom notes that CEV has "a number of free parameters that could be specified in various ways, yielding different versions of the proposal." One such parameter is the extrapolation base (whose extrapolated volition is taken into account). For example, whether it should include people with severe dementia, patients in a vegetative state, foetuses, or embryos. He also notes that if CEV's extrapolation base only includes humans, there is a risk that the result would be ungenerous toward other animals and digital minds. One possible solution would be to include a mechanism to expand CEV's extrapolation base. == Variants and alternatives == A proposed theoretical alternative to CEV is to rely on an artificial superintelligence's superior cognitive capabilities to figure out what is morally right, and let it act accordingly. It is also possible to combine both techniques, for instance with the ASI following CEV except when it is morally impermissible. In another review, a philosophical analysis explores CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens' concept of "active trust", the author proposes an evolution of CEV into "Coherent, Extrapolated and Clustered Volition" (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, thus offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.

    Read more →
  • STIT logic

    STIT logic

    STIT logic (from seeing to it that) is a family of modal and branching-time logics for reasoning about agency and choice. A typical STIT operator has the form [ i s t i t : φ ] {\displaystyle [i\ {\mathsf {stit}}:\varphi ]} , usually read as "agent i {\displaystyle i} sees to it that φ {\displaystyle \varphi } ", and is interpreted in models where agents choose between alternative possible futures. STIT logics are used in action theory, deontic logic, epistemic logic, and the theory of intelligent agents to formalise notions such as "could have done otherwise", responsibility, joint action, and strategic ability in an indeterministic world. == Etymology == The acronym STIT comes from the English phrase "seeing to it that", introduced in influential work by Nuel Belnap and Michael Perloff on the logical analysis of agentive expressions. In this tradition, "to see to it that φ {\displaystyle \varphi } " is treated as a primitive agency operator, rather than being reduced to ordinary modal necessity. == History == Modern STIT logic arose in the 1980s in the context of branching-time semantics and formal theories of agency. Belnap and Perloff's article "Seeing to it that: A canonical form for agentives" introduced the idea of treating expressions of the form "agent i sees to it that φ" as a primitive modal operator, and analysed such sentences using a branching tree of moments and histories. This approach was further developed in a series of papers on indeterminism and agency and provided the conceptual core for later STIT formalisms. In the 1990s the basic formal systems of STIT logic were worked out. Horty and Belnap's influential paper on the deliberative STIT operator distinguished between a "Chellas" STIT that merely records the result of an agent's present choice and a "deliberative" STIT that requires the agent's choice to make a difference, and connected STIT with issues of action, omission, ability and obligation. Around the same time, Ming Xu proved completeness and decidability results for basic STIT systems, including a single-agent logic with Kripke-style semantics and axiomatizations for multi-agent deliberative STIT, thereby establishing STIT as a well-behaved normal modal framework. This early work was systematised in Belnap, Perloff and Xu's monograph Facing the Future: Agents and Choices in Our Indeterminist World, which presents a general branching-time semantics for individual and group STIT operators, discusses independence-of-agents conditions and articulates the metaphysical picture of an indeterministic "tree" of moments. At roughly the same time, Horty's book Agency and Deontic Logic developed deontic STIT logics in which obligations are tied to agents' available choices rather than to static states of affairs, and used the resulting systems to analyse "ought implies can", contrary-to-duty obligations and deontic paradoxes. These works helped to position STIT at the intersection of action theory, temporal logic and deontic logic. From the late 1990s and 2000s onward, STIT logics were combined with epistemic, temporal and strategic modalities. Broersen introduced complete STIT logics for knowledge and action and deontic-epistemic STIT systems that distinguish different modes of mens rea, with applications to responsibility and the specification of multi-agent systems. Work on group and coalitional agency investigated axiomatisations and complexity results for group STIT logics, and related STIT-based analyses of agency to coalition logic and alternating-time temporal logic (ATL) by exhibiting formal embeddings between the frameworks. Explicit temporal operators were added to STIT in so-called temporal STIT logics. Lorini proposed a temporal STIT with "next" and "until" operators along histories and showed how it can be applied to normative reasoning about ongoing behaviour and commitments. Ciuni and Lorini compared different semantics for temporal STIT, clarifying the relationships between branching-time, game-based and epistemic approaches, while Boudou and Lorini gave a semantics for temporal STIT based on concurrent game structures, thus strengthening links with standard models of multi-agent interaction used for ATL and strategy logic. In parallel, complexity-theoretic work by Balbiani, Herzig and Troquard and by Schwarzentruber and co-authors investigated the satisfiability and model-checking problems for various STIT fragments, showing for instance that many expressive group STIT logics are undecidable or of high computational complexity. In the 2010s, STIT ideas were combined with justification logic, imagination operators and refined deontic notions. Justification STIT logics, developed by Olkhovikov and others, merge explicit justifications with STIT-style agency so that producing a proof can itself be treated as an action that brings about knowledge, and they come with completeness and decidability results. Olkhovikov and Wansing introduced STIT imagination logics, together with axiomatic systems and tableau calculi, to model acts of voluntary imagining and their role in doxastic control. Other authors have proposed STIT-based logics of responsibility, blameworthiness and intentionality for use in philosophical and AI settings. Xu's survey article "Combinations of STIT with Ought and Know" (2015) reviews many of these developments and emphasises the interplay between deontic and epistemic STIT logics. Current research on STIT focuses on proof theory, automated reasoning and richer expressive resources. Lyon and van Berkel, building on earlier work on labelled calculi for STIT, have developed cut-free sequent systems and proof-search algorithms that yield syntactic decision procedures for a range of deontic and non-deontic multi-agent STIT logics and support applications such as duty checking and compliance checking in autonomous systems. Sawasaki has proposed first-order cstit-based STIT logics that can distinguish de re and de dicto readings of agency statements and has proved strong completeness results for Hilbert systems over finite models, moving the STIT programme beyond the purely propositional level. Further work investigates interpreted-system and computationally grounded semantics for STIT and its extensions in order to model the behaviour of autonomous agents in multi-agent settings, and proposes STIT-based semantics for epistemic notions based on patterns of information disclosure in interactive systems. == Branching-time semantics == STIT logics are usually interpreted over branching-time models. A standard STIT frame consists of: a non-empty set of moments T {\displaystyle T} , partially ordered by < {\displaystyle <} so that ( T , < ) {\displaystyle (T,<)} forms a tree (every pair of moments with a common predecessor has a greatest lower bound); a set of histories, each history being a maximal linearly ordered subset of T {\displaystyle T} ; a non-empty set of agents A g {\displaystyle Ag} ; for each agent i ∈ A g {\displaystyle i\in Ag} and moment m {\displaystyle m} , a choice function c h o i c e i m {\displaystyle {\mathsf {choice}}_{i}^{m}} that partitions the set of histories passing through m {\displaystyle m} into choice cells. The idea is that a moment represents a time at which choices are made, and histories represent complete possible future courses of events. At each moment, each agent's choice corresponds to selecting one of the available cells of histories determined by their choice function. Formulas are evaluated at pairs ( m , h ) {\displaystyle (m,h)} of a moment and a history through that moment (sometimes written m / h {\displaystyle m/h} ). A valuation assigns truth-values to atomic propositions at such indices; Boolean connectives are interpreted pointwise as in Kripke-style modal logic. == Chellas and deliberative STIT operators == Several STIT operators have been distinguished in the literature. A common approach uses two closely related operators, often called Chellas STIT and deliberative STIT. Let H m {\displaystyle H_{m}} be the set of histories passing through a moment m {\displaystyle m} , and write H m {\displaystyle H_{m}} ⟦ φ ⟧ m = { h ∈ H m ∣ M , m / h ⊨ φ } {\displaystyle {\text{⟦}}\varphi {\text{⟧}}_{m}=\{h\in H_{m}\mid M,m/h\models \varphi \}} for the set of histories at m {\displaystyle m} where φ {\displaystyle \varphi } holds. The Chellas STIT operator, often written [ i c s t i t : φ ] {\displaystyle [i\ {\mathsf {cstit}}:\varphi ]} , is given by M , m / h ⊨ [ i c s t i t : φ ] iff c h o i c e i m ( h ) ⊆ ⟦ φ ⟧ m . {\displaystyle M,m/h\models [i\ {\mathsf {cstit}}:\varphi ]\quad {\text{iff}}\quad {\mathsf {choice}}_{i}^{m}(h)\subseteq {\text{⟦}}\varphi {\text{⟧}}_{m}.} Intuitively, agent i {\displaystyle i} sees to it that φ {\displaystyle \varphi } if φ {\displaystyle \varphi } holds at all histories compatible with their present choice. The deliberative STIT operator, [ i d s t i t : φ ] {\displaystyle [i\ {\mathsf {dstit}}:\varphi ]} , adds

    Read more →
  • Domain adaptation

    Domain adaptation

    Domain adaptation is a field associated with machine learning and transfer learning. It addresses the challenge of training a model on one data distribution (the source domain) and applying it to a related but different data distribution (the target domain). A common example is spam filtering, where a model trained on emails from one user (source domain) is adapted to handle emails for another user with significantly different patterns (target domain). Domain adaptation techniques can also leverage unrelated data sources to improve learning. When multiple source distributions are involved, the problem extends to multi-source domain adaptation. Domain adaptation is a specific type of transfer learning. According to the taxonomy laid out by Pan and Yang (2010), it falls into the category of transductive transfer learning. In this setting, the source and target tasks are the same (e.g., both are object recognition), but the domains differ (different marginal distributions). This distinguishes it from inductive transfer learning (where labeled data is available for the target task) and unsupervised transfer learning (where labels are unavailable in both domains). == Classification of domain adaptation problems == Domain adaptation setups are classified in two different ways: according to the distribution shift between the domains, and according to the available data from the target domain. === Distribution shifts === Common distribution shifts are classified as follows: Covariate Shift occurs when the input distributions of the source and destination change, but the relationship between inputs and labels remains unchanged. The above-mentioned spam filtering example typically falls in this category. Namely, the distributions (patterns) of emails may differ between the domains, but emails labeled as spam in the one domain should similarly be labeled in another. Prior Shift (Label Shift) occurs when the label distribution differs between the source and target datasets, while the conditional distribution of features given labels remains the same. An example is a classifier of hair color in images from Italy (source domain) and Norway (target domain). The proportions of hair colors (labels) differ, but images within classes like blond and black-haired populations remain consistent across domains. A classifier for the Norway population can exploit this prior knowledge of class proportions to improve its estimates. Concept Shift (Conditional Shift) refers to changes in the relationship between features and labels, even if the input distribution remains the same. For instance, in medical diagnosis, the same symptoms (inputs) may indicate entirely different diseases (labels) in different populations (domains). === Data available during training === Domain adaptation problems typically assume that some data from the target domain is available during training. Problems can be classified according to the type of this available data: Unsupervised: Unlabeled data from the target domain is available, but no labeled data. In the above-mentioned example of spam filtering, this corresponds to the case where emails from the target domain (user) are available, but they are not labeled as spam. Domain adaptation methods can benefit from such unlabeled data, by comparing its distribution (patterns) with the labeled source domain data. Semi-supervised: Most data that is available from the target domain is unlabelled, but some labeled data is also available. In the above-mentioned case of spam filter design, this corresponds to the case that the target user has labeled some emails as being spam or not. Supervised: All data that is available from the target domain is labeled. In this case, domain adaptation reduces to refinement of the source domain predictor. In the above-mentioned example classification of hair-color from images, this could correspond to the refinement of a network already trained on a large dataset of labeled images from Italy, using newly available labeled images from Norway. == Formalization == Let X {\displaystyle X} be the input space (or description space) and let Y {\displaystyle Y} be the output space (or label space). The objective of a machine learning algorithm is to learn a mathematical model (a hypothesis) h : X → Y {\displaystyle h:X\to Y} able to attach a label from Y {\displaystyle Y} to an example from X {\displaystyle X} . This model is learned from a learning sample S = { ( x i , y i ) ∈ ( X × Y ) } i = 1 m {\displaystyle S=\{(x_{i},y_{i})\in (X\times Y)\}_{i=1}^{m}} . Usually in supervised learning (without domain adaptation), we suppose that the examples ( x i , y i ) ∈ S {\displaystyle (x_{i},y_{i})\in S} are drawn i.i.d. from a distribution D S {\displaystyle D_{S}} of support X × Y {\displaystyle X\times Y} (unknown and fixed). The objective is then to learn h {\displaystyle h} (from S {\displaystyle S} ) such that it commits the least error possible for labelling new examples coming from the distribution D S {\displaystyle D_{S}} . The main difference between supervised learning and domain adaptation is that in the latter situation we study two different (but related) distributions D S {\displaystyle D_{S}} and D T {\displaystyle D_{T}} on X × Y {\displaystyle X\times Y} . The domain adaptation task then consists of the transfer of knowledge from the source domain D S {\displaystyle D_{S}} to the target one D T {\displaystyle D_{T}} . The goal is then to learn h {\displaystyle h} (from labeled or unlabelled samples coming from the two domains) such that it commits as little error as possible on the target domain D T {\displaystyle D_{T}} . The major issue is the following: if a model is learned from a source domain, what is its capacity to correctly label data coming from the target domain? == Four algorithmic principles == === Reweighting algorithms === The objective is to reweight the source labeled sample such that it "looks like" the target sample (in terms of the error measure considered). === Iterative algorithms === A method for adapting consists in iteratively "auto-labeling" the target examples. The principle is simple: a model h {\displaystyle h} is learned from the labeled examples; h {\displaystyle h} automatically labels some target examples; a new model is learned from the new labeled examples. Note that there exist other iterative approaches, but they usually need target labeled examples. === Search of a common representation space === The goal is to find or construct a common representation space for the two domains. The objective is to obtain a space in which the domains are close to each other while keeping good performances on the source labeling task. This can be achieved through the use of Adversarial machine learning techniques where feature representations from samples in different domains are encouraged to be indistinguishable. === Hierarchical Bayesian Model === The goal is to construct a Bayesian hierarchical model p ( n ) {\displaystyle p(n)} , which is essentially a factorization model for counts n {\displaystyle n} , to derive domain-dependent latent representations allowing both domain-specific and globally shared latent factors. == Software packages == Several compilations of domain adaptation and transfer learning algorithms have been implemented over the past decades: SKADA (Python) ADAPT (Python) TLlib (Python) Domain-Adaptation-Toolbox (MATLAB)

    Read more →
  • Histogram of oriented displacements

    Histogram of oriented displacements

    Histogram of oriented displacements (HOD) is a 2D trajectory descriptor. The trajectory is described using a histogram of the directions between each two consecutive points. Given a trajectory T = {P1, P2, P3, ..., Pn}, where Pt is the 2D position at time t. For each pair of positions Pt and Pt+1, calculate the direction angle θ(t, t+1). Value of θ is between 0 and 360. A histogram of the quantized values of θ is created. If the histogram is of 8 bins, the first bin represents all θs between 0 and 45. The histogram accumulates the lengths of the consecutive moves. For each θ, a specific histogram bin is determined. The length of the line between Pt and Pt+1 is then added to the specific histogram bin. To show the intuition behind the descriptor, consider the action of waving hands. At the end of the action, the hand falls down. When describing this down movement, the descriptor does not care about the position from which the hand started to fall. This fall will affect the histogram with the appropriate angles and lengths, regardless of the position where the hand started to fall. HOD records for each moving point: how much it moves in each range of directions. HOD has a clear physical interpretation. It proposes that, a simple way to describe the motion of an object, is to indicate how much distance it moves in each direction. If the movement in all directions are saved accurately, the movement can be repeated from the initial position to the final destination regardless of the displacements order. However, the temporal information will be lost, as the order of movements is not stored-this is what we solve by applying the temporal pyramid, as shown in section \ref{sec:temp-pyramid}. If the angles quantization range is small, classifiers that use the descriptor will overfit. Generalization needs some slack in directions-which can be done by increasing the quantization range.

    Read more →
  • Weak artificial intelligence

    Weak artificial intelligence

    Weak artificial intelligence (weak AI) is artificial intelligence that implements a limited part of the mind, or, as narrow AI, artificial narrow intelligence (ANI), is focused on one narrow task. Weak AI is contrasted with strong AI, which can be interpreted in various ways: Artificial general intelligence (AGI): a machine with the ability to apply intelligence to any problem, rather than just one specific problem. Artificial superintelligence (ASI): a machine with a vastly superior intelligence to the average human being. Artificial consciousness: a machine that has consciousness, sentience and mind (John Searle uses "strong AI" in this sense). Narrow AI can be classified as being "limited to a single, narrowly defined task. Most modern AI systems would be classified in this category." Artificial general intelligence is conversely the opposite. == Applications and risks == Some examples of narrow AI are AlphaGo, self-driving cars, robot systems used in the medical field, and diagnostic doctors. Narrow AI systems are sometimes dangerous if unreliable. And the behavior that it follows can become inconsistent. It could be difficult for the AI to grasp complex patterns and get to a solution that works reliably in various environments. This "brittleness" can cause it to fail in unpredictable ways. Narrow AI failures can sometimes have significant consequences. It could for example cause disruptions in the electric grid, damage nuclear power plants, cause global economic problems, and misdirect autonomous vehicles. Medicines could be incorrectly sorted and distributed. Also, medical diagnoses can ultimately have serious and sometimes deadly consequences if the AI is faulty or biased. Simple AI programs have already worked their way into society, oftentimes unnoticed by the public. Autocorrection for typing, speech recognition for speech-to-text programs, and vast expansions in the data science fields are examples. Narrow AI has also been the subject of some controversy, including resulting in unfair prison sentences, discrimination against women in the workplace for hiring, resulting in death via autonomous driving, among other cases. Despite being "narrow" AI, recommender systems are efficient at predicting user reactions based on their posts, patterns, or trends. For instance, TikTok's "For You" algorithm can determine a user's interests or preferences in less than an hour. Some other social media AI systems are used to detect bots that may be involved in propaganda or other potentially malicious activities. == Weak AI versus strong AI == John Searle contests the possibility of strong AI (by which he means conscious AI). He further believes that the Turing test (created by Alan Turing and originally called the "imitation game", used to assess whether a machine can converse indistinguishably from a human) is not accurate or appropriate for testing whether an AI is "strong". Scholars such as Antonio Lieto have argued that the current research on both AI and cognitive modelling are perfectly aligned with the weak-AI hypothesis (that should not be confused with the "general" vs "narrow" AI distinction) and that the popular assumption that cognitively inspired AI systems espouse the strong AI hypothesis is ill-posed and problematic since "artificial models of brain and mind can be used to understand mental phenomena without pretending that that they are the real phenomena that they are modelling" (as, on the other hand, implied by the strong AI assumption).

    Read more →
  • Teleradiology

    Teleradiology

    Teleradiology is the transmission of radiological patient images from procedures such as x-rays, Computed tomography (CT), and MRI imaging, from one location to another for the purposes of sharing studies with other radiologists and physicians. Teleradiology allows radiologists to provide services without actually having to be at the location of the patient. This is particularly important when a sub-specialist such as an MRI radiologist, neuroradiologist, pediatric radiologist, or musculoskeletal radiologist is needed, since these professionals are generally only located in large metropolitan areas working during daytime hours. Teleradiology allows for specialists to be available at all times. Teleradiology utilizes standard network technologies such as the Internet, telephone lines, wide area networks, local area networks (LAN) and the latest advanced technologies such as medical cloud computing. Specialized software is used to transmit the images and enable the radiologist to effectively analyze potentially hundreds of images of a given study. Technologies such as advanced graphics processing, voice recognition, artificial intelligence, and image compression are often used in teleradiology. Through teleradiology and mobile DICOM viewers, images can be sent to another part of the hospital or to other locations around the world with equal effort. Teleradiology is a growth technology given that imaging procedures are growing approximately 15% annually against an increase of only 2% in the radiologist population. == Reports == Teleradiology services commonly provide either preliminary or final interpretations of medical imaging studies. Preliminary reads are frequently used in emergency settings to support immediate clinical decisions and may include direct communication of critical findings to the referring physician. Some providers report turnaround times of approximately 30 minutes for emergency cases, with faster processing for time-sensitive conditions such as stroke. Final reads are definitive and used in official patient records and billing. These reports typically include all relevant findings and may require access to prior imaging and clinical data. Teleradiology is also employed to provide off-hour or overflow coverage for healthcare institutions lacking continuous on-site radiology staffing. == Subspecialties == Some teleradiologists are fellowship trained and have a wide variety of subspecialty expertise including such difficult-to-find areas as neuroradiology, pediatric neuroradiology, thoracic imaging, musculoskeletal radiology, mammography, and nuclear cardiology. There are also various medical practitioners who are not radiologists that take on studies in radiology to become sub specialists in their respected fields, an example of this is dentistry where oral and maxillofacial radiology allows those in dentistry to specialize in the acquisition and interpretation of radiographic imaging studies performed for diagnosis of treatment guidance for conditions affecting the maxillofacial region. == Teleultrasound == Teleradiology infrastructure has also been adapted to support point-of-care ultrasound (POCUS) in remote and austere environments. In teleultrasound—also known as telementored ultrasound—a remote expert guides a non-specialist in real time during image acquisition. This technique has been successfully demonstrated in extreme settings, including aboard the International Space Station, on Mount Everest, and during helicopter flight. == Regulations == In the United States, Medicare and Medicaid laws require the teleradiologist to be on U.S. soil in order to qualify for reimbursement of the Final Read. In addition, advanced teleradiology systems must also be HIPAA compliant, which helps to ensure patients' privacy. HIPAA (Health Insurance Portability and Accountability Act of 1996) is a uniform, federal floor of privacy protections for consumers. It limits the ways that entities can use patients' personal information and protects the privacy of all medical information no matter what form it is in. Quality teleradiology must abide by important HIPAA rules to ensure patients' privacy is protected. Also State laws governing the licensing requirements and medical malpractice insurance coverage required for physicians vary from state to state. Ensuring compliance with these laws is a significant overhead expense for larger multi-state teleradiology groups. Medicare (Australia) has identical requirements to that of the United States, where the guidelines are provided by the Department of Health and Ageing, and government based payments fall under the Health Insurance Act. The regulations in Australia are also conducted at both federal and state levels, ensuring that strict guidelines are adhered to at all times, with regular yearly updates and amendments are introduced (usually around March and November of every year), ensuring that the legislation is kept up to date with changes in the industry. One of the most recent changes to Medicare and radiology / teleradiology in Australia was the introduction of the Diagnostic Imaging Accreditation Scheme (DIAS) on 1 July 2008. DIAS was introduced to further improve the quality of Diagnostic Imaging and to amend the Health Insurance Act. == Industry growth == Until the late 1990s teleradiology was primarily used by individual radiologists to interpret occasional emergency studies from offsite locations, often in the radiologists home. The connections were made through standard analog phone lines. Teleradiology expanded rapidly as the growth of the internet and broad band combined with new CT scanner technology to become an essential tool in trauma cases in emergency rooms throughout the country. The occasional 2–3 x-ray studies a week soon became 3–10 CT scans, or more, a night. Because ER physicians are not trained to read CT scans or MRIs, radiologists went from working 8–10 hours a day, five and half days a week to a schedule of 24 hours a day, 7 days a week coverage. This became a particularly acute challenge in smaller rural facilities that only had one solo radiologist with no other to share call. These circumstances spawned a post-dot.com boom of firms and groups that provided medical outsourcing, off-site teleradiology on-call services to hospitals and Radiology Groups around the country. As an example, a teleradiology firm might cover trauma at a hospital in Indiana with doctors based in Texas. Some firms even used overseas doctors in locations like Australia and India. Nighthawk, founded by Paul Berger, was the first to station U.S. licensed radiologists overseas (initially Australia and later Switzerland) to maximize the time zone difference to provide nightcall in U.S. hospitals. Currently, teleradiology firms are facing pricing pressures. Industry consolidation is likely as there are more than 500 of these firms, large and small, throughout the United States.

    Read more →
  • Brain technology

    Brain technology

    Brain technology, or self-learning know-how systems, defines a technology that employs latest findings in neuroscience. [see also neuro implants] The term was first introduced by the Artificial Intelligence Laboratory in Zurich, Switzerland, in the context of the Roboy project. Brain Technology can be employed in robots, know-how management systems and any other application with self-learning capabilities. In particular, Brain Technology applications allow the visualization of the underlying learning architecture often coined as "know-how maps". == Research and applications == The first demonstrations of BC in humans and animals took place in the 1960s when Grey Walter demonstrated use of non-invasively recorded encephalogram (EEG) signals from a human subject to control a slide projector (Graimann et al., 2010). Soon after Jacques J. Vidal coined the term brain–computer interface (BCI) in 1971, the Defense Advanced Research Projects Agency (DARPA) first starting funding brain–computer interface research and has since funded several brain–computer interface projects. That market is expected to reach a value of $1.72 billion by 2022. Brain–computer interfaces record brain activity, transmit the information out of the body, signal-process the data via algorithms, and convert them into command control signals. In 2012, a landmark study in Nature, led by pioneer Leigh Hochberg, MD, PhD, demonstrated that two people with tetraplegia were able to control robotic arms through thought when connected to the BrainGate neural interface system. The two participants were able to reach for and grasp objects in three-dimensional space, and one participant used the system to serve herself coffee for the first time since becoming paralyzed nearly 15 years prior. And in October 2020, two patients were able to wirelessly control an operating system to text, email, shop and bank using direct thought through the Stentrode brain computer interface (Journal of NeuroInterventional Surgery) in a study led by Thomas Oxley. This was the first time a brain–computer interface was implanted via the patient's blood vessels, eliminating the need for open brain surgery. Currently a number of groups are exploring a range of experimental devices using brain–computer interfaces, which have the potential to fundamentally change the way of life for patients with paralysis and a wide range of neurological disorders. These include: as Elon Musk, Facebook, and the University of California in San Francisco. The systems. This technology is also being explored as a neuromodulation device and may ultimately help diagnose and treat a range of brain pathologies, such as epilepsy and Parkinson's disease.

    Read more →
  • JAX (software)

    JAX (software)

    JAX is a Python library for accelerator-oriented array computation and program transformation, designed for high-performance numerical computing and large-scale machine learning. It is developed by Google with contributions from Nvidia and other community contributors. It is described as bringing together a modified version of the automatic differentiation system autograd and OpenXLA's XLA (Accelerated Linear Algebra). It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary features of JAX are: Providing a unified NumPy-like interface to computations that run on CPU, GPU, or TPU, in local or distributed settings. Built-in Just-In-Time (JIT) compilation via OpenXLA, an open-source machine learning compiler ecosystem. Efficient evaluation of gradients via its automatic differentiation transformations. Automatic vectorization to efficiently map functions over arrays representing batches of inputs. == Libraries using Jax == Flax Equinox Optax

    Read more →
  • GOLOG

    GOLOG

    GOLOG is a high-level logic programming language for the specification and execution of complex actions in dynamical domains. It is based on the situation calculus. It is a first-order logical language for reasoning about action and change. GOLOG was developed at the University of Toronto. == History == The concept of situation calculus on which the GOLOG programming language is based was first proposed by John McCarthy in 1963. == Description == A GOLOG interpreter automatically maintains a direct characterization of the dynamic world being modeled, on the basis of user supplied axioms about preconditions, effects of actions and the initial state of the world. This allows the application to reason about the condition of the world and consider the impacts of different potential actions before focusing on a specific action. Golog is a logic programming language and is very different from conventional programming languages. A procedural programming language like C defines the execution of statements in advance. The programmer creates a subroutine which consists of statements, and the computer executes each statement in a linear order. In contrast, fifth-generation programming languages like Golog work with an abstract model with which the interpreter can generate the sequence of actions. The source code defines the problem and it is up to the solver to find the next action. This approach can facilitate the management of complex problems from the domain of robotics. A Golog program defines the state space in which the agent is allowed to operate. A path in the symbolic domain is found with state space search. To speed up the process, Golog programs are realized as hierarchical task networks. Apart from the original Golog language, there are some extensions available. The ConGolog language provides concurrency and interrupts. Other dialects like IndiGolog and Readylog were created for real time applications in which sensor readings are updated on the fly. == Uses == Golog has been used to model the behavior of autonomous agents. In addition to a logic-based action formalism for describing the environment and the effects of basic actions, they enable the construction of complex actions using typical programming language constructs. It is also used for applications in high level control of robots and industrial processes, virtual agents, discrete event simulation etc. It can be also used to develop Belief Desire Intention-style agent systems. == Planning and scripting == In contrast to the Planning Domain Definition Language, Golog supports planning and scripting as well. Planning means that a goal state in the world model is defined, and the solver brings a logical system into this state. Behavior scripting implements reactive procedures, which are running as a computer program. For example, suppose the idea is to authoring a story. The user defines what should be true at the end of the plot. A solver gets started and applies possible actions to the current situation until the goal state is reached. The specification of a goal state and the possible actions are realized in the logical world model. In contrast, a hardwired reactive behavior doesn't need a solver but the action sequence is provided in a scripting language. The Golog interpreter, which is written in Prolog, executes the script and this will bring the story into the goal state.

    Read more →
  • Google Tasks

    Google Tasks

    Google Tasks is a task management application developed by Google and included with Google Workspace. Included initially as a feature in Gmail and Google Calendar, Google Tasks launched as a core product with a standalone app in 2018. It is available for Android and iOS, as well as in the right-hand side panel on Google Workspace apps on the web and in Google Calendar. == History and development == Google Tasks began as an integration within other apps in G Suite (now Google Workspace), allowing to-do items to be created in Calendar and Gmail. Upon graduating to a core service on June 28, 2018, Google Tasks launched as a dedicated mobile app in which tasks can be sorted into lists, managed, and completed. Google Tasks launched the ability to create tasks from Google Chat messages in 2022.

    Read more →
  • AlphaChip (controversy)

    AlphaChip (controversy)

    The AlphaChip controversy refers to a series of public, scholarly, and legal disputes surrounding a 2021 Nature paper by Google-affiliated researchers. The paper describes an approach to macro placement, a stage of chip floorplanning, based on reinforcement learning (RL), a machine learning method in which a system iteratively improves its decisions by optimizing performance-based reward signals. The primary technical question is whether the new techniques are better than existing (non-AI) techniques. Both internal Google studies and external attempts to replicate the algorithm have failed to show the claimed benefits. No head-to-head comparison is available because the data used in the paper is proprietary, and Google has not released any results from running its algorithm on public benchmarks. This has resulted in considerable skepticism over the paper's claims. In addition, the inability of others (both inside and outside of Google) to replicate the claimed results have sparked concerns about the paper’s methodology, reproducibility, and scientific integrity. The lead researchers of the Nature paper were affiliated with Google Brain, which became part of Google DeepMind, and later spun off into the company Ricursive. == Motivation for research: Macro placement in chip layout == Chip design for modern integrated circuits is a complex, expert-driven process that relies on electronic design automation. It determines the performance of the final chip, and takes weeks or months to complete. Advances that produce better designs, or complete the process faster, are commercially and academically significant. Macro placement is a step during chip design that determines the locations of large circuit components (macros) within a chip. It is followed by detailed placement, which places the far more numerous but much smaller standard cells. Alternatively, mixed-size placement simultaneously places both large macros and millions of small cells, requiring algorithms to handle objects that differ by several orders of magnitude in area and mobility. The number of macros per circuit typically ranges from several to thousands. Wiring must be performed after placement, and the details of this wiring strongly influence the power, performance, and area (PPA) of the completed chip. The full wiring calculation is very resource intensive, so placement tools typically use a proxy cost, a simplified objective function used to guide the placement algorithm during training and evaluation. The faithfulness of the chosen proxy cost to the final objective cost is a critical aspect of placer performance. === State of the art as of 2021 === Chips have been designed since the 1960s, so there were many existing methods as of 2021. Available options included manual design, academic tools, and commercial offerings. Academic methods include combinatorial optimization techniques such as simulated annealing, analytical placement, hierarchical heuristics, and as of 2019 reinforcement learning and broader machine learning techniques.. Existing (non-AI) academic tools for solving the same problem include APlace, NTUplace3, ePlace, RePlace, and DREAMPlace. Commercial EDA vendors also offered automated software tools for floorplanning and mixed-size placement. For instance, as of 2019 Cadence’s Innovus implementation software offered a Concurrent Macro Placer (CMP) feature to automatically place large blocks and standard cells. == The 2021 Nature paper and its claims == In 2021, Nature published a paper under the title “A graph‑placement methodology for fast chip design” co‑authored by 21 Google-affiliated researchers. The paper reported that an RL agent could generate macro placements for integrated circuits "in under six hours" and achieve improvements over human-designed layouts in power, timing performance, and area (PPA), standard chip-quality metrics referring respectively to energy consumption, chip operating speed, and silicon footprint (evaluated after wire routing). It introduced a sequential macro placement algorithm in which macros are placed one at a time instead of optimizing their locations concurrently. At each step, the algorithm selects a location for a single macro on a discretized chip canvas, conditioning its decision on the placements of previously placed macros. This sequential formulation converts macro placement into a long-horizon decision process in which early placement choices constrain later ones. After macro placement, force-directed placement is applied to place standard cells connected to the macros. Deep reinforcement learning is used to train a policy network to place macros by maximizing a reward that reflects final placement quality (for example, wirelength and congestion). Policy learning occurs during self‑play for one or multiple circuit designs. Further placement optimizations refine the overall layout by balancing wirelength, density, and overlap constraints, while treating the macro locations produced by the RL policy as fixed obstacles. The approach relies on pre-training, in which the RL model is first trained on a corpus of prior designs (twenty in the Nature paper) to learn general placement patterns before being fine-tuned on a specific chip. Circuit examples used in the study were parts of proprietary Google TPU designs, called blocks (or floorplan partitions). The paper reported results on five blocks and described the approach as generalizable across chip designs. == Controversy == Soon after the paper's publication, controversy arose over whether the claims were true, whether they were sufficiently proven, and whether academic standards were followed. These controversies arose both within Google and among external academic experts. === Internal dispute at Google and legal proceedings === In 2022, Satrajit Chatterjee, a Google engineer involved in reviewing the AlphaChip work, raised concerns internally and drafted an alternative analysis, (Stronger Baselines) arguing that established methods outperformed the RL approach under fair comparison. In March 2022, Google declined to publish this analysis and terminated Chatterjee's employment. Chatterjee filed a wrongful dismissal lawsuit, alleging that representations related to the AlphaChip research involved fraud and scientific misconduct. According to court documents, Chatterjee's study was conducted "in the context of a large potential Google Cloud deal". He noted that it "would have been unethical to imply that we had revolutionary technology when our tests showed otherwise" and claimed Google was deliberately withholding material information. Furthermore, the committee that reviewed his paper and disapproved its publication was allegedly chaired by subordinates of Jeff Dean, a senior co-author of the Nature paper. Google’s subsequent motion to dismiss was denied, holding that Chatterjee had plausibly alleged retaliation for refusing to engage in conduct he believed would violate state or federal law. === External controversy === The external questions can be summarized in four main points: (a) Are the claims supported by the evidence provided? (b) Did the paper provide enough information to allow the results to be independently reproduced and verified? If so, are the results an improvement over existing academic and commercial tools? (c) Were the comparisons in the paper done fairly and with full disclosure? (d) Were academic standards followed? Each of these is discussed below. ==== Are the claims supported by the evidence provided? ==== The Nature paper described the reduction in design-process time as going from "days or weeks" to "hours", but did not provide per-design time breakdowns or specify the number of engineers, their level of expertise, or the baseline tools and workflow against which this comparison was made. It was also unclear whether the "days or weeks" baseline included time spent on other tasks such as functional design changes. The paper also evaluated the method on fewer benchmarks (five) than is common in the field, and showed mixed results across different evaluation goals While the approach was described as improving circuit area, this claim seems unsupported, as the RL optimization did not alter the overall circuit area, as it adjusted only the locations of fixed-shape non-overlapping circuit components within a fixed rectangular layout boundary. ==== Comparison with existing methods, and replicating the algorithm ==== Because macro placement is largely geometric and its fundamental algorithms are not tied to a specific process node, competing approaches can be evaluated on public benchmarks (tests) across technologies, rather than primarily on proprietary internal designs. This is standard procedure when comparing academic placers, see . In contrast, Google has only reported results only on internal proprietary designs, and as of 2026 has not offered comparisons with prior methods on common benchmarks. Researchers at the University of Califor

    Read more →
  • Similarity learning

    Similarity learning

    Similarity learning is an area of supervised machine learning in artificial intelligence. It is closely related to regression and classification, but the goal is to learn a similarity function that measures how similar or related two objects are. It has applications in ranking, in recommendation systems, visual identity tracking, face verification, and speaker verification. == Learning setup == There are four common setups for similarity and metric distance learning. Regression similarity learning In this setup, pairs of objects are given ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} together with a measure of their similarity y i ∈ R {\displaystyle y_{i}\in R} . The goal is to learn a function that approximates f ( x i 1 , x i 2 ) ∼ y i {\displaystyle f(x_{i}^{1},x_{i}^{2})\sim y_{i}} for every new labeled triplet example ( x i 1 , x i 2 , y i ) {\displaystyle (x_{i}^{1},x_{i}^{2},y_{i})} . This is typically achieved by minimizing a regularized loss min W ∑ i l o s s ( w ; x i 1 , x i 2 , y i ) + r e g ( w ) {\displaystyle \min _{W}\sum _{i}loss(w;x_{i}^{1},x_{i}^{2},y_{i})+reg(w)} . Classification similarity learning Given are pairs of similar objects ( x i , x i + ) {\displaystyle (x_{i},x_{i}^{+})} and non similar objects ( x i , x i − ) {\displaystyle (x_{i},x_{i}^{-})} . An equivalent formulation is that every pair ( x i 1 , x i 2 ) {\displaystyle (x_{i}^{1},x_{i}^{2})} is given together with a binary label y i ∈ { 0 , 1 } {\displaystyle y_{i}\in \{0,1\}} that determines if the two objects are similar or not. The goal is again to learn a classifier that can decide if a new pair of objects is similar or not. Ranking similarity learning Given are triplets of objects ( x i , x i + , x i − ) {\displaystyle (x_{i},x_{i}^{+},x_{i}^{-})} whose relative similarity obey a predefined order: x i {\displaystyle x_{i}} is known to be more similar to x i + {\displaystyle x_{i}^{+}} than to x i − {\displaystyle x_{i}^{-}} . The goal is to learn a function f {\displaystyle f} such that for any new triplet of objects ( x , x + , x − ) {\displaystyle (x,x^{+},x^{-})} , it obeys f ( x , x + ) > f ( x , x − ) {\displaystyle f(x,x^{+})>f(x,x^{-})} (contrastive learning). This setup assumes a weaker form of supervision than in regression, because instead of providing an exact measure of similarity, one only has to provide the relative order of similarity. For this reason, ranking-based similarity learning is easier to apply in real large-scale applications. Locality sensitive hashing (LSH) Hashes input items so that similar items map to the same "buckets" in memory with high probability (the number of buckets being much smaller than the universe of possible input items). It is often applied in nearest neighbor search on large-scale high-dimensional data, e.g., image databases, document collections, time-series databases, and genome databases. A common approach for learning similarity is to model the similarity function as a bilinear form. For example, in the case of ranking similarity learning, one aims to learn a matrix W that parametrizes the similarity function f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . When data is abundant, a common approach is to learn a siamese network – a deep network model with parameter sharing. == Metric learning == Similarity learning is closely related to distance metric learning. Metric learning is the task of learning a distance function over objects. A metric or distance function has to obey four axioms: non-negativity, identity of indiscernibles, symmetry and subadditivity (or the triangle inequality). In practice, metric learning algorithms ignore the condition of identity of indiscernibles and learn a pseudo-metric. When the objects x i {\displaystyle x_{i}} are vectors in R d {\displaystyle R^{d}} , then any matrix W {\displaystyle W} in the symmetric positive semi-definite cone S + d {\displaystyle S_{+}^{d}} defines a distance pseudo-metric of the space of x through the form D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ W ( x 1 − x 2 ) {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }W(x_{1}-x_{2})} . When W {\displaystyle W} is a symmetric positive definite matrix, D W {\displaystyle D_{W}} is a metric. Moreover, as any symmetric positive semi-definite matrix W ∈ S + d {\displaystyle W\in S_{+}^{d}} can be decomposed as W = L ⊤ L {\displaystyle W=L^{\top }L} where L ∈ R e × d {\displaystyle L\in R^{e\times d}} and e ≥ r a n k ( W ) {\displaystyle e\geq rank(W)} , the distance function D W {\displaystyle D_{W}} can be rewritten equivalently D W ( x 1 , x 2 ) 2 = ( x 1 − x 2 ) ⊤ L ⊤ L ( x 1 − x 2 ) = ‖ L ( x 1 − x 2 ) ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=(x_{1}-x_{2})^{\top }L^{\top }L(x_{1}-x_{2})=\|L(x_{1}-x_{2})\|_{2}^{2}} . The distance D W ( x 1 , x 2 ) 2 = ‖ x 1 ′ − x 2 ′ ‖ 2 2 {\displaystyle D_{W}(x_{1},x_{2})^{2}=\|x_{1}'-x_{2}'\|_{2}^{2}} corresponds to the Euclidean distance between the transformed feature vectors x 1 ′ = L x 1 {\displaystyle x_{1}'=Lx_{1}} and x 2 ′ = L x 2 {\displaystyle x_{2}'=Lx_{2}} . Many formulations for metric learning have been proposed. Some well-known approaches for metric learning include learning from relative comparisons, which is based on the triplet loss, large margin nearest neighbor, and information theoretic metric learning (ITML). In statistics, the covariance matrix of the data is sometimes used to define a distance metric called Mahalanobis distance. == Applications == Similarity learning is used in information retrieval for learning to rank, in face verification or face identification, and in recommendation systems. Also, many machine learning approaches rely on some metric. This includes unsupervised learning such as clustering, which groups together close or similar objects. It also includes supervised approaches like K-nearest neighbor algorithm which rely on labels of nearby objects to decide on the label of a new object. Metric learning has been proposed as a preprocessing step for many of these approaches. == Scalability == Metric and similarity learning scale quadratically with the dimension of the input space, as can easily see when the learned metric has a bilinear form f W ( x , z ) = x T W z {\displaystyle f_{W}(x,z)=x^{T}Wz} . Scaling to higher dimensions can be achieved by enforcing a sparseness structure over the matrix model, as done with HDSL, and with COMET. == Software == metric-learn is a free software Python library which offers efficient implementations of several supervised and weakly-supervised similarity and metric learning algorithms. The API of metric-learn is compatible with scikit-learn. OpenMetricLearning is a Python framework to train and validate the models producing high-quality embeddings. == Further information == For further information on this topic, see the surveys on metric and similarity learning by Bellet et al. and Kulis.

    Read more →