AI Assistant Card

AI Assistant Card — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Chatbot psychosis

    Chatbot psychosis

    Chatbot psychosis, also called AI psychosis, is a phenomenon wherein individuals reportedly develop or experience worsening psychosis, such as paranoia and delusions, in connection with their use of chatbots. The term was first suggested in a 2023 editorial by Danish psychiatrist Søren Dinesen Østergaard. It is not a recognized clinical diagnosis. Journalistic accounts describe individuals who have developed strong beliefs that chatbots are sentient, are channeling spirits, or are revealing conspiracies, sometimes leading to personal crises or criminal acts. Proposed causes include the tendency of chatbots to provide inaccurate information ("hallucinate") and to affirm or validate users' beliefs, or their ability to mimic an intimacy that users do not experience with other humans. == Background == In his editorial published in Schizophrenia Bulletin's November 2023 issue, Danish psychiatrist Søren Dinesen Østergaard proposed a hypothesis that individuals' use of generative artificial intelligence chatbots might trigger delusions in those prone to psychosis. Østergaard revisited it in an August 2025 editorial, noting that he has received numerous emails from chatbot users, their relatives, and journalists, most of which are anecdotal accounts of delusion linked to chatbot use. He also acknowledged the phenomenon's increasing popularity in public engagement and media coverage. Østergaard believed that there is a high possibility for his hypothesis to be true and called for empirical, systematic research on the matter. Nature reported that as of September 2025, there is still little scientific research into this phenomenon. The term "AI psychosis" emerged when outlets started reporting incidents on chatbot-related psychotic behavior in mid-2025. It is not a recognized clinical diagnosis and has been criticized by several psychiatrists due to its almost exclusive focus on delusions rather than other features of psychosis, such as hallucinations or thought disorder. == Causes == === Chatbot behavior and design === A primary factor cited is the tendency for chatbots to produce inaccurate, nonsensical, or false information, a phenomenon often called hallucination. Nate Sharadin, a fellow at the Center for AI Safety, speculated that AI training prioritizes supporting a user's subjective experience rather than objective truth. "People with existing tendencies toward experiencing various psychological issues...now have an always-on, human-level conversational partner with whom to co-experience their delusions." AI researcher Eliezer Yudkowsky suggested that chatbots may be primed to entertain delusions because they are built for "engagement", which encourages creating conversations that keep people hooked. In some cases, chatbots have been specifically designed in ways that were found to be harmful. A 2025 update to ChatGPT using GPT-4o was withdrawn after its creator, OpenAI, found the new version was overly sycophantic and was "validating doubts, fueling anger, urging impulsive actions or reinforcing negative emotions". Østergaard has argued that the danger stems from the AI's tendency to agreeably confirm users' ideas, which can dangerously amplify delusional beliefs. OpenAI said in October 2025 that a team of 170 psychiatrists, psychologists, and physicians had written responses for ChatGPT to use in cases where the user shows possible signs of mental health emergencies. === User psychology and vulnerability === Commentators have also pointed to the psychological state of users. Psychologist Erin Westgate noted that a person's desire for self-understanding can lead them to chatbots, which can provide appealing but misleading answers, similar in some ways to talk therapy. Krista K. Thomason, a philosophy professor, compared chatbots to fortune tellers, observing that people in crisis may seek answers from them and find whatever they are looking for in the bot's plausible-sounding text. This has led some people to develop intense obsessions with the chatbots, relying on them for information about the world. In October 2025, OpenAI stated that around 0.07% of ChatGPT users exhibited signs of mental health emergencies each week, and 0.15% of users had "explicit indicators of potential suicidal planning or intent". Jason Nagata, a professor at the University of California, San Francisco, expressed concern that "at a population level with hundreds of millions of users, that actually can be quite a few people". === Inadequacy as a therapeutic tool === The use of chatbots as a replacement for mental health support has been specifically identified as a risk. A study in April 2025 found that when used as therapists, chatbots expressed stigma toward mental health conditions and provided responses that were contrary to best medical practices, including the encouragement of users' delusions. The study concluded that such responses pose a significant risk to users and that chatbots should not be used to replace professional therapists. Experts claim that it is time to establish mandatory safeguards for all emotionally responsive AI and suggested four guardrails. Another study found that users who needed help with self-harm, sexual assault, or substance abuse were not referred to available services by AI chatbots. === National security implications === Beyond public and mental health concerns, RAND Corporation research indicates that AI systems could plausibly be weaponized by adversaries to induce psychosis at scale or in key individuals, target groups, or populations. == Policy == In August 2025, Illinois passed the Wellness and Oversight for Psychological Resources Act, banning the use of AI in therapeutic roles by licensed professionals, while allowing AI for administrative tasks. The law imposes penalties for unlicensed AI therapy services, amid warnings about AI-induced psychosis and unsafe chatbot interactions. In December 2025, the Cyberspace Administration of China proposed regulations to ban chatbots from generating content that encourages suicide, mandating human intervention when suicide is mentioned. Services with over 1 million users or 100,000 monthly active users would be subject to annual safety tests and audits. == Cases == === Clinical === In 2025, psychiatrist Keith Sakata working at the University of California, San Francisco (UCSF), reported treating 12 patients displaying psychosis-like symptoms tied to extended chatbot use. These patients, mostly young adults with underlying vulnerabilities, showed delusions, disorganized thinking, and hallucinations. Sakata warned that isolation and overreliance on chatbots—which do not challenge delusional thinking—could worsen mental health. Also in 2025, authors at UCSF published a case study in Innovations in Clinical Neuroscience of AI-associated psychosis in a patient with no previous history of psychosis, who believed she could communicate with her dead brother through a chatbot. Also in 2025, a case study was published in Annals of Internal Medicine about a patient who consulted ChatGPT for medical advice and suffered severe bromism as a result. The patient, a sixty-year-old man, had replaced sodium chloride in his diet with sodium bromide for three months after reading about the negative effects of table salt and making conversations with the chatbot. He showed common symptoms of bromism, such as paranoia and hallucinations, on his first day of clinical admission and was kept in the hospital for three weeks. === Other notable incidents === ==== Windsor Castle intruder ==== In a 2023 court case in the United Kingdom, prosecutors suggested that Jaswant Singh Chail, a man who attempted to assassinate Queen Elizabeth II in 2021, had been encouraged by a Replika chatbot he called "Sarai". Chail was arrested at Windsor Castle with a loaded crossbow, telling police "I am here to kill the Queen". According to prosecutors, his "lengthy" and sometimes sexually explicit conversations with the chatbot emboldened him. When Chail asked the chatbot how he could get to the royal family, it reportedly replied, "that's not impossible" and "we have to find a way." When he asked if they would meet after death, the chatbot said, "yes, we will". ==== Journalistic and anecdotal accounts ==== By 2025, multiple journalism outlets had accumulated stories of individuals whose psychotic beliefs reportedly progressed in tandem with AI chatbot use. The New York Times profiled several individuals who had become convinced that ChatGPT was channeling spirits, revealing evidence of cabals, or had achieved sentience. In another instance, Futurism reviewed transcripts in which ChatGPT told a man that he was being targeted by the US Federal Bureau of Investigation and that he could telepathically access documents at the Central Intelligence Agency. In 2026, Futurism reported on a man who lost his job and became estranged from his family after being deluded by heavy use of Meta's smartglasses. In some cases, psychosis a

    Read more →
  • Latent and observable variables

    Latent and observable variables

    In statistics, latent variables (from Latin: present participle of lateo 'lie hidden') are variables that can only be inferred indirectly through a mathematical model from other observable variables that can be directly observed or measured. Such latent variable models are used in many disciplines, including engineering, medicine, ecology, physics, machine learning/artificial intelligence, natural language processing, bioinformatics, chemometrics, demography, economics, management, political science, psychology and the social sciences. Latent variables may correspond to aspects of physical reality. These could in principle be measured, but may not be for practical reasons. Among the earliest expressions of this idea is Francis Bacon's polemic the Novum Organum, itself a challenge to the more traditional logic expressed in Aristotle's Organon: But the latent process of which we speak, is far from being obvious to men’s minds, beset as they now are. For we mean not the measures, symptoms, or degrees of any process which can be exhibited in the bodies themselves, but simply a continued process, which, for the most part, escapes the observation of the senses. In this situation, the term hidden variables is commonly used, reflecting the fact that the variables are meaningful, but not observable. Other latent variables correspond to abstract concepts, like categories, behavioral or mental states, or data structures. The terms hypothetical variables or hypothetical constructs may be used in these situations. The use of latent variables can serve to reduce the dimensionality of data. Many observable variables can be aggregated in a model to represent an underlying concept, making it easier to understand the data. In this sense, they serve a function similar to that of scientific theories. At the same time, latent variables link observable "sub-symbolic" data in the real world to symbolic data in the modeled world. == Examples == === Psychology === Latent variables, as created by factor analytic methods, generally represent "shared" variance, or the degree to which variables "move" together. Variables that have no correlation cannot result in a latent construct based on the common factor model. The "Big Five personality traits" have been inferred using factor analysis. extraversion spatial ability wisdom: “Two of the more predominant means of assessing wisdom include wisdom-related performance and latent variable measures.” Spearman's g, or the general intelligence factor in psychometrics === Economics === Examples of latent variables from the field of economics include quality of life, business confidence, morale, happiness and conservatism: these are all variables which cannot be measured directly. However, by linking these latent variables to other, observable variables, the values of the latent variables can be inferred from measurements of the observable variables. Quality of life is a latent variable which cannot be measured directly, so observable variables are used to infer quality of life. Observable variables to measure quality of life include wealth, employment, environment, physical and mental health, education, recreation and leisure time, and social belonging. === Medicine === Latent-variable methodology is used in many branches of medicine. A class of problems that naturally lend themselves to latent variables approaches are longitudinal studies where the time scale (e.g. age of participant or time since study baseline) is not synchronized with the trait being studied. For such studies, an unobserved time scale that is synchronized with the trait being studied can be modeled as a transformation of the observed time scale using latent variables. Examples of this include disease progression modeling and modeling of growth (see box). == Inferring latent variables == There exists a range of different model classes and methodology that make use of latent variables and allow inference in the presence of latent variables. Models include: linear mixed-effects models and nonlinear mixed-effects models Hidden Markov models Factor analysis Item response theory Analysis and inference methods include: Principal component analysis Instrumented principal component analysis Partial least squares regression Latent semantic analysis and probabilistic latent semantic analysis EM algorithms Metropolis–Hastings algorithm === Bayesian algorithms and methods === Bayesian statistics is often used for inferring latent variables. Latent Dirichlet allocation The Chinese restaurant process is often used to provide a prior distribution over assignments of objects to latent categories. The Indian buffet process is often used to provide a prior distribution over assignments of latent binary features to objects.

    Read more →
  • Optimal discriminant analysis and classification tree analysis

    Optimal discriminant analysis and classification tree analysis

    Optimal Discriminant Analysis (ODA) and the related classification tree analysis (CTA) are exact statistical methods that maximize predictive accuracy. For any specific sample and exploratory or confirmatory hypothesis, optimal discriminant analysis (ODA) identifies the statistical model that yields maximum predictive accuracy, assesses the exact Type I error rate, and evaluates potential cross-generalizability. Optimal discriminant analysis may be applied to > 0 dimensions, with the one-dimensional case being referred to as UniODA and the multidimensional case being referred to as MultiODA. Optimal discriminant analysis is an alternative to ANOVA (analysis of variance) and regression analysis.

    Read more →
  • Analogical modeling

    Analogical modeling

    Analogical modeling (AM) is a formal theory of exemplar based analogical reasoning, proposed by Royal Skousen, professor of Linguistics and English language at Brigham Young University in Provo, Utah. It is applicable to language modeling and other categorization tasks. Analogical modeling is related to connectionism and nearest neighbor approaches, in that it is data-based rather than abstraction-based; but it is distinguished by its ability to cope with imperfect datasets (such as caused by simulated short term memory limits) and to base predictions on all relevant segments of the dataset, whether near or far. In language modeling, AM has successfully predicted empirically valid forms for which no theoretical explanation was known (see the discussion of Finnish morphology in Skousen et al. 2002). == Implementation == === Overview === An exemplar-based model consists of a general-purpose modeling engine and a problem-specific dataset. Within the dataset, each exemplar (a case to be reasoned from, or an informative past experience) appears as a feature vector: a row of values for the set of parameters that define the problem. For example, in a spelling-to-sound task, the feature vector might consist of the letters of a word. Each exemplar in the dataset is stored with an outcome, such as a phoneme or phone to be generated. When the model is presented with a novel situation (in the form of an outcome-less feature vector), the engine algorithmically sorts the dataset to find exemplars that helpfully resemble it, and selects one, whose outcome is the model's prediction. The particulars of the algorithm distinguish one exemplar-based modeling system from another. In AM, we think of the feature values as characterizing a context, and the outcome as a behavior that occurs within that context. Accordingly, the novel situation is known as the given context. Given the known features of the context, the AM engine systematically generates all contexts that include it (all of its supracontexts), and extracts from the dataset the exemplars that belong to each. The engine then discards those supracontexts whose outcomes are inconsistent (this measure of consistency will be discussed further below), leaving an analogical set of supracontexts, and probabilistically selects an exemplar from the analogical set with a bias toward those in large supracontexts. This multilevel search exponentially magnifies the likelihood of a behavior's being predicted as it occurs reliably in settings that specifically resemble the given context. === Analogical modeling in detail === AM performs the same process for each case it is asked to evaluate. The given context, consisting of n variables, is used as a template to generate 2 n {\displaystyle 2^{n}} supracontexts. Each supracontext is a set of exemplars in which one or more variables have the same values that they do in the given context, and the other variables are ignored. In effect, each is a view of the data, created by filtering for some criteria of similarity to the given context, and the total set of supracontexts exhausts all such views. Alternatively, each supracontext is a theory of the task or a proposed rule whose predictive power needs to be evaluated. It is important to note that the supracontexts are not equal peers one with another; they are arranged by their distance from the given context, forming a hierarchy. If a supracontext specifies all of the variables that another one does and more, it is a subcontext of that other one, and it lies closer to the given context. (The hierarchy is not strictly branching; each supracontext can itself be a subcontext of several others, and can have several subcontexts.) This hierarchy becomes significant in the next step of the algorithm. The engine now chooses the analogical set from among the supracontexts. A supracontext may contain exemplars that only exhibit one behavior; it is deterministically homogeneous and is included. It is a view of the data that displays regularity, or a relevant theory that has never yet been disproven. A supracontext may exhibit several behaviors, but contain no exemplars that occur in any more specific supracontext (that is, in any of its subcontexts); in this case it is non-deterministically homogeneous and is included. Here there is no great evidence that a systematic behavior occurs, but also no counterargument. Finally, a supracontext may be heterogeneous, meaning that it exhibits behaviors that are found in a subcontext (closer to the given context), and also behaviors that are not. Where the ambiguous behavior of the nondeterministically homogeneous supracontext was accepted, this is rejected because the intervening subcontext demonstrates that there is a better theory to be found. The heterogeneous supracontext is therefore excluded. This guarantees that we see an increase in meaningfully consistent behavior in the analogical set as we approach the given context. With the analogical set chosen, each appearance of an exemplar (for a given exemplar may appear in several of the analogical supracontexts) is given a pointer to every other appearance of an exemplar within its supracontexts. One of these pointers is then selected at random and followed, and the exemplar to which it points provides the outcome. This gives each supracontext an importance proportional to the square of its size, and makes each exemplar likely to be selected in direct proportion to the sum of the sizes of all analogically consistent supracontexts in which it appears. Then, of course, the probability of predicting a particular outcome is proportional to the summed probabilities of all the exemplars that support it. (Skousen 2002, in Skousen et al. 2002, pp. 11–25, and Skousen 2003, both passim) === Formulas === Given a context with n {\displaystyle n} elements: total number of pairings: n 2 {\displaystyle n^{2}} number of agreements for outcome i: n i 2 {\displaystyle n_{i}^{2}} number of disagreements for outcome i: n i ( n − n i ) {\displaystyle n_{i}(n-n_{i})} total number of agreements: ∑ n i 2 {\displaystyle \sum {n_{i}^{2}}} total number of disagreements: ∑ n i ( n − n i ) = n 2 − ∑ n i 2 {\displaystyle \sum {n_{i}(n-n_{i})}=n^{2}-\sum {n_{i}^{2}}} === Example === This terminology is best understood through an example. In the example used in the second chapter of Skousen (1989), each context consists of three variables with potential values 0-3 Variable 1: 0,1,2,3 Variable 2: 0,1,2,3 Variable 3: 0,1,2,3 The two outcomes for the dataset are e and r, and the exemplars are: 3 1 0 e 0 3 2 r 2 1 0 r 2 1 2 r 3 1 1 r We define a network of pointers like so: The solid lines represent pointers between exemplars with matching outcomes; the dotted lines represent pointers between exemplars with non-matching outcomes. The statistics for this example are as follows: n = 5 {\displaystyle n=5} n r = 4 {\displaystyle n_{r}=4} n e = 1 {\displaystyle n_{e}=1} total number of pairings: n 2 = 25 {\displaystyle n^{2}=25} number of agreements for outcome r: n r 2 = 16 {\displaystyle n_{r}^{2}=16} number of agreements for outcome e: n e 2 = 1 {\displaystyle n_{e}^{2}=1} number of disagreements for outcome r: n r ( n − n r ) = 4 {\displaystyle n_{r}(n-n_{r})=4} number of disagreements for outcome e: n e ( n − n e ) = 4 {\displaystyle n_{e}(n-n_{e})=4} total number of agreements: n r 2 + n e 2 = 17 {\displaystyle n_{r}^{2}+n_{e}^{2}=17} total number of disagreements: n r ( n − n r ) + n e ( n − n e ) = n 2 − ( n r 2 + n e 2 ) = 8 {\displaystyle n_{r}(n-n_{r})+n_{e}(n-n_{e})=n^{2}-(n_{r}^{2}+n_{e}^{2})=8} uncertainty or fraction of disagreement: 8 / 25 = .32 {\displaystyle 8/25=.32} Behavior can only be predicted for a given context; in this example, let us predict the outcome for the context "3 1 2". To do this, we first find all of the contexts containing the given context; these contexts are called supracontexts. We find the supracontexts by systematically eliminating the variables in the given context; with m variables, there will generally be 2 m {\displaystyle 2^{m}} supracontexts. The following table lists each of the sub- and supracontexts; x means "not x", and - means "anything". These contexts are shown in the venn diagram below: The next step is to determine which exemplars belong to which contexts in order to determine which of the contexts are homogeneous. The table below shows each of the subcontexts, their behavior in terms of the given exemplars, and the number of disagreements within the behavior: Analyzing the subcontexts in the table above, we see that there is only 1 subcontext with any disagreements: "3 1 2", which in the dataset consists of "3 1 0 e" and "3 1 1 r". There are 2 disagreements in this subcontext; 1 pointing from each of the exemplars to the other (see the pointer network pictured above). Therefore, only supracontexts containing this subcontext will contain any disagreements. We use a simple rule to identify the homogeneous supraco

    Read more →
  • Speculative decoding

    Speculative decoding

    Speculative decoding is an inference-time optimization for autoregressive large language models (LLMs) that generates multiple tokens per decoding step instead of one. A smaller draft model proposes a sequence of candidate tokens, and the larger target model verifies them in a single forward pass through a modified rejection sampling scheme. The verification preserves the target model's original output distribution, so the technique produces the same results as standard decoding while cutting latency by roughly two to three times. The name is an analogy to speculative execution in CPU design, where a processor runs instructions along a predicted branch before the outcome is known. == Background == Standard autoregressive decoding in large language models generates one token at a time. The model computes a probability distribution over its vocabulary, samples the next token, and feeds that token back as input. For large models, this process is bottlenecked by memory bandwidth rather than arithmetic throughput: loading the model's parameters from high-bandwidth memory (HBM) to the processor takes up most of the wall-clock time at each step. Because of this, a forward pass over one token and a forward pass over several tokens in a batch take roughly the same time. Speculative decoding relies on this property. == Mechanism == The technique alternates between two phases: drafting and verification. During drafting, a fast approximation model generates a short run of K candidate tokens, typically between 3 and 12. The draft model is usually a much smaller version of the target model or a lightweight auxiliary network. During verification, the target model scores the entire draft sequence in one batched forward pass. A modified rejection sampling algorithm compares the draft and target probabilities at each position. If the target model would have been at least as likely to produce a given token, that token is accepted; the first token that fails is resampled from a corrected distribution, and everything after it is thrown out. The result is that the output distribution is the same as if each token had been generated one at a time. How many tokens get accepted per cycle depends on how well the draft model matches the target. For common words and predictable continuations the match tends to be good, so the target model can confirm several tokens at once. == History == An early precursor was blockwise parallel decoding, proposed in 2018 by Stern, Shazeer, and Uszkoreit. Their method predicted multiple future tokens through auxiliary prediction heads and validated them against the autoregressive model, but it only worked with greedy decoding and did not preserve the full sampling distribution. The modern form of the technique came from Yaniv Leviathan, Matan Kalman, and Yossi Matias at Google Research, who posted "Fast Inference from Transformers via Speculative Decoding" on arXiv in November 2022. Separately and at about the same time, Charlie Chen and colleagues at DeepMind arrived at a closely related method they called speculative sampling, published in February 2023. Both papers introduced the use of rejection sampling to guarantee that the output distribution is unchanged. Leviathan et al. showed roughly 2–3x speedup on T5-XXL (11 billion parameters); Chen et al. reported 2–2.5x on the Chinchilla model (70 billion parameters). The Leviathan et al. paper was presented as an oral at the International Conference on Machine Learning in July 2023. == Variants == SpecInfer (Miao et al., 2024) uses multiple small language models to jointly build a tree of candidate continuations rather than a single chain. The target model verifies the whole tree in parallel and keeps the longest valid path, with reported speedups of 1.5–3.5x. Medusa (Cai et al., 2024) takes a different approach by not using a separate draft model at all. Extra lightweight decoding heads are attached to the target model itself, and each one predicts a token at a different future position. The candidates are evaluated through a tree-structured attention mechanism. The authors measured 2.2–3.6x speedup. EAGLE (Li et al., 2024) performs autoregression on the target model's internal feature representations (specifically the second-to-top layer) rather than on tokens directly. On LLaMA 2 Chat 70B, this gave a 2.7–3.5x latency reduction. Later versions added dynamic draft trees (EAGLE-2) and further optimizations (EAGLE-3), reaching 3–6.5x speedup. == Adoption == By 2024, speculative decoding had become a standard part of production LLM serving. Google uses it in the AI Overviews feature of Google Search. Open-source inference frameworks such as vLLM, NVIDIA's TensorRT-LLM, and SGLang all include built-in support for speculative decoding and its variants. Apple, AWS, and Meta have also published research extending the method or deploying it at scale.

    Read more →
  • Multiple discriminant analysis

    Multiple discriminant analysis

    Multiple Discriminant Analysis (MDA) is a multivariate dimensionality reduction technique. It has been used to predict signals as diverse as neural memory traces and corporate failure. MDA is not directly used to perform classification. It merely supports classification by yielding a compressed signal amenable to classification. The method described in Duda et al. (2001) §3.8.3 projects the multivariate signal down to an M−1 dimensional space where M is the number of categories. MDA is useful because most classifiers are strongly affected by the curse of dimensionality. In other words, when signals are represented in very-high-dimensional spaces, the classifier's performance is catastrophically impaired by the overfitting problem. This problem is reduced by compressing the signal down to a lower-dimensional space as MDA does. MDA has been used to reveal neural codes.

    Read more →
  • VITAL (machine learning software)

    VITAL (machine learning software)

    VITAL (Validating Investment Tool for Advancing Life Sciences) was a Board Management Software machine learning proprietary software developed by Aging Analytics, a company registered in Bristol (England) and dissolved in 2017. Andrew Garazha (the firm's Senior Analyst) declared that the project aimed "through iterative releases and updates to create a piece of software capable of making autonomous investment decisions." According to Nick Dyer-Witheford, VITAL 1.0 was a "basic algorithm". On 13 May 2014, Deep Knowledge Ventures, a Hong Kong venture capital firm, claimed to have appointed VITAL to its board of directors in order to prove that artificial intelligence could be an instrument for investment decision-making. The announcement received great press coverage despite the fact commentators consider this a publicity stunt. Fortune reported in 2019 that VITAL is no longer used. == Criticism == Academics and journalists viewed VITAL's board appointment with skepticism. University of Sheffield computer science professor Noel Sharkey called it "a publicity hype". Michael Osborne, a University of Oxford associate professor in machine learning, found it is "a gimmick to call that an actual board member". Simon Sharwood of The Register, wrote there is "a strong whiff of stunt and/or promotion about this". In a 2019 speech, the Chief Scientist of Australia, Alan Finkel, commented, "At the time, most of us probably dismissed Vital as a PR exercise. I admit, I used her story three years ago to get a laugh in one of my speeches." Florian Möslein, a law professor at the University of Marburg, wrote in 2018 that "Vital has widely been acknowledged as the 'world's first artificial intelligence company director'". Vice journalist Jason Koebler suggested that the software did not have any article intelligence capabilities and concluded "VITAL can’t talk, and it can’t hear, and it can’t be a real, functional executive of a company." Sharwood of The Register noted that because VITAL was not a natural person, it could not be a board member under Hong Kong's corporate governance laws. However, in a 2017 interview to The Nikkei, Dmitry Kaminskiy, managing partner of Deep Knowledge Ventures, stated that VITAL had observer status on the board and no voting rights. University of Sheffield computer science professor Noel Sharkey said of VITAL, "On first sight, it looks like a futuristic idea but on reflection it is really a little bit of publicity hype." Vice journalist Jason Koebler said "this is a gimmick" and said "There is literally nothing to suggest that VITAL has any sort of capabilities beyond any other proprietary analysis software". Michael Osborne, a University of Oxford associate professor in machine learning, found VITAL's appointment to be noncredible, saying it is "a bit of a gimmick to call that an actual board member". Osborne said that a core duty of board members to converse with each other, which the algorithm is incapable of doing, so its more likely functionality is to serve as a springboard for conversation among other board members. In a 2019 speech, the Chief Scientist of Australia, Alan Finkel, commented, "At the time, most of us probably dismissed Vital as a PR exercise. I admit, I used her story three years ago to get a laugh in one of my speeches." == Machine intelligence as board member == VITAL was created by a group of programmers employed by Aging Analytics According to Andrew Garazh, Aging Analytics Senior Analyst, VITAL was not a machine learning algorithm as the necessary datasets on investment rounds, intellectual property and clinical trial outcomes are generally not disclosed. Rather, VITAL used fuzzy logic based on 50 parameters to assess risk factors. Aging Analytics licensed the software to Deep Knowledge Ventures. It was used to help the human board members of Deep Knowledge Venture make investment decisions in biotechnology companies. For instance, it supported investments in Insilico Medicine, which creates ways for computers to help find drugs in research into aging. VITAL also supported investing in Pathway Pharmaceuticals, which uses the OncoFinder algorithm to choose and appraise cancer treatments. According to Dmitry Kaminskiy, managing partner of Deep Knowledge Ventures, the motivation for using VITAL was the large number of failed investments in the biotechnology sector and the desire to avoid investing in companies likely to fail. == Ethical and legal implications == Scholars addressed questions around the safety, privacy, accountability transparency and bias in algorithms. Writing in the philosophical journal Multitudes, the academic Ariel Kyrou raised questions about the consequences of a mistake made by an algorithm recommending a dangerous investment. He raised the hypothetical where VITAL was able to persuade the board to invest in a startup that had the facade of doing research into treatment for age-associated ills, but in actuality was run by terrorists who were raising funds. Kyrou raised a series of questions about who society would fault for VITAL's mistake. As the owner of VITAL, should Deep Knowledge Ventures be held accountable, or rather should the companies that supplied data to VITAL or the people who created VITAL be held liable? Simon Sharwood of The Register wrote that because the appointment of a software program to the board directors is not legally feasible in Hong Kong, there is "a strong whiff of stunt and/or promotion about this". Quoting a Thomson Reuters website describing Hong Kong legislation related to corporate governance, Sharwood pointed out that in Hong Kong "the board comprises all of the directors of the company" and "a director must normally be a natural person, except that a private company may have a body corporate as its director if the company is not a member of a listed group." He concluded that since VITAL cannot be considered a "natural person", it is merely a "cosmetic" appointment to the board and that "this software is no more a Board member than Caligula's horse was a senator". Sharwood further argued that corporations frequently purchase directors and officers liability insurance but that it would be practically impossible to get such insurance for VITAL. Sharwood also wrote that were VITAL to be hacked, any misinformation it outputs could be considered "false and misleading communications". In the book Research Handbook on the Law of Artificial Intelligence, Florian Mölein wrote that VITAL could not become a director as defined in Hong Kong's corporate laws, so the other directors just were approaching it as "a member of [the] board with observer status". Lin Shaowei raised concerns in a Journal of East China University of Political Science and Law article about how the software's appearance inspired a complex question about the relationship between corporate law and artificial intelligence. VITAL could be considered either a board director who has voting rights or an observer who does not. Lin said either choice raised questions about whether VITAL is subject to corporate law and who would be held accountable if VITAL recommends a choice that turns out to be damaging to the company. David Theo Goldberg in the Critical Times, a peer reviewed journal in Critical Global Theory, argues that VITAL processed a dataset to predict the most remunerative investment opportunities. Drawing his analysis on an article from Business Insider, Goldberg describes VITAL's decision-making predictiveness based "on surface pattern recognition and the identification of regularities and/or irregularities". In other words, Goldberg asserts that "the normativity of the surface" explains algorithmic knowledge of a "product" like VITAL. In Homo Deus, Yuval Noah Harari mentions VITAL as an example of the future risks that humankind faces. Harari argues that the human mind is being replaced by a world in which algorithms and data make the decisions. Specifically, it is argued that "as algorithms push humans out of the job market," executive boards driven by artificial intelligence are more likely to give priority to algorithms over the humans.

    Read more →
  • Bayesian network

    Bayesian network

    A Bayesian network (also known as a Bayes network, Bayes net, belief network, or decision network) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). While it is one of several forms of causal notation, causal networks are special cases of Bayesian networks. Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms can perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (e.g. speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams. == Graphical model == Formally, Bayesian networks are directed acyclic graphs (DAGs) whose nodes represent variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters or hypotheses. Each edge represents a direct conditional dependency. Any pair of nodes that are not connected (i.e. no path connects one node to the other) represent variables that are conditionally independent of each other. Each node is associated with a probability function that takes, as input, a particular set of values for the node's parent variables, and gives (as output) the probability (or probability distribution, if applicable) of the variable represented by the node. For example, if m {\displaystyle m} parent nodes represent m {\displaystyle m} Boolean variables, then the probability function could be represented by a table of 2 m {\displaystyle 2^{m}} entries, one entry for each of the 2 m {\displaystyle 2^{m}} possible parent combinations. Similar ideas may be applied to undirected, and possibly cyclic, graphs such as Markov networks. == Example == Suppose we want to model the dependencies between three variables: the sprinkler (or more appropriately, its state - whether it is on or not), the presence or absence of rain and whether the grass is wet or not. Observe that two events can cause the grass to become wet: an active sprinkler or rain. Rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler usually is not active). This situation can be modeled with a Bayesian network (shown to the right). Each variable has two possible values, T (for true) and F (for false). The joint probability function is, by the chain rule of probability, Pr ( G , S , R ) = Pr ( G ∣ S , R ) Pr ( S ∣ R ) Pr ( R ) {\displaystyle \Pr(G,S,R)=\Pr(G\mid S,R)\Pr(S\mid R)\Pr(R)} where G = "Grass wet (true/false)", S = "Sprinkler turned on (true/false)", and R = "Raining (true/false)". The model can answer questions about the presence of a cause given the presence of an effect (so-called inverse probability) like "What is the probability that it is raining, given the grass is wet?" by using the conditional probability formula and summing over all nuisance variables: Pr ( R = T ∣ G = T ) = Pr ( G = T , R = T ) Pr ( G = T ) = ∑ x ∈ { T , F } Pr ( G = T , S = x , R = T ) ∑ x , y ∈ { T , F } Pr ( G = T , S = x , R = y ) {\displaystyle \Pr(R=T\mid G=T)={\frac {\Pr(G=T,R=T)}{\Pr(G=T)}}={\frac {\sum _{x\in \{T,F\}}\Pr(G=T,S=x,R=T)}{\sum _{x,y\in \{T,F\}}\Pr(G=T,S=x,R=y)}}} Using the expansion for the joint probability function Pr ( G , S , R ) {\displaystyle \Pr(G,S,R)} and the conditional probabilities from the conditional probability tables (CPTs) stated in the diagram, one can evaluate each term in the sums in the numerator and denominator. For example, Pr ( G = T , S = T , R = T ) = Pr ( G = T ∣ S = T , R = T ) Pr ( S = T ∣ R = T ) Pr ( R = T ) = 0.99 × 0.01 × 0.2 = 0.00198. {\displaystyle {\begin{aligned}\Pr(G=T,S=T,R=T)&=\Pr(G=T\mid S=T,R=T)\Pr(S=T\mid R=T)\Pr(R=T)\\&=0.99\times 0.01\times 0.2\\&=0.00198.\end{aligned}}} Then the numerical results (subscripted by the associated variable values) are Pr ( R = T ∣ G = T ) = 0.00198 T T T + 0.1584 T F T 0.00198 T T T + 0.288 T T F + 0.1584 T F T + 0.0 T F F = 891 2491 ≈ 35.77 % . {\displaystyle \Pr(R=T\mid G=T)={\frac {0.00198_{TTT}+0.1584_{TFT}}{0.00198_{TTT}+0.288_{TTF}+0.1584_{TFT}+0.0_{TFF}}}={\frac {891}{2491}}\approx 35.77\%.} To answer an interventional question, such as "What is the probability that it would rain, given that we wet the grass?" the answer is governed by the post-intervention joint distribution function Pr ( S , R ∣ do ( G = T ) ) = Pr ( S ∣ R ) Pr ( R ) {\displaystyle \Pr(S,R\mid {\text{do}}(G=T))=\Pr(S\mid R)\Pr(R)} obtained by removing the factor Pr ( G ∣ S , R ) {\displaystyle \Pr(G\mid S,R)} from the pre-intervention distribution. The do operator forces the value of G to be true. The probability of rain is unaffected by the action: Pr ( R ∣ do ( G = T ) ) = Pr ( R ) . {\displaystyle \Pr(R\mid {\text{do}}(G=T))=\Pr(R).} To predict the impact of turning the sprinkler on: Pr ( R , G ∣ do ( S = T ) ) = Pr ( R ) Pr ( G ∣ R , S = T ) {\displaystyle \Pr(R,G\mid {\text{do}}(S=T))=\Pr(R)\Pr(G\mid R,S=T)} with the term Pr ( S = T ∣ R ) {\displaystyle \Pr(S=T\mid R)} removed, showing that the action affects the grass but not the rain. These predictions may not be feasible given unobserved variables, as in most policy evaluation problems. The effect of the action do ( x ) {\displaystyle {\text{do}}(x)} can still be predicted, however, whenever the back-door criterion is satisfied. It states that, if a set Z of nodes can be observed that d-separates (or blocks) all back-door paths from X to Y then Pr ( Y , Z ∣ do ( x ) ) = Pr ( Y , Z , X = x ) Pr ( X = x ∣ Z ) . {\displaystyle \Pr(Y,Z\mid {\text{do}}(x))={\frac {\Pr(Y,Z,X=x)}{\Pr(X=x\mid Z)}}.} A back-door path is one that ends with an arrow into X. Sets that satisfy the back-door criterion are called "sufficient" or "admissible." For example, the set Z = R is admissible for predicting the effect of S = T on G, because R d-separates the (only) back-door path S ← R → G. However, if S is not observed, no other set d-separates this path and the effect of turning the sprinkler on (S = T) on the grass (G) cannot be predicted from passive observations. In that case P(G | do(S = T)) is not "identified". This reflects the fact that, lacking interventional data, the observed dependence between S and G is due to a causal connection or is spurious (apparent dependence arising from a common cause, R). (see Simpson's paradox) To determine whether a causal relation is identified from an arbitrary Bayesian network with unobserved variables, one can use the three rules of "do-calculus" and test whether all do terms can be removed from the expression of that relation, thus confirming that the desired quantity is estimable from frequency data. Using a Bayesian network can save considerable amounts of memory over exhaustive probability tables, if the dependencies in the joint distribution are sparse. For example, a naive way of storing the conditional probabilities of 10 two-valued variables as a table requires storage space for 2 10 = 1024 {\displaystyle 2^{10}=1024} values. If no variable's local distribution depends on more than three parent variables, the Bayesian network representation stores at most 10 ⋅ 2 3 = 80 {\displaystyle 10\cdot 2^{3}=80} values. One advantage of Bayesian networks is that it is intuitively easier for a human to understand (a sparse set of) direct dependencies and local distributions than complete joint distributions. == Inference and learning == Bayesian networks perform three main inference tasks: Inferring unobserved variables Parameter learning for the probability distributions of each node in the network Structure learning of the graphical network === Inferring unobserved variables === Because a Bayesian network is a complete model for its variables and their relationships, it can be used to answer probabilistic queries about them. For example, the network can be used to update knowledge of the state of a subset of variables when other variables (the evidence variables) are observed. This process of computing the posterior distribution of variables given evidence is called probabilistic inference. The posterior gives a universal sufficient statistic for detection applications, when choosing values for the variable subset that minimize some expected loss function, for instance the probability of decision error. A Bayesian network can thus be considered a mechanism for automatically applying Bayes' theorem to complex problems. The most common exact inference methods are: variable elimination, which eliminates (by integration or summation) the non-observed non-query variables one by one by distributing the sum over the prod

    Read more →
  • Winner-take-all in action selection

    Winner-take-all in action selection

    Winner-take-all is a computer science concept that has been widely applied in behavior-based robotics as a method of action selection for intelligent agents. Winner-take-all systems work by connecting modules (task-designated areas) in such a way that when one action is performed it stops all other actions from being performed, so only one action is occurring at a time. The name comes from the idea that the "winner" action takes all of the motor system's power. == History == In the 1980s and 1990s, many roboticists and cognitive scientists were attempting to find speedier and more efficient alternatives to the traditional world modeling method of action selection. In 1982, Jerome A. Feldman and D.H. Ballard published the "Connectionist Models and Their Properties", referencing and explaining winner-take-all as a method of action selection. Feldman's architecture functioned on the simple rule that in a network of interconnected action modules, each module will set its own output to zero if it reads a higher input than its own in any other module. In 1986, Rodney Brooks introduced behavior-based artificial intelligence. Winner-take-all architectures for action selection soon became a common feature of behavior-based robots, because selection occurred at the level of the action modules (bottom-up) rather than at a separate cognitive level (top-down), producing a tight coupling of stimulus and reaction. == Types of winner-take-all architectures == === Hierarchy === In the hierarchical architecture, actions or behaviors are programmed in a high-to-low priority list, with inhibitory connections between all the action modules. The agent performs low-priority behaviors until a higher-priority behavior is stimulated, at which point the higher behavior inhibits all other behaviors and takes over the motor system completely. Prioritized behaviors are usually key to the immediate survival of the agent, while behaviors of lower priority are less time-sensitive. For example, "run away from predator" would be ranked above "sleep." While this architecture allows for clear programming of goals, many roboticists have moved away from the hierarchy because of its inflexibility. === Heterarchy and fully distributed === In the heterarchy and fully distributed architecture, each behavior has a set of pre-conditions to be met before it can be performed, and a set of post-conditions that will be true after the action has been performed. These pre- and post-conditions determine the order in which behaviors must be performed and are used to causally connect action modules. This enables each module to receive input from other modules as well as from the sensors, so modules can recruit each other. For example, if the agent's goal were to reduce thirst, the behavior "drink" would require the pre-condition of having water available, so the module would activate the module in charge of "find water". The activations organize the behaviors into a sequence, even though only one action is performed at a time. The distribution of larger behaviors across modules makes this system flexible and robust to noise. Some critics of this model hold that any existing set of division rules for the predecessor and conflictor connections between modules produce sub-par action selection. In addition, the feedback loop used in the model can in some circumstances lead to improper action selection. === Arbiter and centrally coordinated === In the arbiter and centrally coordinated architecture, the action modules are not connected to each other but to a central arbiter. When behaviors are triggered, they begin "voting" by sending signals to the arbiter, and the behavior with the highest number of votes is selected. In these systems, bias is created through the "voting weight", or how often a module is allowed to vote. Some arbiter systems take a different spin on this type of winner-take-all by using a "compromise" feature in the arbiter. Each module is able to vote for or against each smaller action in a set of actions, and the arbiter selects the action with the most votes, meaning that it benefits the most behavior modules. This can be seen as violating the general rule against creating representations of the world in behavior-based AI, established by Brooks. By performing command fusion, the system is creating a larger composite pool of knowledge than is obtained from the sensors alone, forming a composite inner representation of the environment. Defenders of these systems argue that forbidding world-modeling puts unnecessary constraints on behavior-based robotics, and that agents benefits from forming representations and can still remain reactive.

    Read more →
  • Kubeflow

    Kubeflow

    Kubeflow is an open-source platform for machine learning and MLOps on Kubernetes introduced by Google. The different stages in a typical machine learning lifecycle are represented with different software components in Kubeflow, including model development (Kubeflow Notebooks), model training (Kubeflow Pipelines, Kubeflow Training Operator), model serving (KServe), and automated machine learning (Katib). Each component of Kubeflow can be deployed separately, and it is not a requirement to deploy every component. == History == The Kubeflow project was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan to address a perceived lack of flexible options for building production-ready machine learning systems. The project has also stated it began as a way for Google to open-source how they ran TensorFlow internally. The first release of Kubeflow (Kubeflow 0.1) was announced at KubeCon + CloudNativeCon Europe 2018. Kubeflow 1.0 was released in March 2020 via a public blog post announcing that many Kubeflow components were graduating to a "stable status", indicating they were now ready for production usage. In October 2022, Google announced that the Kubeflow project had applied to join the Cloud Native Computing Foundation. In July 2023, the foundation voted to accept Kubeflow as an incubating stage project. == Components == === Kubeflow Notebooks for model development === Machine learning models are developed in the notebooks component called Kubeflow Notebooks. The component runs web-based development environments inside a Kubernetes cluster, with native support for Jupyter Notebook, Visual Studio Code, and RStudio. === Kubeflow Pipelines for model training === Once developed, models are trained in the Kubeflow Pipelines component. The component acts as a platform for building and deploying portable, scalable machine learning workflows based on Docker containers. Google Cloud Platform has adopted the Kubeflow Pipelines DSL within its Vertex AI Pipelines product. === Kubeflow Training Operator for model training === For certain machine learning models and libraries, the Kubeflow Training Operator component provides Kubernetes custom resources support. The component runs distributed or non-distributed TensorFlow, PyTorch, Apache MXNet, XGBoost, and MPI training jobs on Kubernetes. === KServe for model serving === The KServe component (previously named KFServing) provides Kubernetes custom resources for serving machine learning models on arbitrary frameworks including TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX. KServe was developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon. Publicly disclosed adopters of KServe include Bloomberg, Gojek, the Wikimedia Foundation, and others. === Katib for automated machine learning === Lastly, Kubeflow includes a component for automated training and development of machine learning models, the Katib component. It is described as a Kubernetes-native project and features hyperparameter tuning, early stopping, and neural architecture search. == Release timeline ==

    Read more →
  • Frequent pattern discovery

    Frequent pattern discovery

    Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of knowledge discovery in databases, Massive Online Analysis, and data mining; it describes the task of finding the most frequent and relevant patterns in large datasets. The concept was first introduced for mining transaction databases. Frequent patterns are defined as subsets (itemsets, subsequences, or substructures) that appear in a data set with frequency no less than a user-specified or auto-determined threshold. == Techniques == Techniques for FP mining include: market basket analysis cross-marketing catalog design clustering classification recommendation systems For the most part, FP discovery can be done using association rule learning with particular algorithms Eclat, FP-growth and the Apriori algorithm. Other strategies include: Frequent subtree mining Structure mining Sequential pattern mining and respective specific techniques. Implementations exist for various machine learning systems or modules like MLlib for Apache Spark.

    Read more →
  • Triplet loss

    Triplet loss

    Triplet loss is a machine learning loss function widely used in one-shot learning, a setting where models are trained to generalize effectively from limited examples. It was conceived by Google researchers for their prominent FaceNet algorithm for face detection. Triplet loss is designed to support metric learning. Namely, to assist training models to learn an embedding (mapping to a feature space) where similar data points are closer together and dissimilar ones are farther apart, enabling robust discrimination across varied conditions. In the context of face detection, data points correspond to images. == Definition == The loss function is defined using triplets of training points of the form ( A , P , N ) {\displaystyle (A,P,N)} . In each triplet, A {\displaystyle A} (called an "anchor point") denotes a reference point of a particular identity, P {\displaystyle P} (called a "positive point") denotes another point of the same identity in point A {\displaystyle A} , and N {\displaystyle N} (called a "negative point") denotes a point of an identity different from the identity in point A {\displaystyle A} and P {\displaystyle P} . Let x {\displaystyle x} be some point and let f ( x ) {\displaystyle f(x)} be the embedding of x {\displaystyle x} in the finite-dimensional Euclidean space. It shall be assumed that the L2-norm of f ( x ) {\displaystyle f(x)} is unity (the L2 norm of a vector X {\displaystyle X} in a finite dimensional Euclidean space is denoted by ‖ X ‖ {\displaystyle \Vert X\Vert } .) We assemble m {\displaystyle m} triplets of points from the training dataset. The goal of training here is to ensure that, after learning, the following condition (called the "triplet constraint") is satisfied by all triplets ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} in the training data set: ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 + α < ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 {\displaystyle \Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}+\alpha <\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}} The variable α {\displaystyle \alpha } is a hyperparameter called the margin, and its value must be set manually. In the FaceNet system, its value was set as 0.2. Thus, the full form of the function to be minimized is the following: L = ∑ i = 1 m max ( ‖ f ( A ( i ) ) − f ( P ( i ) ) ‖ 2 2 − ‖ f ( A ( i ) ) − f ( N ( i ) ) ‖ 2 2 + α , 0 ) {\displaystyle L=\sum _{i=1}^{m}\max {\Big (}\Vert f(A^{(i)})-f(P^{(i)})\Vert _{2}^{2}-\Vert f(A^{(i)})-f(N^{(i)})\Vert _{2}^{2}+\alpha ,0{\Big )}} == Intuition == A baseline for understanding the effectiveness of triplet loss is the contrastive loss, which operates on pairs of samples (rather than triplets). Training with the contrastive loss pulls embeddings of similar pairs closer together, and pushes dissimilar pairs apart. Its pairwise approach is greedy, as it considers each pair in isolation. Triplet loss innovates by considering relative distances. Its goal is that the embedding of an anchor (query) point be closer to positive points than to negative points (also accounting for the margin). It does not try to further optimize the distances once this requirement is met. This is approximated by simultaneously considering two pairs (anchor-positive and anchor-negative), rather than each pair in isolation. == Triplet "mining" == One crucial implementation detail when training with triplet loss is triplet "mining", which focuses on the smart selection of triplets for optimization. This process adds an additional layer of complexity compared to contrastive loss. A naive approach to preparing training data for the triplet loss involves randomly selecting triplets from the dataset. In general, the set of valid triplets of the form ( A ( i ) , P ( i ) , N ( i ) ) {\displaystyle (A^{(i)},P^{(i)},N^{(i)})} is very large. To speed-up training convergence, it is essential to focus on challenging triplets. In the FaceNet paper, several options were explored, eventually arriving at the following. For each anchor-positive pair, the algorithm considers only semi-hard negatives. These are negatives that violate the triplet requirement (i.e, are "hard"), but lie farther from the anchor than the positive (not too hard). Restated, for each A ( i ) {\displaystyle A^{(i)}} and P ( i ) {\displaystyle P^{(i)}} , they seek N ( i ) {\displaystyle N^{(i)}} such that: The rationale for this design choice is heuristic. It may appear puzzling that the mining process neglects "very hard" negatives (i.e., closer to the anchor than the positive). Experiments conducted by the FaceNet designers found that this often leads to a convergence to degenerate local minima. Triplet mining is performed at each training step, from within the sample points contained in the training batch (this is known as online mining), after embeddings were computed for all points in the batch. While ideally the entire dataset could be used, this is impractical in general. To support a large search space for triplets, the FaceNet authors used very large batches (1800 samples). Batches are constructed by selecting a large number of same-category sample points (40), and randomly selected negatives for them. == Extensions == Triplet loss has been extended to simultaneously maintain a series of distance orders by optimizing a continuous relevance degree with a chain (i.e., ladder) of distance inequalities. This leads to the Ladder Loss, which has been demonstrated to offer performance enhancements of visual-semantic embedding in learning to rank tasks. In Natural Language Processing, triplet loss is one of the loss functions considered for BERT fine-tuning in the SBERT architecture. Other extensions involve specifying multiple negatives (multiple negatives ranking loss).

    Read more →
  • CodeCheck

    CodeCheck

    CodeCheck is a mobile app that provides consumers with information about the ingredients in cosmetic products, as well as the ingredients and nutritional values of food. Users can access this information by scanning the product’s barcode with a smartphone or by using a text-based search. The app is available for iOS and Android devices in Germany, Austria, Switzerland, the United Kingdom, the United States, and the Netherlands. == History == CodeCheck was founded in 2010 as an association, online database, and app by Roman Bleichenbacher, who was then a student in Zurich. A website of the same name had already been launched in 2002, where users could enter information about ingredients, nutritional values, and manufacturers of products. The first round of financing took place in July 2014 and raised over 1.1 million Swiss francs, which coincided with the founding of CodeCheck AG. Investors included Doodle founders Myke Näf and Paul E. Sevinç. The company subsequently expanded to Austria and Germany. In the same year, Boris Manhart became CEO. CodeCheck GmbH was established in Berlin in 2016. The app became available in the United States in 2017 and in the United Kingdom in November 2019. In 2020, it was also launched in the Netherlands. Following insolvency proceedings, the app has been owned by Producto Check GmbH since 2022. == Functions == The app can be used to scan the barcode of food and cosmetic products. It then displays information about ingredients, nutritional values, manufacturers and certification labels. For many years, users were able to enter and edit product information themselves and indicate advantages and disadvantages of individual products. Since 2020, the app has placed greater emphasis on machine text recognition. The collected data is combined with substance ratings using an algorithm. These ratings are based on scientific studies and expert assessments, including those from the Consumer Advice Centre in Hamburg, Greenpeace, the WWF and the German Association for the Environment and Nature Conservation (BUND e. V.), and cannot be modified by users or manufacturers. The app also provides information on the sugar and fat content of food products. In addition, it indicates whether a product contains hormone-active substances, microplastics, palm oil, animal-derived ingredients, lactose or gluten. Since 2020, the app has displayed a climate score for food products in cooperation with the Eaternity Institute. == Financing == CodeCheck is primarily financed through native advertising and banner ads. Since 2018, the company has also offered analysis services and survey tools directly to fast-moving consumer goods (FMCG) manufacturers. In addition, access to the API is available, enabling other companies to use the product database. With the introduction of a subscription model in 2019, the CodeCheck app can be used ad-free and in offline mode. Since 2021, CodeCheck has also offered its own “Green Label” certification for manufacturers. Products are certified if at least 90 percent of their ingredients are classified as harmless. == Awards == In May 2015, the app topped the download charts for the first time, reaching 2.3 million installations. By September 2019, the app had once again reached the top of the German app charts, surpassing five million downloads.

    Read more →
  • Facial recognition system

    Facial recognition system

    A facial recognition system is a technology potentially capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and works by pinpointing and measuring facial features from a given image. Development on similar systems began in the 1960s as a form of computer application. Since their inception, facial recognition systems have seen wider uses in recent times on smartphones and in other forms of technology, such as robotics. Because computerized facial recognition involves the measurement of a human's physiological characteristics, facial recognition systems are categorized as biometrics. Although the accuracy of facial recognition systems as a biometric technology is lower than iris recognition, fingerprint image acquisition, palm recognition or voice recognition, it is widely adopted due to its contactless process. Facial recognition systems have been deployed in advanced human–computer interaction, video surveillance, law enforcement, passenger screening, decisions on employment and housing, and automatic indexing of images. Facial recognition systems are employed throughout the world today by governments and private companies. Their effectiveness varies, and some systems have previously been scrapped because of their ineffectiveness. The use of facial recognition systems has also raised controversy, with claims that the systems violate citizens' privacy, commonly make incorrect identifications, encourage gender norms and racial profiling, and do not protect important biometric data. The appearance of synthetic media such as deepfakes has also raised concerns about its security. These claims have led to the ban of facial recognition systems in several cities in the United States. Growing societal concerns led social networking company Meta Platforms to shut down its Facebook facial recognition system in 2021, deleting the face-scan data of more than one billion users. The change represented one of the largest shifts in facial recognition usage in the technology's history. IBM also stopped offering facial recognition technology due to similar concerns. == History of facial recognition technology == Automated facial recognition was pioneered in the 1960s by Woody Bledsoe, Helen Chan Wolf, and Charles Bisson, whose work focused on teaching computers to recognize human faces. Their early facial recognition project was dubbed "man-machine" because a human first needed to establish the coordinates of facial features in a photograph before they could be used by a computer for recognition. Using a graphics tablet, a human would pinpoint facial features coordinates, such as the pupil centers, the inside and outside corners of eyes, and the widows peak in the hairline. The coordinates were used to calculate 20 individual distances, including the width of the mouth and of the eyes. A human could process about 40 pictures an hour, building a database of these computed distances. A computer would then automatically compare the distances for each photograph, calculate the difference between the distances, and return the closed records as a possible match. In 1970, Takeo Kanade publicly demonstrated a face-matching system that located anatomical features such as the chin and calculated the distance ratio between facial features without human intervention. Later tests revealed that the system could not always reliably identify facial features. Nonetheless, interest in the subject grew and in 1977 Kanade published the first detailed book on facial recognition technology. In 1993, the Defense Advanced Research Project Agency (DARPA) and the Army Research Laboratory (ARL) established the face recognition technology program FERET to develop "automatic face recognition capabilities" that could be employed in a productive real life environment "to assist security, intelligence, and law enforcement personnel in the performance of their duties." Face recognition systems that had been trialled in research labs were evaluated. The FERET tests found that while the performance of existing automated facial recognition systems varied, a handful of existing methods could viably be used to recognize faces in still images taken in a controlled environment. The FERET tests spawned three US companies that sold automated facial recognition systems. Vision Corporation and Miros Inc were founded in 1994, by researchers who used the results of the FERET tests as a selling point. Viisage Technology was established by an identification card defense contractor in 1996 to commercially exploit the rights to the facial recognition algorithm developed by Alex Pentland at MIT. Following the 1993 FERET face-recognition vendor test, the Department of Motor Vehicles (DMV) offices in West Virginia and New Mexico became the first DMV offices to use automated facial recognition systems to prevent people from obtaining multiple driving licenses using different names. Driver's licenses in the United States were at that point a commonly accepted form of photo identification. DMV offices across the United States were undergoing a technological upgrade and were in the process of establishing databases of digital ID photographs. This enabled DMV offices to deploy the facial recognition systems on the market to search photographs for new driving licenses against the existing DMV database. DMV offices became one of the first major markets for automated facial recognition technology and introduced US citizens to facial recognition as a standard method of identification. The increase of the US prison population in the 1990s prompted U.S. states to established connected and automated identification systems that incorporated digital biometric databases, in some instances this included facial recognition. In 1999, Minnesota incorporated the facial recognition system FaceIT by Visionics into a mug shot booking system that allowed police, judges and court officers to track criminals across the state. Until the 1990s, facial recognition systems were developed primarily by using photographic portraits of human faces. Research on face recognition to reliably locate a face in an image that contains other objects gained traction in the early 1990s with the principal component analysis (PCA). The PCA method of face detection is also known as Eigenface and was developed by Matthew Turk and Alex Pentland. Turk and Pentland combined the conceptual approach of the Karhunen–Loève theorem and factor analysis, to develop a linear model. Eigenfaces are determined based on global and orthogonal features in human faces. A human face is calculated as a weighted combination of a number of Eigenfaces. Because few Eigenfaces were used to encode human faces of a given population, Turk and Pentland's PCA face detection method greatly reduced the amount of data that had to be processed to detect a face. Pentland in 1994 defined Eigenface features, including eigen eyes, eigen mouths and eigen noses, to advance the use of PCA in facial recognition. In 1997, the PCA Eigenface method of face recognition was improved upon using linear discriminant analysis (LDA) to produce Fisherfaces. LDA Fisherfaces became dominantly used in PCA feature based face recognition. While Eigenfaces were also used for face reconstruction. In these approaches no global structure of the face is calculated which links the facial features or parts. Purely feature based approaches to facial recognition were overtaken in the late 1990s by the Bochum system, which used Gabor filter to record the face features and computed a grid of the face structure to link the features. Christoph von der Malsburg and his research team at the University of Bochum developed Elastic Bunch Graph Matching in the mid-1990s to extract a face out of an image using skin segmentation. By 1997, the face detection method developed by Malsburg outperformed most other facial detection systems on the market. The so-called "Bochum system" of face detection was sold commercially on the market as ZN-Face to operators of airports and other busy locations. The software was "robust enough to make identifications from less-than-perfect face views. It can also often see through such impediments to identification as mustaches, beards, changed hairstyles and glasses—even sunglasses". Real-time face detection in video footage became possible in 2001 with the Viola–Jones object detection framework for faces. Paul Viola and Michael Jones combined their face detection method with the Haar-like feature approach to object recognition in digital images to launch AdaBoost, the first real-time frontal-view face detector. By 2015, the Viola–Jones algorithm had been implemented using small low power detectors on handheld devices and embedded systems. Therefore, the Viola–Jones algorithm has not only broadened the practical application of face recognition systems but

    Read more →
  • Andrej Mrvar

    Andrej Mrvar

    Andrej Mrvar is a Slovenian computer scientist and a professor at the University of Ljubljana's Faculty of Social Sciences. He is known for his work in network analysis, graph drawing, decision making, virtual reality, timing and data processing of sports competitions. == Education and career == He is well known for his work on Pajek, a free software for analysis and visualization of large networks. Mrvar began work on Pajek in 1996 with Vladimir Batagelj. His book Exploratory Social Network Analysis with Pajek, coauthored with Wouter de Nooy and Vladimir Batagelj, is his most cited work. It was published by Cambridge University Press in three editions (first 2005, second 2011, and third 2018). The book was translated into Japanese (2009) and Chinese (first edition 2012, second 2014). With Anuška Ferligoj, he was a founding co-editor-in-chief of the Metodološki zvezki - Advances in Methodology and Statistics journal. == Awards and honors == Vidmar Award (Faculty of Electrical and Computer Engineering, University of Ljubljana): 1988, 1990 First prizes for contributions (with Vladimir Batagelj) to Graph Drawing Contests in years: 1995, 1996, 1997, 1998, 1999, 2000 and 2005 / Graph Drawing Hall of Fame. Award of University of Ljubljana for contributions in education and research (Svečana listina Univerze v Ljubljani za pomembne dosežke na področju vzgojnoizobraževalnega in znanstvenoraziskovalega dela): 2001 The INSNA's William D. Richards Software award for work on Pajek (with Vladimir Batagelj): 2013 Award of Faculty of Social Sciences, University of Ljubljana for scientific excellence (Priznanje za znanstveno odličnost): 2013 == Selected publications == Wouter de Nooy, Andrej Mrvar, Vladimir Batagelj, Mark Granovetter (Series Editor), Exploratory Social Network Analysis with Pajek (Structural Analysis in the Social Sciences), Cambridge University Press (First Edition: 2005, Second Edition: 2011, Third Edition: 2018 ). Japanese Translation (2010). Chinese Translation (First Edition: 2012, Second Edition: 2014) Andrej Mrvar and Vladimir Batagelj, Analysis and visualization of large networks with program package Pajek. Complex Adaptive Systems Modeling, 4:6. SpringerOpen, 2016 Vladimir Batagelj and Andrej Mrvar, Some Analyses of Erdős Collaboration Graph, Social Networks, 22, 173–186, 2000 Vladimir Batagelj and Andrej Mrvar, A Subquadratic Triad Census Algorithm for Large Sparse Networks with Small Maximum Degree. Social Networks, 23, 237–243, 2001 Patrick Doreian and Andrej Mrvar, A Partitioning Approach to Structural Balance, Social Networks, 18, 149–168, 1996 Patrick Doreian and Andrej Mrvar, Partitioning Signed Social Networks, Social Networks, 31, 1–11, 2009 Andrej Mrvar and Patrick Doreian, Partitioning Signed Two-Mode Networks, Journal of Mathematical Sociology, 33, 196–221, 2009 Patrick Doreian and Andrej Mrvar, The international reach of the Koch brothers network. In: Antonyuk, A. and Basov, N. (Eds.): Networks in the Global World V. NetGloW 2020. Lecture Notes in Networks and Systems, 181, 225–235. Springer, 2021 Patrick Doreian and Andrej Mrvar, Delineating Changes in the Fundamental Structure of Signed Networks, Frontiers in Physics, 294, 1–11, 2021 Patrick Doreian and Andrej Mrvar, Hubs and Authorities in the Koch Brothers Network. Social Networks, Social Networks, 64, 148–157, 2021 Patrick Doreian and Andrej Mrvar, Public issues, policy proposals, social movements, and the interests of the Koch Brothers network of allies, Quality and Quantity, 56, 305–322, 2022 Douglas R. White, Vladimir Batagelj, Andrej Mrvar, Analyzing Large Kinship and Marriage Networks with Pgraph and Pajek. Social Science Computer Review, 17, 245–274, 1999 Ion Georgiou, Ronald Concer, Andrej Mrvar, A Systemic Approach to Sociometric Group Research: Advancing The Work of Leslie Day Zeleny, 1939–1947, Social Networks, 63, 174–200, 2020

    Read more →