AI Avatar Kids

AI Avatar Kids — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Embodied cognitive science

Embodied cognitive science is an interdisciplinary field of research, the aim of which is to explain the mechanisms underlying intelligent behavior. It comprises three main methodologies: the modeling of psychological and biological systems in a holistic manner that considers the mind and body as a single entity; the formation of a common set of general principles of intelligent behavior; and the experimental use of robotic agents in controlled environments. == Contributors == Embodied cognitive science borrows heavily from embodied philosophy and the related research fields of cognitive science, psychology, neuroscience and artificial intelligence. Contributors to the field include: From the perspective of neuroscience, Gerald Edelman of the Neurosciences Institute at La Jolla, Francisco Varela of CNRS in France, and J. A. Scott Kelso of Florida Atlantic University From the perspective of psychology, Lawrence Barsalou, Michael Turvey, Vittorio Guidano and Eleanor Rosch From the perspective of linguistics, Gilles Fauconnier, George Lakoff, Mark Johnson, Leonard Talmy and Mark Turner From the perspective of language acquisition, Eric Lenneberg and Philip Rubin at Haskins Laboratories From the perspective of anthropology, Edwin Hutchins, Bradd Shore, James Wertsch and Merlin Donald. From the perspective of autonomous agent design, early work is sometimes attributed to Rodney Brooks or Valentino Braitenberg From the perspective of artificial intelligence, Understanding Intelligence by Rolf Pfeifer and Christian Scheier or How the Body Shapes the Way We Think, by Rolf Pfeifer and Josh C. Bongard From the perspective of philosophy, Andy Clark, Dan Zahavi, Shaun Gallagher, and Evan Thompson In 1950, Alan Turing proposed that a machine may need a human-like body to think and speak: It can also be maintained that it is best to provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English. That process could follow the normal teaching of a child. Things would be pointed out and named, etc. Again, I do not know what the right answer is, but I think both approaches should be tried. == Traditional cognitive theory == Embodied cognitive science is an alternative theory to cognition in which it minimizes appeals to computational theory of mind in favor of greater emphasis on how an organism's body determines how and what it thinks. Traditional cognitive theory is based mainly around symbol manipulation, in which certain inputs are fed into a processing unit that produces an output. These inputs follow certain rules of syntax, from which the processing unit finds semantic meaning. Thus, an appropriate output is produced. For example, a human's sensory organs are its input devices, and the stimuli obtained from the external environment are fed into the nervous system which serves as the processing unit. From here, the nervous system is able to read the sensory information because it follows a syntactic structure, thus an output is created. This output then creates bodily motions and brings forth behavior and cognition. Of particular note is that cognition is sealed away in the brain, meaning that mental cognition is cut off from the external world and is only possible by the input of sensory information. == The embodied cognitive approach == Embodied cognitive science differs from the traditionalist approach in that it denies the input-output system. This is chiefly due to the problems presented by the Homunculus argument, which concluded that semantic meaning could not be derived from symbols without some kind of inner interpretation. If some little man in a person's head interpreted incoming symbols, then who would interpret the little man's inputs? Because of the specter of an infinite regress, the traditionalist model began to seem less plausible. Thus, embodied cognitive science aims to avoid this problem by defining cognition in three ways. === Physical attributes of the body === The first aspect of embodied cognition examines the role of the physical body, particularly how its properties affect its ability to think. This part attempts to overcome the symbol manipulation component that is a feature of the traditionalist model. Depth perception, for instance, can be better explained under the embodied approach due to the sheer complexity of the action. Depth perception requires that the brain detect the disparate retinal images obtained by the distance of the two eyes. In addition, body and head cues complicate this further. When the head is turned in a given direction, objects in the foreground will appear to move against objects in the background. From this, it is said that some kind of visual processing is occurring without the need of any kind of symbol manipulation. This is because the objects appearing to move the foreground are simply appearing to move. This observation concludes then that depth can be perceived with no intermediate symbol manipulation necessary. A more poignant example exists through examining auditory perception. Generally speaking the greater the distance between the ears, the greater the possible auditory acuity. Also relevant is the amount of density in between the ears, for the strength of the frequency wave alters as it passes through a given medium. The brain's auditory system takes these factors into account as it process information, but again without any need for a symbolic manipulation system. This is because the distance between the ears for example does not need symbols to represent it. The distance itself creates the necessary opportunity for greater auditory acuity. The amount of density between the ears is similar, in that it is the actual amount itself that simply forms the opportunity for frequency alteration. Thus under consideration of the physical properties of the body, a symbolic system is unnecessary and an unhelpful metaphor. === The body's role in the cognitive process === The second aspect draws heavily from George Lakoff's and Mark Johnson's work on concepts. They argued that humans use metaphors whenever possible to better explain their external world. Humans also have a basic stock of concepts in which other concepts can be derived from. These basic concepts include spatial orientations such as up, down, front, and back. Humans can understand what these concepts mean because they can directly experience them from their own bodies. For example, because human movement revolves around standing erect and moving the body in an up-down motion, humans innately have these concepts of up and down. Lakoff and Johnson contend this is similar with other spatial orientations such as front and back too. As mentioned earlier, these basic stocks of spatial concepts are the basis in which other concepts are constructed. Happy and sad for instance are seen now as being up or down respectively. When someone says they are feeling down, what they are really saying is that they feel sad for example. Thus the point here is that true understanding of these concepts is contingent on whether one can have an understanding of the human body. So the argument goes that if one lacked a human body, they could not possibly know what up or down could mean, or how it could relate to emotional states. [I]magine a spherical being living outside of any gravitational field, with no knowledge or imagination of any other kind of experience. What could UP possibly mean to such a being? While this does not mean that such beings would be incapable of expressing emotions in other words, it does mean that they would express emotions differently from humans. Human concepts of happiness and sadness would be different because human would have different bodies. So then an organism's body directly affects how it can think, because it uses metaphors related to its body as the basis of concepts. === Interaction of local environment === A third component of the embodied approach looks at how agents use their immediate environment in cognitive processing. Meaning, the local environment is seen as an actual extension of the body's cognitive process. The example of a personal digital assistant (PDA) is used to better imagine this. Echoing functionalism (philosophy of mind), this point claims that mental states are individuated by their role in a much larger system. So under this premise, the information on a PDA is similar to the information stored in the brain. So then if one thinks information in the brain constitutes mental states, then it must follow that information in the PDA is a cognitive state too. Consider also the role of pen and paper in a complex multiplication problem. The pen and paper are so involved in the cognitive process of solving the problem that it seems ridiculous to say they are somehow different from the process, in very much the same way the PDA is used for information like the brain. Another example examines how humans control and manipulate their environment
Read more →
Fuzzy mathematics

Fuzzy mathematics is a branch of mathematics that extends classical set theory and logic to model reasoning under uncertainty. Initiated by Lotfi Asker Zadeh in 1965 with the introduction of fuzzy sets, the field has since evolved to include fuzzy set theory, fuzzy logic, and various fuzzy analogues of traditional mathematic structures. Unlike classical mathematics, which usually relies on binary membership (an element either belongs to a set or it does not), fuzzy mathematics allows elements to partially belong to a set, with degrees of membership represented by values in the interval [0, 1]. This framework enables more flexible modeling of imprecise or vague concepts. Fuzzy mathematics has found applications in numerous domains, including control theory, artificial intelligence, decision theory, pattern recognition, and linguistics, where the modeling of gradations and uncertainty is essential. == Definition == A fuzzy subset A of a set X is defined by a function A: X → L, where L is typically the interval [0, 1]. This function is called the membership function of the fuzzy subset and assigns to each element x in X a degree of membership A(x) in the fuzzy set A. In classical set theory, a subset of X can be represented by an indicator function (also known as a characteristic function), which maps elements to either 0 or 1, indicating non-membership or full membership, respectively. Fuzzy subsets generalize this concept by allowing any real value between 0 and 1, thereby enabling partial membership. More generally, the codomain L of the membership function can be replaced with any complete lattice, resulting in the broader framework of L-fuzzy sets. == Fuzzification == The development of fuzzification in mathematics can be broadly divided into three historical stages: Initial, straightforward fuzzifications (1960s–1970s), Expansion of generalization techniques (1980s), Standardization, axiomatization, and L-fuzzification (1990s). Fuzzification generally involves extending classical mathematical concepts from binary (crisp) logic, where membership is determined by characteristic functions, to fuzzy logic, where membership is expressed by values in the interval [0, 1] via membership functions. Let A and B be fuzzy subsets of a set X. The fuzzy versions of set-theoretic operations are commonly defined as: ( A ∩ B ) ( x ) = min ( A ( x ) , B ( x ) ) {\displaystyle (A\cap B)(x)=\min(A(x),B(x))} ( A ∪ B ) ( x ) = max ( A ( x ) , B ( x ) ) {\displaystyle (A\cup B)(x)=\max(A(x),B(x))} for all x ∈ X {\displaystyle x\in X} . These operations can be generalized using t-norms and t-conorms, respectively. For example, the minimum operation can be replaced by multiplication: ( A ∩ B ) ( x ) = A ( x ) ⋅ B ( x ) {\displaystyle (A\cap B)(x)=A(x)\cdot B(x)} Fuzzification of algebraic structures often relies on generalizing the closure property. Let ∗ {\displaystyle } be a binary operation on X, and let A be a fuzzy subset of X. Then A is said to satisfy fuzzy closure if: A ( x ∗ y ) ≥ min ( A ( x ) , A ( y ) ) {\displaystyle A(xy)\geq \min(A(x),A(y))} for all x , y ∈ X {\displaystyle x,y\in X} . If ( G , ∗ ) {\displaystyle (G,)} is a group, then a fuzzy subset A of G is a fuzzy subgroup if: A ( x ∗ y − 1 ) ≥ min ( A ( x ) , A ( y − 1 ) ) {\displaystyle A(xy^{-1})\geq \min(A(x),A(y^{-1}))} for all x , y ∈ G {\displaystyle x,y\in G} . Similar generalizations apply to relational properties. For example, for example, for fuzzification of the transitivity property, a fuzzy relation R {\displaystyle R} on X {\displaystyle X} (i.e., a fuzzy subset of X × X {\displaystyle X\times X} ) is said to be fuzzy transitive if: R ( x , z ) ≥ min ( R ( x , y ) , R ( y , z ) ) {\displaystyle R(x,z)\geq \min(R(x,y),R(y,z))} for all x , y , z ∈ X {\displaystyle x,y,z\in X} . == Fuzzy analogues == Fuzzy subgroupoids and fuzzy subgroups were introduced in 1971 by A. Rosenfeld. Analogues of other mathematical subjects have been translated to fuzzy mathematics, such as fuzzy field theory and fuzzy Galois theory, fuzzy topology, fuzzy geometry, fuzzy orderings, and fuzzy graphs.
Read more →
Clinical decision support system

A clinical decision support system (CDSS) is a form of health information technology that provides clinicians, staff, patients, or other individuals with knowledge and person-specific information to enhance decision-making in clinical workflows. CDSS tools include alerts and reminders, clinical guidelines, condition-specific order sets, patient data summaries, diagnostic support, and context-aware reference information. They often leverage artificial intelligence to analyze clinical data and help improve care quality and safety. CDSSs constitute a major topic in artificial intelligence in medicine. == Characteristics == A clinical decision support system is an active knowledge system that uses variables of patient data to produce advice regarding health care. This implies that a CDSS is simply a decision support system focused on using knowledge management. === Purpose === The main purpose of modern CDSS is to assist clinicians at the point of care. This means that clinicians interact with a CDSS to help to analyze and reach a diagnosis based on patient data for different diseases. In the early days, CDSSs were conceived to make decisions for the clinician in a literal manner. The clinician would input the information and wait for the CDSS to output the "right" choice, and the clinician would simply act on that output. However, the modern methodology of using CDSSs to assist means that the clinician interacts with the CDSS, utilizing both their knowledge and the CDSS's, better to analyse the patient's data than either a human or a CDSS could do on their own. Typically, a CDSS makes suggestions for the clinician to review, and the clinician is expected to pick out useful information from the presented results and discount erroneous CDSS suggestions. The two main types of CDSS are knowledge-based systems and non-knowledge-based (machine learning–based) systems: An example of how a clinician might use a clinical decision support system is a diagnosis decision support system (DDSS). DDSS requests some of the patient's data and, in response, proposes a set of possible diagnoses. The physician then takes the output of the DDSS and determines which diagnoses are likely and which are not, and, if necessary, orders further tests to narrow down the diagnosis. Another example of a CDSS would be a case-based reasoning (CBR) system. A CBR system might use previous case data to help determine the appropriate amount of beams and the optimal beam angles for use in radiotherapy for brain cancer patients; medical physicists and oncologists would then review the recommended treatment plan to determine its viability. Another important classification of a CDSS is based on the timing of its use. Physicians use these systems at the point of care to help them as they are dealing with a patient, with the timing of use being either pre-diagnosis, during diagnosis, or post-diagnosis. Pre-diagnosis CDSS systems help the physician prepare the diagnoses. CDSSs help review and filter the physician's preliminary diagnostic choices to improve outcomes. Post-diagnosis CDSS systems are used to mine data to derive connections between patients and their past medical history and clinical research to predict future events. Early speculation that AI-based decision support would replace clinicians in common tasks has largely given way to a consensus around assistive models, in which AI augments rather than supplants clinical judgment. Contemporary deep learning-based systems, unlike earlier rule-based tools, can be trained directly on clinical data without manual rule authoring and integrated into electronic health record workflows at the point of care. Another approach, used by the National Health Service in England, is to use a CDSS to triage medical conditions out of hours by suggesting a suitable next step to the patient (e.g. call an ambulance, or see a general practitioner on the next working day). The suggestion, which may be disregarded by either the patient or the phone operative if common sense or caution suggests otherwise, is based on the known information and an implicit conclusion about what the worst-case diagnosis is likely to be; it is not always revealed to the patient because it might well be incorrect and is not based on a medically-trained person's opinion - it is only used for initial triage purposes. === Knowledge-based === Most CDSSs consist of three parts: the knowledge base, an inference engine, and a mechanism to communicate. The knowledge base contains the rules and associations of compiled data which most often take the form of IF-THEN rules. If this was a system for determining drug interactions, then a rule might be that IF drug X is taken AND drug Y is taken THEN alert the user. Using another interface, an advanced user could edit the knowledge base to keep it up to date with new drugs. The inference engine combines the rules from the knowledge base with the patient's data. The communication mechanism allows the system to show the results to the user as well as have input into the system. An expression language such as GELLO or CQL (Clinical Quality Language) is needed for expressing knowledge artefacts in a computable manner. For example: if a patient has diabetes mellitus, and if the last haemoglobin A1c test result was less than 7%, recommend re-testing if it has been over six months, but if the last test result was greater than or equal to 7%, then recommend re-testing if it has been over three months. The current focus of the HL7 CDS WG is to build on the Clinical Quality Language (CQL). The U.S. Centers for Medicare & Medicaid Services (CMS) has announced that it plans to use CQL for the specification of Electronic Clinical Quality Measures (eCQMs). === Non-knowledge-based === CDSSs which do not use a knowledge base use a form of artificial intelligence called machine learning, which allow computers to learn from past experiences and/or find patterns in clinical data. This eliminates the need for writing rules and expert input. However, since systems based on machine learning cannot explain the reasons for their conclusions, most clinicians do not use them directly for diagnoses, reliability and accountability reasons. Nevertheless, they can be useful as post-diagnostic systems, for suggesting patterns for clinicians to look into in more depth. As of 2012, three types of non-knowledge-based systems are support-vector machines, artificial neural networks and genetic algorithms. Artificial neural networks use nodes and weighted connections between them to analyse the patterns found in patient data to derive associations between symptoms and a diagnosis. Genetic algorithms are based on simplified evolutionary processes using directed selection to achieve optimal CDSS results. The selection algorithms evaluate components of random sets of solutions to a problem. The solutions that come out on top are then recombined and mutated and run through the process again. This happens over and over until the proper solution is discovered. They are functionally similar to neural networks in that they are also "black boxes" that attempt to derive knowledge from patient data. Non-knowledge-based networks often focus on a narrow list of symptoms, such as symptoms for a single disease, as opposed to the knowledge-based approach, which covers the diagnosis of many diseases. An example of a non-knowledge-based CDSS is a web server developed using a support vector machine for the prediction of gestational diabetes in Ireland. == Regulations == === History, United States === The IOM had published a report in 1999, To Err is Human, which focused on the patient safety crisis in the United States, pointing to the incredibly high number of deaths. This statistic attracted great attention to the quality of patient care. The Institute of Medicine (IOM) promoted the usage of health information technology, including clinical decision support systems, to advance the quality of patient care. With the enactment of the American Recovery and Reinvestment Act of 2009 (ARRA), there was a push for widespread adoption of health information technology through the Health Information Technology for Economic and Clinical Health Act (HITECH). Through these initiatives, more hospitals and clinics were integrating electronic medical records (EMRs) and computerized physician order entry (CPOE) within their health information processing and storage. Despite the absence of laws, the CDSS vendors would almost certainly be viewed as having a legal duty of care to both the patients who may adversely be affected due to CDSS usage and the clinicians who may use the technology for patient care. However, duties of care legal regulations are not explicitly defined yet. With the enactment of the HITECH Act included in the ARRA, encouraging the adoption of health IT, more detailed case laws for CDSS and EMRs were still being defined by the Office of National Coordinator for Health Informati
Read more →
Argument Interchange Format

The Argument Interchange Format (AIF) is an international effort to develop a representational mechanism for exchanging argument resources between research groups, tools, and domains using a semantically rich language. AIF traces its history back to a 2005 colloquium in Budapest. The result of the work in Budapest was first published as a draft description in 2006. Building on this foundation, further work then used the AIF to build foundations for the Argument Web. AIF-RDF is the extended ontology represented in the Resource Description Framework Schema (RDFS) semantic language. The Argument Interchange Format introduces a small set of ontological concepts that aim to capture a common understanding of argument -- one that works in multiple domains (both domains of argumentation and also domains of academic research), so that data can be shared and re-used across different projects in different areas. These ontological concepts are: Information (I-nodes) Applications of Rules of Inference (RA-nodes) Applications of Rules of Conflict (CA-nodes) Applications of Rules of Preference (PA-nodes) extended by: Schematic Forms (F-nodes) that are instantiated by RA, CA and PA nodes The AIF has reifications in a variety of development environments and implementation languages including MySQL database schema RDF Prolog JSON as well as translations to visual languages such as DOT and SVG. AIF data can be accessed online at AIFdb.
Read more →
Large language model

A large language model (LLM) is a neural network trained on a vast amount of text for natural language processing tasks, especially language generation. LLMs can typically generate, summarize, translate and analyze text in many contexts, and are a foundational technology behind modern chatbots. Biased or inaccurate training data can make an LLM's output less reliable. As of 2026, the most capable LLMs are based on transformer architectures, which, according to the 2017 paper "Attention Is All You Need", can be more efficient and parallelizable than earlier statistical and recurrent neural network models. Benchmark evaluations for LLMs attempt to measure model reasoning, factual accuracy, alignment, and safety. == History == Before the emergence of transformer-based models in 2017, some language models were considered large relative to the computational and data constraints of their time. In the early 1990s, IBM's statistical models pioneered word alignment techniques for machine translation, laying the groundwork for corpus-based language modeling. In 2001, a smoothed n-gram model, such as those employing Kneser–Ney smoothing, trained on 300 million words, achieved state-of-the-art perplexity on benchmark tests. During the 2000s, with the rise of widespread internet access, researchers began compiling massive text datasets from the web ("web as corpus") to train statistical language models. Moving beyond n-gram models, researchers started in 2000 to use neural networks as language models. Following the breakthrough of deep neural networks in image classification around 2012, similar architectures were adapted for language tasks. This shift was marked by the development of word embeddings (e.g., Word2Vec by Mikolov in 2013) and sequence-to-sequence (seq2seq) models using LSTM. In 2016, Google transitioned its translation service to neural machine translation (NMT), replacing statistical phrase-based models with deep recurrent neural networks. These early NMT systems used LSTM-based encoder-decoder architectures, as they preceded the invention of transformers. At the 2017 NeurIPS conference, Google researchers introduced the transformer architecture in their landmark paper "Attention Is All You Need". This paper's goal was to improve upon 2014 seq2seq technology, and was based mainly on the attention mechanism developed by Bahdanau et al. in 2014. The following year in 2018, BERT was introduced and quickly became "ubiquitous". Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model. Academic and research usage of BERT began to decline in 2023, following rapid improvements in the abilities of decoder-only models (such as GPT) to solve tasks via prompting. Although decoder-only GPT-1 was introduced in 2018, it was GPT-2 in 2019 that caught widespread attention because OpenAI claimed to have initially deemed it too powerful to release publicly, out of fear of malicious use. GPT-3 in 2020 went a step further and as of 2025 is available only via API with no offering of downloading the model to execute locally. But it was the consumer-facing chatbot ChatGPT in late 2022 that received extensive media coverage and public attention by 2023. The 2023 GPT-4 was praised for its increased accuracy and as a "holy grail" for its multimodal capabilities. OpenAI did not reveal the high-level architecture and the number of parameters of GPT-4. The release of ChatGPT led to an uptick in LLM usage across several research subfields of computer science, including robotics, software engineering, and societal impact work. In 2024, OpenAI released the reasoning model OpenAI o1, which generates long chains of thought before returning a final answer. Many LLMs with parameter counts comparable to those of OpenAI's GPT series have been developed. Since 2022, weights-available models have been gaining popularity, especially at first with BLOOM and LLaMA, though both have restrictions on usage and deployment. Mistral AI's open-weight models Mistral 7B and Mixtral 8x7B have a more permissive Apache License. In January 2025, DeepSeek released DeepSeek R1, a 671-billion-parameter open-weight model that performs comparably to OpenAI o1 but at a much lower price per token for users. Since 2023, many LLMs have been trained to be multimodal, having the ability to also process or generate other types of data, such as images, audio, or 3D meshes. Open-weight LLMs have become more influential since 2023. Per Vake et al. (2025), community-driven contributions to open-weight models improve their efficiency and performance via collaborative platforms such as Hugging Face. == Dataset preprocessing == === Tokenization === As machine learning algorithms process numbers rather than text, the text must be converted to numbers. In the first step, a vocabulary is decided upon, then integer indices are arbitrarily but uniquely assigned to each vocabulary entry, and finally, an embedding is associated with the integer index. Algorithms include byte-pair encoding (BPE) and WordPiece. There are also special tokens serving as control characters, such as [MASK] for masked-out token (as used in BERT), and [UNK] ("unknown") for characters not appearing in the vocabulary. Also, some special symbols are used to denote special text formatting. For example, "Ġ" denotes a preceding whitespace in RoBERTa and GPT and "##" denotes continuation of a preceding word in BERT. For example, the BPE tokenizer used by the legacy version of GPT-3 would split tokenizer: texts -> series of numerical "tokens" as Tokenization also compresses the datasets. Because LLMs generally require input to be an array that is not jagged, the shorter texts must be "padded" until they match the length of the longest one. ==== Byte-pair encoding ==== As an example, consider a tokenizer based on byte-pair encoding. In the first step, all unique characters (including blanks and punctuation marks) are treated as an initial set of n-grams (i.e. initial set of uni-grams). Successively the most frequent pair of adjacent characters is merged into a bi-gram and all instances of the pair are replaced by it. All occurrences of adjacent pairs of (previously merged) n-grams that most frequently occur together are then again merged into even lengthier n-gram, until a vocabulary of prescribed size is obtained. After a tokenizer is trained, any text can be tokenized by it, as long as it does not contain characters not appearing in the initial-set of uni-grams. === Dataset cleaning === In the context of training LLMs, datasets are typically cleaned by removing low-quality, duplicated, or toxic data. Cleaned datasets can increase training efficiency and lead to improved downstream performance. A trained LLM can be used to clean datasets for training a further LLM. With the increasing proportion of LLM-generated content on the web, data cleaning in the future may include filtering out such content. LLM-generated content can pose a problem if the content is similar to human text (making filtering difficult) but of lower quality (degrading performance of models trained on it). === Synthetic data === Training of largest language models might need more linguistic data than naturally available, or that the naturally occurring data is of insufficient quality. In these cases, synthetic data might be used. == Training == An LLM is a type of foundation model (large X model) trained on language. LLMs can be trained in different ways. In particular, GPT models are first pretrained to predict the next word on a large amount of data, before being fine-tuned. === Cost === Substantial infrastructure is necessary for training the largest models. The tendency towards larger models is visible in the list of large language models. For example, the training of GPT-2 (i.e. a 1.5-billion-parameter model) in 2019 cost $50,000, while training of the PaLM (i.e. a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million. The qualifier "large" in "large language model" is inherently vague, as there is no definitive threshold for the number of parameters required to qualify as "large". === Fine-tuning === Before being fine-tuned, most LLMs are next-token predictors. The fine-tuning shapes the LLM's behavior via techniques like reinforcement learning from human feedback (RLHF) or constitutional AI. Instruction fine-tuning is a form of supervised learning used to teach LLMs to follow user instructions. In 2022, OpenAI demonstrated InstructGPT, a version of GPT-3 similarly fine-tuned to follow instructions. Reinforcement learning from human feedback (RLHF) involves training a reward model to predict which text humans prefer. Then, the LLM can be fine-tuned through reinforcement learning to better satisfy this reward model. Since humans typically prefer truthful, helpful and harmless answers, RLHF favors such answers. == Architecture == LLMs are generally based on the tra
Read more →
A.I. Insight forums

The Artificial Intelligence Insight forums, also known as the A.I. Insight forums, are a series of forums to build consensus on how the United States Congress should craft A.I. legislation. Organized by Senate Majority Leader Charles "Chuck" Schumer, the first of nine closed-door forums convened on September 13, 2023. == Background == Amid a surge in the popularity and advancement of artificial intelligence, senator Chuck Schumer launched an effort to establish a framework for the regulation of A.I. in April 2023. By the end of June, a preliminary framework – dubbed the "SAFE Innovation Framework" – was established and presented to Congress. Schumer also announced a series of forums wherein tech leaders who were well-acquainted with A.I. would help to "educate" Congress on the risks and problems that A.I. poses. Many tech leaders including Sam Altman, Elon Musk, and Sundar Pichai were set to attend the meetings. Many U.S. lawmakers and senators such as Mike Rounds and Todd Young were also set to attend. == September 13 forum == The overarching consensus following the conclusion of the September 13 forum was that there "should be" regulations regarding the use and advancement of A.I., but it should not be made "too fast". Many tech executives who attended the forum also warned senators of the risks and threats that A.I. could pose. Musk, who attended the forum, stated afterwards that there was "overwhelming consensus" on the regulation of A.I. === Invitees === This is a list of people who were invited to attend the September 13 forum. Elon Musk (Tesla, SpaceX, X Corp.) Sam Altman (OpenAI) Bill Gates (ex–Microsoft) Jensen Huang (Nvidia) Alex Karp (Palantir) Satya Nadella (Microsoft) Arvind Krishna (IBM) Sundar Pichai (Alphabet Inc., Google) Eric Schmidt (ex–Google) Mark Zuckerberg (Meta) Charles Rivkin (Motion Picture Association) Liz Shuler (AFL-CIO) Meredith Stiehm (Writers Guild of America) Randi Weingarten (American Federation of Teachers) Maya Wiley (LCCHR) == October 24 forum == The second of nine forums was hosted on October 24, 2023, as federal A.I. regulation drew nearer. According to Schumer's office, the forum was centered mainly on how A.I. could "enable innovation", and the innovation that is needed for the safe progression of A.I. At the forum, Senators Brian Schatz and John Kennedy introduced the "Schatz-Kennedy A.I. Labeling Act", a new piece of A.I. legislation that would provide "more transparency on A.I.-generated content". Following the forum, Senator Rounds stated that in order to fuel the development of A.I., a total estimated $56 billion would be needed for the next three years. Rounds, alongside Senator Young and Schumer, also highlighted the need to outcompete China and workforce initiatives. === Invitees === 21 people were invited to attend the forum, and were composed largely of venture capitalists, academics, civil rights campaigners, and industry figures. Some key figures included venture capitalists Marc Andreessen and John Doerr. == Future == Over the course of fall 2023, there is slated to be a total of nine forums on the topic of A.I., with the first hosted on September 13.
Read more →
Generative adversarial network

A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative artificial intelligence. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks compete with each other in the form of a zero-sum game, where one agent's gain is another agent's loss. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning. The core idea of a GAN is based on the "indirect" training through the discriminator, another neural network that can tell how "realistic" the input seems, which itself is also being updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner. GANs are similar to mimicry in evolutionary biology, with an evolutionary arms race between both networks. == Definition == === Mathematical === The original GAN is defined as the following game: Each probability space ( Ω , μ ref ) {\displaystyle (\Omega ,\mu _{\text{ref}})} defines a GAN game. There are 2 players: generator and discriminator. The generator's strategy set is P ( Ω ) {\displaystyle {\mathcal {P}}(\Omega )} , the set of all probability measures μ G {\displaystyle \mu _{G}} on Ω {\displaystyle \Omega } . The discriminator's strategy set is the set of Markov kernels μ D : Ω → P [ 0 , 1 ] {\displaystyle \mu _{D}:\Omega \to {\mathcal {P}}[0,1]} , where P [ 0 , 1 ] {\displaystyle {\mathcal {P}}[0,1]} is the set of probability measures on [ 0 , 1 ] {\displaystyle [0,1]} . The GAN game is a zero-sum game, with objective function L ( μ G , μ D ) := E x ∼ μ ref , y ∼ μ D ( x ) ⁡ [ ln ⁡ y ] + E x ∼ μ G , y ∼ μ D ( x ) ⁡ [ ln ⁡ ( 1 − y ) ] . {\displaystyle L(\mu _{G},\mu _{D}):=\operatorname {E} _{x\sim \mu _{\text{ref}},y\sim \mu _{D}(x)}[\ln y]+\operatorname {E} _{x\sim \mu _{G},y\sim \mu _{D}(x)}[\ln(1-y)].} The generator aims to minimize the objective, and the discriminator aims to maximize the objective. The generator's task is to approach μ G ≈ μ ref {\displaystyle \mu _{G}\approx \mu _{\text{ref}}} , that is, to match its own output distribution as closely as possible to the reference distribution. The discriminator's task is to output a value close to 1 when the input appears to be from the reference distribution, and to output a value close to 0 when the input looks like it came from the generator distribution. === In practice === The generative network generates candidates while the discriminative network evaluates them. This creates a contest based on data distributions, where the generator learns to map from a latent space to the true data distribution, aiming to produce candidates that the discriminator cannot distinguish from real data. The discriminator's goal is to correctly identify these candidates, but as the generator improves, its task becomes more challenging, increasing the discriminator's error rate. A known dataset serves as the initial training data for the discriminator. Training involves presenting it with samples from the training dataset until it achieves acceptable accuracy. The generator is trained based on whether it succeeds in fooling the discriminator. Typically, the generator is seeded with randomized input that is sampled from a predefined latent space (e.g. a multivariate normal distribution). Thereafter, candidates synthesized by the generator are evaluated by the discriminator. Independent backpropagation procedures are applied to both networks so that the generator produces better samples, while the discriminator becomes more skilled at flagging synthetic samples. When used for image generation, the generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network. === Relation to other statistical machine learning methods === GANs are implicit generative models, which means that they do not explicitly model the likelihood function nor provide a means for finding the latent variable corresponding to a given sample, unlike alternatives such as flow-based generative model. Compared to fully visible belief networks such as WaveNet and PixelRNN and autoregressive models in general, GANs can generate one complete sample in one pass, rather than multiple passes through the network. Compared to Boltzmann machines and linear ICA, there is no restriction on the type of function used by the network. Since neural networks are universal approximators, GANs are asymptotically consistent. Variational autoencoders might be universal approximators, but it is not proven as of 2017. == Mathematical properties == === Measure-theoretic considerations === This section provides some of the mathematical theory behind these methods. In modern probability theory based on measure theory, a probability space also needs to be equipped with a σ-algebra. As a result, a more rigorous definition of the GAN game would make the following changes:Each probability space ( Ω , B , μ ref ) {\displaystyle (\Omega ,{\mathcal {B}},\mu _{\text{ref}})} defines a GAN game. The generator's strategy set is P ( Ω , B ) {\displaystyle {\mathcal {P}}(\Omega ,{\mathcal {B}})} , the set of all probability measures μ G {\displaystyle \mu _{G}} on the measure-space ( Ω , B ) {\displaystyle (\Omega ,{\mathcal {B}})} . The discriminator's strategy set is the set of Markov kernels μ D : ( Ω , B ) → P ( [ 0 , 1 ] , B ( [ 0 , 1 ] ) ) {\displaystyle \mu _{D}:(\Omega ,{\mathcal {B}})\to {\mathcal {P}}([0,1],{\mathcal {B}}([0,1]))} , where B ( [ 0 , 1 ] ) {\displaystyle {\mathcal {B}}([0,1])} is the Borel σ-algebra on [ 0 , 1 ] {\displaystyle [0,1]} .Since issues of measurability never arise in practice, these will not concern us further. === Choice of the strategy set === In the most generic version of the GAN game described above, the strategy set for the discriminator contains all Markov kernels μ D : Ω → P [ 0 , 1 ] {\displaystyle \mu _{D}:\Omega \to {\mathcal {P}}[0,1]} , and the strategy set for the generator contains arbitrary probability distributions μ G {\displaystyle \mu _{G}} on Ω {\displaystyle \Omega } . However, as shown below, the optimal discriminator strategy against any μ G {\displaystyle \mu _{G}} is deterministic, so there is no loss of generality in restricting the discriminator's strategies to deterministic functions D : Ω → [ 0 , 1 ] {\displaystyle D:\Omega \to [0,1]} . In most applications, D {\displaystyle D} is a deep neural network function. As for the generator, while μ G {\displaystyle \mu _{G}} could theoretically be any computable probability distribution, in practice, it is usually implemented as a pushforward: μ G = μ Z ∘ G − 1 {\displaystyle \mu _{G}=\mu _{Z}\circ G^{-1}} . That is, start with a random variable z ∼ μ Z {\displaystyle z\sim \mu _{Z}} , where μ Z {\displaystyle \mu _{Z}} is a probability distribution that is easy to compute (such as the uniform distribution, or the Gaussian distribution), then define a function G : Ω Z → Ω {\displaystyle G:\Omega _{Z}\to \Omega } . Then the distribution μ G {\displaystyle \mu _{G}} is the distribution of G ( z ) {\displaystyle G(z)} . Consequently, the generator's strategy is usually defined as just G {\displaystyle G} , leaving z ∼ μ Z {\displaystyle z\sim \mu _{Z}} implicit. In this formalism, the GAN game objective is L ( G , D ) := E x ∼ μ ref ⁡ [ ln ⁡ D ( x ) ] + E z ∼ μ Z ⁡ [ ln ⁡ ( 1 − D ( G ( z ) ) ) ] . {\displaystyle L(G,D):=\operatorname {E} _{x\sim \mu _{\text{ref}}}[\ln D(x)]+\operatorname {E} _{z\sim \mu _{Z}}[\ln(1-D(G(z)))].} === Generative reparametrization === The GAN architecture has two main components. One is casting optimization into a game, of form min G max D L ( G , D ) {\displaystyle \min _{G}\max _{D}L(G,D)} , which is different from the usual kind of optimization, of form min θ L ( θ ) {\displaystyle \min _{\theta }L(\theta )} . The other is the decomposition of μ G {\displaystyle \mu _{G}} into μ Z ∘ G − 1 {\displaystyle \mu _{Z}\circ G^{-1}} , which can be understood as a reparametrization trick. To see its significance, one must compare GAN with previous methods for learning generative models, which were plagued with "intractable probabilistic computations that arise in maximum likelihood estimation and related strategies". At the same time, Kingma and Welling and Rezende et al. developed the same idea of reparametrization into a general stochastic backpropagation method. Among its first applications was the variational autoencoder. === Move order and st
Read more →
AI@50

AI@50, formally known as the "Dartmouth Artificial Intelligence Conference: The Next Fifty Years" (July 13–15, 2006), was a conference organized by James H. Moor, commemorating the 50th anniversary of the Dartmouth workshop which effectively inaugurated the history of artificial intelligence. Five of the original ten attendees were present: Marvin Minsky, Ray Solomonoff, Oliver Selfridge, Trenchard More, and John McCarthy. While sponsored by Dartmouth College, General Electric, and the Frederick Whittemore Foundation, a $200,000 grant from the Defense Advanced Research Projects Agency (DARPA) called for a report of the proceedings that would: Analyze progress on AI's original challenges during the first 50 years, and assess whether the challenges were "easier" or "harder" than originally thought and why Document what the AI@50 participants believe are the major research and development challenges facing this field over the next 50 years, and identify what breakthroughs will be needed to meet those challenges Relate those challenges and breakthroughs against developments and trends in other areas such as control theory, signal processing, information theory, statistics, and optimization theory. A summary report by the conference director, James H. Moor, was published in AI Magazine. == Conference Program and links to published papers == James H. Moor, conference Director, Introduction Carol Folt and Barry Scherr, Welcome Carey Heckman, Tonypandy and the Origins of Science === AI: Past, Present, Future === John McCarthy, What Was Expected, What We Did, and AI Today Marvin Minsky, The Emotion Machine === The Future Model of Thinking === Ron Brachman and Hector Levesque, A Large Part of Human Thought David Mumford, What is the Right Model for 'Thought'? Stuart Russell, The Approach of Modern AI === The Future of Network Models === Geoffrey Hinton & Simon Osindero, From Pandemonium to Graphical Models and Back Again Rick Granger, From Brain Circuits to Mind Manufacture === The Future of Learning & Search === Oliver Selfridge, Learning and Education for Software: New Approaches in Machine Learning Ray Solomonoff, Machine Learning — Past and Future Leslie Pack Kaelbling, Learning to be Intelligent Peter Norvig, Web Search as a Product of and Catalyst for AI === The Future of AI === Rod Brooks, Intelligence and Bodies Nils Nilsson, Routes to the Summit Eric Horvitz, In Pursuit of Artificial Intelligence: Reflections on Challenges and Trajectories === The Future of Vision === Eric Grimson, Intelligent Medical Image Analysis: Computer Assisted Surgery and Disease Monitoring Takeo Kanade, Artificial Intelligence Vision: Progress and Non-Progress Terry Sejnowski, A Critique of Pure Vision === The Future of Reasoning === Alan Bundy, Constructing, Selecting and Repairing Representations of Knowledge Edwina Rissland, The Exquisite Centrality of Examples Bart Selman, The Challenge and Promise of Automated Reasoning === The Future of Language and Cognition === Trenchard More The Birth of Array Theory and Nial Eugene Charniak, Why Natural Language Processing is Now Statistical Natural Language Processing Pat Langley, Intelligent Behavior in Humans and Machines === The Future of the Future === Ray Kurzweil, Why We Can Be Confident of Turing Test Capability Within a Quarter Century George Cybenko, The Future Trajectory of AI Charles J. Holland, DARPA's Perspective === AI and Games === Jonathan Schaeffer, Games as a Test-bed for Artificial Intelligence Research Danny Kopec, Chess and AI Shay Bushinsky, Principle Positions in Deep Junior's Development === Future Interactions with Intelligent Machines === Daniela Rus, Making Bodies Smart Sherry Turkle, From Building Intelligences to Nurturing Sensibilities === Selected Submitted Papers: Future Strategies for AI === J. Storrs Hall, Self-improving AI: An Analysis Selmer Bringsjord, The Logicist Manifesto Vincent C. Müller, Is There a Future for AI Without Representation? Kristinn R. Thórisson, Integrated A.I. Systems === Selected Submitted Papers: Future Possibilities for AI === Eric Steinhart, Survival as a Digital Ghost Colin T. A. Schmidt, Did You Leave That 'Contraption' Alone With Your Little Sister? Michael Anderson & Susan Leigh Anderson, The Status of Machine Ethics Marcello Guarini, Computation, Coherence, and Ethical Reasoning
Read more →
Kernel embedding of distributions

In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ⁡ ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega
Read more →
Niceaunties

Niceaunties is the pseudonym of a Singapore-based artist and designer whose work incorporates generative artificial intelligence, video, and digital installation. Her practice centers around the figure of the "auntie", a common term for older women in Southeast Asian contexts, and explores themes such as aging, care, domesticity, and gender roles. Her work has been featured in exhibitions and media platforms including TED, Christie's Art + Tech, Expanded.Art, and publications such as The Guardian, The Straits Times. == Early life and career == Niceaunties was born in 1981 in Singapore. She attributes her inspiration for "auntie culture" to the matriarchal environment and older women of her household, including her grandmother, while growing up. She is also an architectural designer with Spark Architect. The Niceaunties project began in 2023 after she encountered AI-generated images in her work as an architect. It draws inspiration from women in the artist's family and broader Southeast Asian cultural dynamics. Her work often features AI-generated visuals created with tools such as DALL-E, Krea, RunwayML, and SORA. Her imagery and narratives center on the fictional "Auntieverse", which features older women in imagined settings involving community, ecology, and labor. Her notable works include 'Auntlantis', a five-part video series imagining older women engaged in ocean clean-up and collective ritual, and 'Goddess,' a video created with Sora, featuring a character who gradually forgets her divine identity through years of domestic labor. == Exhibitions == 2024 – Expanded.Art, Berlin – Auntiedote solo exhibition 2024 – TED (conference), Vancouver – Speaker and screening 2024 – Victoria and Albert Museum, London – Digital Art Weekend 2024 – Louisiana Museum of Modern Art, Denmark – Ocean exhibition 2025 – Christie's Augmented Intelligence Auction, New York == Reception == In 2024, Niceaunties gave a TED Talk titled The Weird and Wonderful Art of Niceaunties. Journalist Rebecca Ratcliffe, writing for The Guardian, described her work as combining AI with "the surreal and the political," noting her focus on older women as central characters. Her work has also received criticism for being reliant on generative AI, which many feel exploits and steals from traditional artists.
Read more →
Dry Drowning

Dry Drowning is a cyberpunk mystery visual novel developed by Studio V and published by VLG Publishing and WhisperGames for Microsoft Windows on August 2, 2019. It was released on the Nintendo Switch on February 22, 2021. == Gameplay == The player takes control of Mordred Foley and has to read through the story, while making decisions at certain points. Depending on the choices, the player can influence the relationship to other characters as well as the course of the game, discovering more than 150 story branches, and eventually reach one out of three different endings with variations. The game also includes passages where the player has to find clues or items on the screen by clicking on them. These can be used in interrogation scenes with certain characters in order to unmask them and discover their lies. Throughout the game, the player has access to an in-game operating system called AquaOS. With that, they can re-read their conversations, look at their found items, and read biographies of the characters encountered. == Plot == The game is set in the fictional and totalitarian city Nova Polemos in Europa in 2066. Mordred Foley and Hera Kairis are private investigators and before the events of the game, they sent two of the most dangerous serial killers ever, Jennifer Kingston and Robert Herrington, to the electric chair. However, after their execution, their agency underwent an investigation for falsifying the evidence presented during the case, which completely destroyed its reputation. Now they want to restart their careers and lives, while dealing with their past traumas. Soon, Mordred is caught up in several cases that all led him to believe that the dreaded serial killer named Pandora has returned. In order to solve these cases, both Mordred and Hera have to face their pasts and fears, all while a racist political party is about to make the lives of refugees in Nova Polemos even worse. == Development == The game was initially conceived by Giacomo Masi and Samuele Zolfanelli, then developed by Studio V and directed and written by Giacomo Masi. It was originally written in Italian and translated into English, Chinese, Japanese, Korean, and German. The soundtrack was composed, written, and performed by Giorgio Maioli. The ending theme and Hera's pieces, performed on piano, were created by Alessandro Masi. The background and character artworks were made by Giulia Carli, other graphic elements such as the UI were created by Samuele Zolfanelli. The developers cited L.A. Noire, Ace Attorney, Blade Runner and Heavy Rain as some of their inspirations for the game. === Releases === Dry Drowning was originally released on Microsoft Windows through Steam, GOG, Itch.io, and Utomik in August 2019. In July 2019, Giacomo Masi announced the game would be released for Xbox One in 2020, though it was not released that year. A Nintendo Switch port was released on February 22, 2021, and a version for PlayStation 4 is set to release in 2021. == Reception == According to review aggregator platform Metacritic, Dry Drowning received "mixed or average reviews" for PC based on 11 reviews and "generally favorable reviews" for Nintendo Switch based on 6 reviews. Fellow review aggregator OpenCritic assessed that the game received fair approval, being recommended by 55% of critics. 4players.de gave a positive rating of 80% and wrote: "Stylish noir thriller with an interesting story, but mechanical limitations – despite a variety of possible interactions." Screen Rant gave a mixed rating of 3 out of 5 stars and wrote, "Dry Drowning may be a fair bit messy, but there's charm here. Players who are willing to embrace the cheesier elements will find some joy in its well-crafted setting and a decent murder mystery plot. The game is constrictive and lacks the genuine shock and engagement of top tier visual novels like Doki Doki Literature Club!, but there are some moments of clever world building and a strong enough mystery propelling it." The Italian review site SpazioGames gave a positive rating of 8.5 out of 10 points and wrote: "Dry Drowning is a very good game with great narrative experience. Every relationship between the characters is layered to increase player involvement, and each choice has different consequences. A thriller game that deserves to be played." === Awards === The game won Best of EGS 2019 and Best of JOIN 2019 awards, an honorable mention at GAMEROME and was nominated as "Best Italian Debut Game" at the Italian Video Game Awards 2020. It was also declared Best Game at Join The Indie 2019.
Read more →
Someday (short story)

"Someday" is a science fiction short story by American writer Isaac Asimov. It was first published in the August 1956 issue of Infinity Science Fiction and reprinted in the collections Earth Is Room Enough (1957), The Complete Robot (1982), Robot Visions (1990), and The Complete Stories, Volume 1 (1990). == Plot summary == The story is set in a future where computers play a central role in organizing society. Humans are employed as computer operators, but they leave most of the thinking to machines. Indeed, whilst binary programming is taught at school, reading and writing have become obsolete. The story concerns a pair of boys who dismantle and upgrade an old Bard, a child's computer whose sole function is to generate random fairy tales. The boys download a book about computers into the Bard's memory in an attempt to expand its vocabulary, but the Bard simply incorporates computers into its standard fairy tale repertoire. The story ends with the boys excitedly leaving the room after deciding to go to the library to learn "squiggles" (writing) as a means of passing secret messages to one another. As they leave, one of the boys accidentally kicks the Bard's on switch. The Bard begins reciting a new story about a poor mistreated and often ignored robot called the Bard, whose sole purpose is to tell stories, which ends with the words: "the little computer knew then that computers would always grow wiser and more powerful until someday—someday—someday—…"
Read more →
Query understanding

Query understanding is the process of inferring the intent of a search engine user by extracting semantic meaning from the searcher’s keywords. Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to natural language processing but specifically focused on the understanding of search queries. == Methods == === Stemming and lemmatization === Many languages inflect words to reflect their role in the utterance they appear in. The variation between various forms of a word is likely to be of little importance for the relatively coarse-grained model of meaning involved in a retrieval system, and for this reason the task of conflating the various forms of a word is a potentially useful technique to increase recall of a retrieval system. Stemming algorithms, also known as stemmers, typically use a collection of simple rules to remove suffixes intended to model the language’s inflection rules. For some languages, there are simple lemmatisation methods to reduce a word in query to its lemma or root form or its stem; for others, this operation involves non-trivial string processing and may require recognizing the word's part of speech or referencing a lexical database. The effectiveness of stemming and lemmatization varies across languages. === Query Segmentation === Query segmentation is a key component of query understanding, aiming to divide a query into meaningful segments. Traditional approaches, such as the bag-of-words model, treat individual words as independent units, which can limit interpretative accuracy. For languages like Chinese, where words are not separated by spaces, segmentation is essential, as individual characters often lack standalone meaning. Even in English, the BOW model may not capture the full meaning, as certain phrases—such as "New York"—carry significance as a whole rather than as isolated terms. By identifying phrases or entities within queries, query segmentation enhances interpretation, enabling search engines to apply proximity and ordering constraints, ultimately improving search accuracy and user satisfaction. === Entity recognition === Entity recognition is the process of locating and classifying entities within a text string. Named-entity recognition specifically focuses on named entities, such as names of people, places, and organizations. In addition, entity recognition includes identifying concepts in queries that may be represented by multi-word phrases. Entity recognition systems typically use grammar-based linguistic techniques or statistical machine learning models. === Query rewriting === Query rewriting is the process of automatically reformulating a search query to more accurately capture its intent. Query expansion adds additional query terms, such as synonyms, in order to retrieve more documents and thereby increase recall. Query relaxation removes query terms to reduce the requirements for a document to match the query, thereby also increasing recall. Other forms of query rewriting, such as automatically converting consecutive query terms into phrases and restricting query terms to specific fields, aim to increase precision. === Spelling Correction === Automatic spelling correction is a critical feature of modern search engines, designed to address common spelling errors in user queries. Such errors are especially frequent as users often search for unfamiliar topics. By correcting misspelled queries, search engines enhance their understanding of user intent, thereby improving the relevance and quality of search results and overall user experience.
Read more →
Construction of t-norms

In mathematics, t-norms are a special kind of binary operations on the real unit interval [0, 1]. Various constructions of t-norms, either by explicit definition or by transformation from previously known functions, provide a plenitude of examples and classes of t-norms. This is important, e.g., for finding counter-examples or supplying t-norms with particular properties for use in engineering applications of fuzzy logic. The main ways of construction of t-norms include using generators, defining parametric classes of t-norms, rotations, or ordinal sums of t-norms. Relevant background can be found in the article on t-norms. == Generators of t-norms == The method of constructing t-norms by generators consists in using a unary function (generator) to transform some known binary function (most often, addition or multiplication) into a t-norm. In order to allow using non-bijective generators, which do not have the inverse function, the following notion of pseudo-inverse function is employed: Let f: [a, b] → [c, d] be a monotone function between two closed subintervals of extended real line. The pseudo-inverse function to f is the function f (−1): [c, d] → [a, b] defined as f ( − 1 ) ( y ) = { sup { x ∈ [ a , b ] ∣ f ( x ) < y } for f non-decreasing sup { x ∈ [ a , b ] ∣ f ( x ) > y } for f non-increasing. {\displaystyle f^{(-1)}(y)={\begin{cases}\sup\{x\in [a,b]\mid f(x)y\}&{\text{for }}f{\text{ non-increasing.}}\end{cases}}} === Additive generators === The construction of t-norms by additive generators is based on the following theorem: Let f: [0, 1] → [0, +∞] be a strictly decreasing function such that f(1) = 0 and f(x) + f(y) is in the range of f or in [f(0+), +∞] for all x, y in [0, 1]. Then the function T: [0, 1]2 → [0, 1] defined as T(x, y) = f (-1)(f(x) + f(y)) is a t-norm. Alternatively, one may avoid using the notion of pseudo-inverse function by having T ( x , y ) = f − 1 ( min ( f ( 0 + ) , f ( x ) + f ( y ) ) ) {\displaystyle T(x,y)=f^{-1}\left(\min \left(f(0^{+}),f(x)+f(y)\right)\right)} . The corresponding residuum can then be expressed as ( x ⇒ y ) = f − 1 ( max ( 0 , f ( y ) − f ( x ) ) ) {\displaystyle (x\Rightarrow y)=f^{-1}\left(\max \left(0,f(y)-f(x)\right)\right)} . And the biresiduum as ( x ⇔ y ) = f − 1 ( | f ( x ) − f ( y ) | ) {\displaystyle (x\Leftrightarrow y)=f^{-1}\left(\left|f(x)-f(y)\right|\right)} . If a t-norm T results from the latter construction by a function f which is right-continuous in 0, then f is called an additive generator of T. Examples: The function f(x) = 1 – x for x in [0, 1] is an additive generator of the Łukasiewicz t-norm. The function f defined as f(x) = –log(x) if 0 < x ≤ 1 and f(0) = +∞ is an additive generator of the product t-norm. The function f defined as f(x) = 2 – x if 0 ≤ x < 1 and f(1) = 0 is an additive generator of the drastic t-norm. Basic properties of additive generators are summarized by the following theorem: Let f: [0, 1] → [0, +∞] be an additive generator of a t-norm T. Then: T is an Archimedean t-norm. T is continuous if and only if f is continuous. T is strictly monotone if and only if f(0) = +∞. Each element of (0, 1) is a nilpotent element of T if and only if f(0) < +∞. The multiple of f by a positive constant is also an additive generator of T. T has no non-trivial idempotents. (Consequently, e.g., the minimum t-norm has no additive generator.) === Multiplicative generators === The isomorphism between addition on [0, +∞] and multiplication on [0, 1] by the logarithm and the exponential function allow two-way transformations between additive and multiplicative generators of a t-norm. If f is an additive generator of a t-norm T, then the function h: [0, 1] → [0, 1] defined as h(x) = e−f (x) is a multiplicative generator of T, that is, a function h such that h is strictly increasing h(1) = 1 h(x) · h(y) is in the range of h or equal to 0 or h(0+) for all x, y in [0, 1] h is right-continuous in 0 T(x, y) = h (−1)(h(x) · h(y)). Vice versa, if h is a multiplicative generator of T, then f: [0, 1] → [0, +∞] defined by f(x) = −log(h(x)) is an additive generator of T. == Parametric classes of t-norms == Many families of related t-norms can be defined by an explicit formula depending on a parameter p. This section lists the best known parameterized families of t-norms. The following definitions will be used in the list: A family of t-norms Tp parameterized by p is increasing if Tp(x, y) ≤ Tq(x, y) for all x, y in [0, 1] whenever p ≤ q (similarly for decreasing and strictly increasing or decreasing). A family of t-norms Tp is continuous with respect to the parameter p if lim p → p 0 T p = T p 0 {\displaystyle \lim _{p\to p_{0}}T_{p}=T_{p_{0}}} for all values p0 of the parameter. === Schweizer–Sklar t-norms === The family of Schweizer–Sklar t-norms, introduced by Berthold Schweizer and Abe Sklar in the early 1960s, is given by the parametric definition T p S S ( x , y ) = { T min ( x , y ) if p = − ∞ ( x p + y p − 1 ) 1 / p if − ∞ < p < 0 T p r o d ( x , y ) if p = 0 ( max ( 0 , x p + y p − 1 ) ) 1 / p if 0 < p < + ∞ T D ( x , y ) if p = + ∞ . {\displaystyle T_{p}^{\mathrm {SS} }(x,y)={\begin{cases}T_{\min }(x,y)&{\text{if }}p=-\infty \\(x^{p}+y^{p}-1)^{1/p}&{\text{if }}-\infty −∞ Continuous if and only if p < +∞ Strict if and only if −∞ < p ≤ 0 (for p = −1 it is the Hamacher product) Nilpotent if and only if 0 < p < +∞ (for p = 1 it is the Łukasiewicz t-norm). The family is strictly decreasing for p ≥ 0 and continuous with respect to p in [−∞, +∞]. An additive generator for T p S S {\displaystyle T_{p}^{\mathrm {SS} }} for −∞ < p < +∞ is f p S S ( x ) = { − log ⁡ x if p = 0 1 − x p p otherwise. {\displaystyle f_{p}^{\mathrm {SS} }(x)={\begin{cases}-\log x&{\text{if }}p=0\\{\frac {1-x^{p}}{p}}&{\text{otherwise.}}\end{cases}}} === Hamacher t-norms === The family of Hamacher t-norms, introduced by Horst Hamacher in the late 1970s, is given by the following parametric definition for 0 ≤ p ≤ +∞: T p H ( x , y ) = { T D ( x , y ) if p = + ∞ 0 if p = x = y = 0 x y p + ( 1 − p ) ( x + y − x y ) otherwise. {\displaystyle T_{p}^{\mathrm {H} }(x,y)={\begin{cases}T_{\mathrm {D} }(x,y)&{\text{if }}p=+\infty \\0&{\text{if }}p=x=y=0\\{\frac {xy}{p+(1-p)(x+y-xy)}}&{\text{otherwise.}}\end{cases}}} The t-norm T 0 H {\displaystyle T_{0}^{\mathrm {H} }} is called the Hamacher product. Hamacher t-norms are the only t-norms which are rational functions. The Hamacher t-norm T p H {\displaystyle T_{p}^{\mathrm {H} }} is strict if and only if p < +∞ (for p = 1 it is the product t-norm). The family is strictly decreasing and continuous with respect to p. An additive generator of T p H {\displaystyle T_{p}^{\mathrm {H} }} for p < +∞ is f p H ( x ) = { 1 − x x if p = 0 log ⁡ p + ( 1 − p ) x x otherwise. {\displaystyle f_{p}^{\mathrm {H} }(x)={\begin{cases}{\frac {1-x}{x}}&{\text{if }}p=0\\\log {\frac {p+(1-p)x}{x}}&{\text{otherwise.}}\end{cases}}} === Frank t-norms === The family of Frank t-norms, introduced by M.J. Frank in the late 1970s, is given by the parametric definition for 0 ≤ p ≤ +∞ as follows: T p F ( x , y ) = { T m i n ( x , y ) if p = 0 T p r o d ( x , y ) if p = 1 T L u k ( x , y ) if p = + ∞ log p ⁡ ( 1 + ( p x − 1 ) ( p y − 1 ) p − 1 ) otherwise. {\displaystyle T_{p}^{\mathrm {F} }(x,y)={\begin{cases}T_{\mathrm {min} }(x,y)&{\text{if }}p=0\\T_{\mathrm {prod} }(x,y)&{\text{if }}p=1\\T_{\mathrm {Luk} }(x,y)&{\text{if }}p=+\infty \\\log _{p}\left(1+{\frac {(p^{x}-1)(p^{y}-1)}{p-1}}\right)&{\text{otherwise.}}\end{cases}}} The Frank t-norm T p F {\displaystyle T_{p}^{\mathrm {F} }} is strict if p < +∞. The family is strictly decreasing and continuous with respect to p. An additive generator for T p F {\displaystyle T_{p}^{\mathrm {F} }} is f p F ( x ) = { − log ⁡ x if p = 1 1 − x if p = + ∞ log ⁡ p − 1 p x − 1 otherwise. {\displaystyle f_{p}^{\mathrm {F} }(x)={\begin{cases}-\log x&{\text{if }}p=1\\1-x&{\text{if }}p=+\infty \\\log {\frac {p-1}{p^{x}-1}}&{\text{otherwise.}}\end{cases}}} === Yager t-norms === The family of Yager t-norms, introduced in the early 1980s by Ronald R. Yager, is given for 0 ≤ p ≤ +∞ by T p Y ( x , y ) = { T D ( x , y ) if p = 0 max ( 0 , 1 − ( ( 1 − x ) p + ( 1 − y ) p ) 1 / p ) if 0 < p < + ∞ T m i n ( x , y ) if p = + ∞ {\displaystyle T_{p}^{\mathrm {Y} }(x,y)={\begin{cases}T_{\mathrm {D} }(x,y)&{\text{if }}p=0\\\max \left(0,1-((1-x)^{p}+(1-y)^{p})^{1/p}\right)&{\text{if }}0 Read more →
Smart speaker

A smart speaker is a type of loudspeaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of one "wake word" (or several "wake words"). Some smart speakers also act as smart home hubs by using Wi-Fi, Bluetooth, Thread, and other protocol standards to extend usage beyond audio playback and control home automation devices connected through a local area network. == History == Early voice-activated devices began in 2013 with MIT's Jasper project, which used multiple microphones and cloud software to power hands-free interactions from across a room. The first commercial smart speaker was the Amazon Echo, which was released in 2014 powered by Alexa and a ring of far-field microphones. Google followed in 2016 with Home, powered by Google Assistant. By 2017, devices like the Echo Show and Home Hub (later called Nest Hub) added touchscreens and video, creating the "smart display" subcategory. In 2018, Apple joined the smart speaker trend by launching the HomePod, which focused on high-quality audio alongside their built-in assistant Siri. ASUS release its own smart Speaker Xiao-Bu in 2019 with Artificial Intelligence, it terminates the Cloud Service on June 1st, 2025, which means all real-time service such as weather, news, currency conversion is affected. Sonos's 1st smart speaker Sonos One released in 2017, powered by Alexa. Invoke by Harman Kardon was powered by Microsoft's intelligent personal assistant, Cortana. In the early 2020s, smart speakers gained on-device voice processing for faster responses and improved privacy. New standards such as Matter and Thread allowed multitudes of smart-home devices (even from completely different brands) to work together. == Features == === Audio and Voice === Smart speakers use multiple microphones along with noise-cancelling software to pick up your voice from across the room, even when music is playing or the assistant is already talking. Noise suppression and echo cancellation is also used by the speaker so it can focus in on who is talking and ignore any background noises. Most smart speaker models can recognize who is speaking by voiceprint, which allows the speaker to grab information from that person's calendar, preferences, or music playlists. Listening to music on a speaker is when importance for good audio quality becomes apparent. Entry-level (cheaper) speakers such as the Home Mini or the Echo Dot have a single full-range driver. These lower-end speakers typically aren't great for listening to music as the audio quality is pretty poor. More advanced units such as the Home Max or Echo Studio have separate tweeters and woofers meant for listening to music in high quality. === Connectivity and smart-home control === Most connect over Wi-Fi or Bluetooth and support hub protocols like Thread and Matter. That lets them not only stream and play music but also allows you to control various brands of smart lights, thermostats, door locks, cameras, and much more-all from one point of control. Each can have its own designated interface and features in-house, usually launched or controlled via application or home automation software. These devices are able to communicate with each other via peer-to-peer connection through mesh networking. These speakers and related smart devices are typically controlled with one smartphone application. === Assistant services and skills === The built-in assistants handle timers, alarms, reminders, news briefings, weather updates, send messages to other smart devices, send texts, make calls, and simple questions. You can combine actions together in what are typically known as routines (for example saying "good morning" turns on lights, starts the coffee, says the weather, and reads the news) and add extra functions known as skills or actions (for things like ordering food or playing trivia games). This hands-free use of smart speakers can help assist those with disabilities. Most other technologies need the user to be able to physically interact with the device. Smart speakers are not bound by these limitations and can serve as an excellent tool for those who are unable to use their arms or legs or have vision issues. Although these tasks can be completed by a phone or computer, consumers tend to lean towards smart speakers due to factors such as their range being much greater than that of a phone and the need to not have to physically interact with the speaker to get the voice assistant as with most smartphones, certain parts of a phone may need to be interacted with to activate the speaking assistant. === Smart displays === Some smart speakers also include a screen to show the user a visual response. A smart speaker with a touchscreen is known as a smart display; these integrate a conversational user interface with display screens to augment voice interaction with images and video. They are powered by one of the common voice assistants and offer additional controls for smart home devices, feature streaming apps, and web browsers with touch controls for selecting content. The first smart displays were introduced in 2017 by Amazon (Amazon Echo Show) and Google (Google/Nest Home Hub). Hotel chain Marriott International partnered with Amazon to install Echo devices in select hotels since 2018. A Taiwanese startup, Aiello, launched the Aiello Voice Assistant (AVA) in the Asian hotel market in 2019, claiming it is powered by a multi-AI model system. Angie by Nomadix, which is similar to the Amazon Echo, launched its first product in 2017, specifically targeting hotel properties in the North America. In May 2019, Angie Hospitality acquired the assets of Roxy, a competitor that also built its own speech-enabled virtual assistant technology for hotels. This acquisition merged two proprietary NLP stacks into the current Nomadix product. === Artificial intelligence === The newest speakers can use on-device AI or cloud-based generative models to allow the smart speaker to carry on much more natural conversations, draft emails or recipes, suggest ideas based on context, or even create short pieces of music or art. This AI evolution allows these speakers to do far more than what they could do before. == Accuracy == According to a study by Proceedings of the National Academy of Sciences of the United States of America released In March 2020, the six biggest tech development companies, Amazon, Apple, Google, Yandex, IBM and Microsoft, have misidentified more words spoken by "black people" than "white people". The systems tested errors and unreadability, with a 19 and 35 percent discrepancy for the former and a 2 and 20 percent discrepancy for the latter. The North American Chapter of the Association for Computational Linguistics (NAACL) also identified a discrepancy between male and female voices. According to their research, Google's speech recognition software is 13 percent more accurate for men than women. It performs better than the systems used by Bing, AT&T, and IBM. == Privacy concerns == The built-in microphone in smart speakers is continuously listening for wake words followed by a command. However, these continuously listening microphones also raise privacy concerns among users. According to a survey taken by 1,007 people in Western Europe, it is clear that privacy is the biggest concern holding consumers back from buying "smart" products. these concerns include what is being recorded, how the data will be used, how it will be protected, and whether it will be used for invasive advertising. Furthermore, an analysis of Amazon Echo Dots showed that 30–38% of "spurious audio recordings were human conversations", suggesting that these devices capture audio other than strictly detection of the wake word. === As a wiretap === There are strong concerns that the ever-listening microphone of smart speakers presents a perfect candidate for wiretapping. In 2017, British security researcher Mark Barnes showed that pre-2017 Echos have exposed pins which allow for a compromised OS to be booted. According to Umar Iqbal, an assistant professor at Washington University in St. Louis, research indicates that data from consumer interactions with Alexa was used to targeted advertisements and products to consumer with over 40% of transmitted data lacking proper encryption raising privacy concerns. Further data indicates that due to the Smart Speakers ability to always capture audio, it begins to pick up on external conversations from consumers not related to commands given to the smart speaker. Things such as other members in the household, consumers on the phone and even TV audio can be picked up by these speakers and stored for future use by companies. === Voice assistance vs privacy === While voice assistants provide a valuable service, there can be some hesitation towards using them in various social contexts, such as in public or around other users. However, only more recently have users begun interac
Read more →