AI Assistant Picture

AI Assistant Picture — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • ImageMixer

    ImageMixer

    ImageMixer is a brand name of video editing software that edits digital video and still image in camcorders and authors to VCD and DVD. It is a second-party Japanese product, distributed by Pixela Corporation, a Japanese manufacturer of PC peripheral hardware and multimedia software. == Bundling == ImageMixer is widely used for several camcorder brands, such as JVC, Hitachi and Canon. Also, Sony has chosen to package ImageMixer with its DVD and HDD Handycam. == ImageMixer series == ImageMixer has other series of software for digital camera, such as ImageMixer Label Maker and ImageMixer DVD dubbing. ImageMixer also has movie editing solution for Macintosh. == Windows Vista version of ImageMixer == A Windows Vista version of ImageMixer has been developed (ImageMixer3).

    Read more →
  • Artificial consciousness

    Artificial consciousness

    Artificial consciousness, also known as machine consciousness, synthetic consciousness, or digital consciousness, is consciousness hypothesized to be possible for artificial intelligence. It is also the corresponding field of study, which draws insights from philosophy of mind, philosophy of artificial intelligence, cognitive science and neuroscience. The term "sentience" can be used when specifically designating ethical considerations stemming from a form of phenomenal consciousness (P-consciousness, or the ability to feel qualia). Since sentience involves the ability to experience ethically positive or negative (i.e., valenced) mental states, it may justify welfare concerns and legal protection, as with non-human animals. Some scholars believe that consciousness is generated by the interoperation of various parts of the brain; these mechanisms are labeled the neural correlates of consciousness (NCC). Some further believe that constructing a system (e.g., a computer system) that can emulate this NCC interoperation would result in a system that is conscious. Some scholars reject the possibility of non-biological conscious beings. == Philosophical views == As there are many hypothesized types of consciousness, there are many potential implementations of artificial consciousness. In the philosophical literature, perhaps the most common taxonomy of consciousness is into "access" and "phenomenal" variants. Access consciousness concerns those aspects of experience that can be apprehended, while phenomenal consciousness concerns those aspects of experience that seemingly cannot be apprehended, instead being characterized qualitatively in terms of "raw feels", "what it is like" or qualia. === Plausibility debate === Type-identity theorists and other skeptics hold the view that consciousness can be realized only in particular physical systems because consciousness has properties that necessarily depend on physical constitution. In his 2001 article "Artificial Consciousness: Utopia or Real Possibility," Giorgio Buttazzo says that a common objection to artificial consciousness is that, "Working in a fully automated mode, they [the computers] cannot exhibit creativity, unreprogrammation (which means can 'no longer be reprogrammed', from rethinking), emotions, or free will. A computer, like a washing machine, is a slave operated by its components." For other theorists (e.g., functionalists), who define mental states in terms of causal roles, any system that can instantiate the same pattern of causal roles, regardless of physical constitution, will instantiate the same mental states, including consciousness. ==== Thought experiments ==== David Chalmers proposed two thought experiments intending to demonstrate that "functionally isomorphic" systems (those with the same "fine-grained functional organization", i.e., the same information processing) will have qualitatively identical conscious experiences, regardless of whether they are based on biological neurons or digital hardware. The "fading qualia" is a reductio ad absurdum thought experiment. It involves replacing, one by one, the neurons of a brain with a functionally identical component, for example based on a silicon chip. Chalmers makes the hypothesis, knowing it in advance to be absurd, that "the qualia fade or disappear" when neurons are replaced one-by-one with identical silicon equivalents. Since the original neurons and their silicon counterparts are functionally identical, the brain's information processing should remain unchanged, and the subject's behaviour and introspective reports would stay exactly the same. Chalmers argues that this leads to an absurd conclusion: the subject would continue to report normal conscious experiences even as their actual qualia fade away. He concludes that the subject's qualia actually don't fade, and that the resulting robotic brain, once every neuron is replaced, would remain just as sentient as the original biological brain. Similarly, the "dancing qualia" thought experiment is another reductio ad absurdum argument. It supposes that two functionally isomorphic systems could have different perceptions (for instance, seeing the same object in different colors, like red and blue). It involves a switch that alternates between a chunk of brain that causes the perception of red, and a functionally isomorphic silicon chip, that causes the perception of blue. Since both perform the same function within the brain, the subject would not notice any change during the switch. Chalmers argues that this would be highly implausible if the qualia were truly switching between red and blue, hence the contradiction. Therefore, he concludes that the equivalent digital system would not only experience qualia, but it would perceive the same qualia as the biological system (e.g., seeing the same color). Greg Egan's short story Learning To Be Me (mentioned in §In fiction), illustrates how undetectable duplication of the brain and its functionality could be from a first-person perspective. Critics object that Chalmers' proposal begs the question in assuming that all mental properties and external connections are already sufficiently captured by abstract causal organization. Van Heuveln et al. argue that the dancing qualia argument contains an equivocation fallacy, conflating a "change in experience" between two systems with an "experience of change" within a single system. Mogensen argues that the fading qualia argument can be resisted by appealing to vagueness at the boundaries of consciousness and the holistic structure of conscious neural activity, which suggests consciousness may require specific biological substrates rather than being substrate-independent. Anil Seth argues that the complexity of brain neurons intrinsically matters in addition to their function and that it is not possible to replace any part of the brain with a perfect silicon equivalent. He points out that some of biological neurons exhibit activity aimed at cleaning up metabolic waste products, and writes that a perfect silicon replacement would require a silicon-based metabolism, but silicon is not suitable for creating such artificial metabolism. ==== In large language models ==== In 2022, Google engineer Blake Lemoine made a viral claim that Google's LaMDA chatbot was sentient. Lemoine supplied as evidence the chatbot's humanlike answers to many of his questions; however, the chatbot's behavior was judged by the scientific community as likely a consequence of mimicry, rather than machine sentience. Lemoine's claim was widely derided for being ridiculous. Moreover, attributing consciousness based solely on the basis of LLM outputs or the immersive experience created by an algorithm is considered a fallacy. However, while philosopher Nick Bostrom states that LaMDA is unlikely to be conscious, he additionally poses the question of "what grounds would a person have for being sure about it?" One would have to have access to unpublished information about LaMDA's architecture, and also would have to understand how consciousness works, and then figure out how to map the philosophy onto the machine: "(In the absence of these steps), it seems like one should be maybe a little bit uncertain. [...] there could well be other systems now, or in the relatively near future, that would start to satisfy the criteria." David Chalmers argued in 2023 that LLMs today display impressive conversational and general intelligence abilities, but are likely not conscious yet, as they lack some features that may be necessary, such as recurrent processing, a global workspace, and unified agency. Nonetheless, he considers that non-biological systems can be conscious, and suggested that future, extended models (LLM+s) incorporating these elements might eventually meet the criteria for consciousness, raising both profound scientific questions and significant ethical challenges. However, the view that consciousness can exist without biological phenomena is controversial and some reject it. Kristina Šekrst cautions that anthropomorphic terms such as "hallucination" can obscure important ontological differences between artificial and human cognition. While LLMs may produce human-like outputs, she argues that it does not justify ascribing mental states or consciousness to them. Instead, she advocates for an epistemological framework (such as reliabilism) that recognizes the distinct nature of AI knowledge production. She suggests that apparent understanding in LLMs may be a sophisticated form of AI hallucination. She also questions what would happen if an LLM were trained without any mention of consciousness. === Testing === Sentience is an inherently first-person phenomenon. Because of that, and due to the lack of an empirical definition of sentience, directly measuring it may be impossible. Although systems may display numerous behaviors correlated with sentience, determining whether a system is sentient is known as the hard pr

    Read more →
  • Workplace impact of artificial intelligence

    Workplace impact of artificial intelligence

    The impact of artificial intelligence on workers includes both applications to improve worker safety and health, and potential hazards that must be controlled. One potential application is using AI to eliminate hazards by removing humans from hazardous situations that involve risk of stress, overwork, or musculoskeletal injuries. Predictive analytics may also be used to identify conditions that may lead to hazards such as fatigue, repetitive strain injuries, or toxic substance exposure, leading to earlier interventions. Another is to streamline workplace safety and health workflows through automating repetitive tasks, enhancing safety training programs through virtual reality, or detecting and reporting near misses. When used in the workplace, AI also presents the possibility of new hazards. These may arise from machine learning techniques leading to unpredictable behavior and inscrutability in their decision-making, or from cybersecurity and information privacy issues. Many hazards of AI are psychosocial due to its potential to cause changes in work organization. These include increased monitoring leading to micromanagement, algorithms unintentionally or intentionally mimicking undesirable human biases, and assigning blame for machine errors to the human operator instead. AI may also lead to physical hazards in the form of human–robot collisions, and ergonomic risks of control interfaces and human–machine interactions. Hazard controls include cybersecurity and information privacy measures, communication and transparency with workers about data usage, and limitations on collaborative robots. From a workplace safety and health perspective, only "weak" or "narrow" AI that is tailored to a specific task is relevant, as there are many examples that are currently in use or expected to come into use in the near future. Certain digital technologies are predicted to result in job losses. Starting in the 2020s, the adoption of modern robotics has led to net employment growth. However, many businesses anticipate that automation, or employing robots would result in job losses in the future. This is especially true for companies in Central and Eastern Europe. Other digital technologies, such as platforms or big data, are projected to have a more neutral impact on employment. A large number of tech workers have been laid off starting in 2023; many such job cuts have been attributed to artificial intelligence. == Health and safety applications == In order for any potential AI health and safety application to be adopted, it requires acceptance by both managers and workers. For example, worker acceptance may be diminished by concerns about information privacy, or from a lack of trust and acceptance of the new technology, which may arise from inadequate transparency or training. Alternatively, managers may emphasize increases in economic productivity rather than gains in worker safety and health when implementing AI-based systems. === Eliminating hazardous tasks === AI may increase the scope of work tasks where a worker can be removed from a situation that carries risk. In a sense, while traditional automation can replace the functions of a worker's body with a robot, AI effectively replaces the functions of their brain with a computer. Hazards that can be avoided include stress, overwork, musculoskeletal injuries, and boredom. This can expand the range of affected job sectors into white-collar and service sector jobs such as in medicine, finance, and information technology. === Analytics to reduce risk === Machine learning is used for people analytics to make predictions about worker behavior to assist management decision-making, such as hiring and performance assessment. These could also be used to improve worker health. The analytics may be based on inputs such as online activities, monitoring of communications, location tracking, and voice analysis and body language analysis of filmed interviews. For example, sentiment analysis may be used to spot fatigue to prevent overwork. Decision support systems have a similar ability to be used to, for example, prevent industrial disasters or make disaster response more efficient. For manual material handling workers, predictive analytics and artificial intelligence may be used to reduce musculoskeletal injury. Traditional guidelines are based on statistical averages and are geared towards anthropometrically typical humans. The analysis of large amounts of data from wearable sensors may allow real-time, personalized calculation of ergonomic risk and fatigue management, as well as better analysis of the risk associated with specific job roles. Wearable sensors may also enable earlier intervention against exposure to toxic substances than is possible with area or breathing zone testing on a periodic basis. Furthermore, the large data sets generated could improve workplace health surveillance, risk assessment, and research. === Streamlining safety and health workflows === AI has also been used to attempt to make the workplace safety and health workflow more efficient. One example is coding of workers' compensation claims, which are submitted in a prose narrative form and must manually be assigned standardized codes. AI is being investigated to perform this task faster, more cheaply, and with fewer errors. == Hazards == There are several broad aspects of AI that may give rise to specific hazards. The risks depend on implementation rather than the mere presence of AI. Systems using sub-symbolic AI such as machine learning may behave unpredictably and are more prone to inscrutability in their decision-making. This is especially true if a situation is encountered that was not part of the AI's training dataset, and is exacerbated in environments that are less structured. Undesired behavior may also arise from flaws in the system's perception (arising either from within the software or from sensor degradation), knowledge representation and reasoning, or from software bugs. They may arise from improper training, such as a user applying the same algorithm to two problems that do not have the same requirements. Machine learning applied during the design phase may have different implications than that applied at runtime. Systems using symbolic AI are less prone to unpredictable behavior. The use of AI also increases cybersecurity risks relative to platforms that do not use AI, and information privacy concerns about collected data may pose a hazard to workers. === Psychosocial === Psychosocial hazards are those that arise from the way work is designed, organized, and managed, or its economic and social contexts, rather than arising from a physical substance or object. They cause not only psychiatric and psychological outcomes such as occupational burnout, anxiety disorders, and depression, but they can also cause physical injury or illness such as cardiovascular disease or musculoskeletal injury. Many hazards of AI are psychosocial in nature due to its potential to cause changes in work organization, in terms of increasing complexity and interaction between different organizational factors. However, psychosocial risks are often overlooked by designers of advanced manufacturing systems. Einola and Khoreva explore how different organizational groups perceive and interact with AI technologies. Their research shows that successful AI integration depends on human ownership and contextual understanding. They caution against blind technological optimism and stress the importance of tailoring AI use to specific workplace ecosystems. This perspective reinforces the need for inclusive design and transparent implementation strategies. ==== Changes in work practices ==== Over-reliance on AI tools may lead to deskilling of some professions. When AI becomes a substitute for traditional peer collaboration and mentorship, there is a risk of diminishing opportunities for interpersonal skill development and team-based learning. Increased monitoring may lead to micromanagement and thus to stress and anxiety. A perception of surveillance may also lead to stress. Controls for these include consultation with worker groups, extensive testing, and attention to introduced bias. Wearable sensors, activity trackers, and augmented reality may also lead to stress from micromanagement, both for assembly line workers and gig workers. Gig workers also lack the legal protections and rights of formal workers. Newell & Marabelli argue that AI alters power dynamics and employee autonomy, requiring a more nuanced understanding of its social and organizational implications. There is also the risk of people being forced to work at a robot's pace, or to monitor robot performance at nonstandard hours. A 2025 preprint paper based on users' interactions with the AI chatbot Microsoft Copilot identified forty jobs that the author's claimed had high overlaps with the capabilities of AI. Some media outlets used this paper to report on jobs becoming obsolete. Cri

    Read more →
  • Algorithm selection

    Algorithm selection

    Algorithm selection (sometimes also called per-instance algorithm selection or offline algorithm selection) is a meta-algorithmic technique to choose an algorithm from a portfolio on an instance-by-instance basis. It is motivated by the observation that on many practical problems, different algorithms have different performance characteristics. That is, while one algorithm performs well in some scenarios, it performs poorly in others and vice versa for another algorithm. If we can identify when to use which algorithm, we can optimize for each scenario and improve overall performance. This is what algorithm selection aims to do. The only prerequisite for applying algorithm selection techniques is that there exists (or that there can be constructed) a set of complementary algorithms. == Definition == Given a portfolio P {\displaystyle {\mathcal {P}}} of algorithms A ∈ P {\displaystyle {\mathcal {A}}\in {\mathcal {P}}} , a set of instances i ∈ I {\displaystyle i\in {\mathcal {I}}} and a cost metric m : P × I → R {\displaystyle m:{\mathcal {P}}\times {\mathcal {I}}\to \mathbb {R} } , the algorithm selection problem consists of finding a mapping s : I → P {\displaystyle s:{\mathcal {I}}\to {\mathcal {P}}} from instances I {\displaystyle {\mathcal {I}}} to algorithms P {\displaystyle {\mathcal {P}}} such that the cost ∑ i ∈ I m ( s ( i ) , i ) {\displaystyle \sum _{i\in {\mathcal {I}}}m(s(i),i)} across all instances is optimized. == Examples == === Boolean satisfiability problem (and other hard combinatorial problems) === A well-known application of algorithm selection is the Boolean satisfiability problem. Here, the portfolio of algorithms is a set of (complementary) SAT solvers, the instances are Boolean formulas, the cost metric is for example average runtime or number of unsolved instances. So, the goal is to select a well-performing SAT solver for each individual instance. In the same way, algorithm selection can be applied to many other N P {\displaystyle {\mathcal {NP}}} -hard problems (such as mixed integer programming, CSP, AI planning, TSP, MAXSAT, QBF and answer set programming). Competition-winning systems in SAT are SATzilla, 3S and CSHC === Machine learning === In machine learning, algorithm selection is better known as meta-learning. The portfolio of algorithms consists of machine learning algorithms (e.g., Random Forest, SVM, DNN), the instances are data sets and the cost metric is for example the error rate. So, the goal is to predict which machine learning algorithm will have a small error on each data set. == Instance features == The algorithm selection problem is mainly solved with machine learning techniques. By representing the problem instances by numerical features f {\displaystyle f} , algorithm selection can be seen as a multi-class classification problem by learning a mapping f i ↦ A {\displaystyle f_{i}\mapsto {\mathcal {A}}} for a given instance i {\displaystyle i} . Instance features are numerical representations of instances. For example, we can count the number of variables, clauses, average clause length for Boolean formulas, or number of samples, features, class balance for ML data sets to get an impression about their characteristics. === Static vs. probing features === We distinguish between two kinds of features: Static features are in most cases some counts and statistics (e.g., clauses-to-variables ratio in SAT). These features ranges from very cheap features (e.g. number of variables) to very complex features (e.g., statistics about variable-clause graphs). Probing features (sometimes also called landmarking features) are computed by running some analysis of algorithm behavior on an instance (e.g., accuracy of a cheap decision tree algorithm on an ML data set, or running for a short time a stochastic local search solver on a Boolean formula). These feature often cost more than simple static features. === Feature costs === Depending on the used performance metric m {\displaystyle m} , feature computation can be associated with costs. For example, if we use running time as performance metric, we include the time to compute our instance features into the performance of an algorithm selection system. SAT solving is a concrete example, where such feature costs cannot be neglected, since instance features for CNF formulas can be either very cheap (e.g., to get the number of variables can be done in constant time for CNFs in the DIMACs format) or very expensive (e.g., graph features which can cost tens or hundreds of seconds). It is important to take the overhead of feature computation into account in practice in such scenarios; otherwise a misleading impression of the performance of the algorithm selection approach is created. For example, if the decision which algorithm to choose can be made with perfect accuracy, but the features are the running time of the portfolio algorithms, there is no benefit to the portfolio approach. This would not be obvious if feature costs were omitted. == Approaches == === Regression approach === One of the first successful algorithm selection approaches predicted the performance of each algorithm m ^ A : I → R {\displaystyle {\hat {m}}_{\mathcal {A}}:{\mathcal {I}}\to \mathbb {R} } and selected the algorithm with the best predicted performance a r g min A ∈ P m ^ A ( i ) {\displaystyle arg\min _{{\mathcal {A}}\in {\mathcal {P}}}{\hat {m}}_{\mathcal {A}}(i)} for an instance i {\displaystyle i} . === Clustering approach === A common assumption is that the given set of instances I {\displaystyle {\mathcal {I}}} can be clustered into homogeneous subsets and for each of these subsets, there is one well-performing algorithm for all instances in there. So, the training consists of identifying the homogeneous clusters via an unsupervised clustering approach and associating an algorithm with each cluster. A new instance is assigned to a cluster and the associated algorithm selected. A more modern approach is cost-sensitive hierarchical clustering using supervised learning to identify the homogeneous instance subsets. === Pairwise cost-sensitive classification approach === A common approach for multi-class classification is to learn pairwise models between every pair of classes (here algorithms) and choose the class that was predicted most often by the pairwise models. We can weight the instances of the pairwise prediction problem by the performance difference between the two algorithms. This is motivated by the fact that we care most about getting predictions with large differences correct, but the penalty for an incorrect prediction is small if there is almost no performance difference. Therefore, each instance i {\displaystyle i} for training a classification model A 1 {\displaystyle {\mathcal {A}}_{1}} vs A 2 {\displaystyle {\mathcal {A}}_{2}} is associated with a cost | m ( A 1 , i ) − m ( A 2 , i ) | {\displaystyle |m({\mathcal {A}}_{1},i)-m({\mathcal {A}}_{2},i)|} . == Requirements == The algorithm selection problem can be effectively applied under the following assumptions: The portfolio P {\displaystyle {\mathcal {P}}} of algorithms is complementary with respect to the instance set I {\displaystyle {\mathcal {I}}} , i.e., there is no single algorithm A ∈ P {\displaystyle {\mathcal {A}}\in {\mathcal {P}}} that dominates the performance of all other algorithms over I {\displaystyle {\mathcal {I}}} (see figures to the right for examples on complementary analysis). In some application, the computation of instance features is associated with a cost. For example, if the cost metric is running time, we have also to consider the time to compute the instance features. In such cases, the cost to compute features should not be larger than the performance gain through algorithm selection. == Application domains == Algorithm selection is not limited to single domains but can be applied to any kind of algorithm if the above requirements are satisfied. Application domains include: hard combinatorial problems: SAT, Mixed Integer Programming, CSP, AI Planning, TSP, MAXSAT, QBF and Answer Set Programming combinatorial auctions in machine learning, the problem is known as meta-learning software design black-box optimization multi-agent systems numerical optimization linear algebra, differential equations evolutionary algorithms vehicle routing problem power systems For an extensive list of literature about algorithm selection, we refer to a literature overview. == Variants of algorithm selection == === Online selection === Online algorithm selection refers to switching between different algorithms during the solving process. This is useful as a hyper-heuristic. In contrast, offline algorithm selection selects an algorithm for a given instance only once and before the solving process. === Computation of schedules === An extension of algorithm selection is the per-instance algorithm scheduling problem, in which we do not select only one solver, but we select a time budget for each algorithm

    Read more →
  • LTX (text-to-video model)

    LTX (text-to-video model)

    LTX is a family of open source artificial intelligence video foundation models developed by Lightricks, and first released in November 2024. The latest models, LTX-2, create videos based on user prompts. They were preceded by LTX Video, which was released in 2024 as the company's first text-to-video model. LTX-2 is part of the LTX family of video generation models, which form the core technology, alongside LTX Studio, of the LTX ecosystem. == History == === Origins: LTX Video (2024–2025) === In November 2024 Lightricks publicly released its first text-to-video model, LTX Video. It was a 2-billion parameter model, available as open source. In May 2025 Lightricks launched LTXV-13b, a version with 13-billion parameters. Two months later, the model broke the 60 second barrier for generated video. === Release of LTX-2 (2025) === In October 2025 Lightricks announced its latest model, and renamed it LTX-2. The model was described as capable of generating synchronized audio and video at native 4K resolution and up to 50 frames per second (fps), using a variety of conditions and prompts, including text-to-video and image-to-video. Google highlighted the fact that LTX-2 was trained on its infrastructure, and saying it was "The first open source AI video generation model, powered by Google Cloud". Upon its release it was ranked in the top-3 models for image-to-video creation by Artificial Analysis, behind Kling 3.5 by Kling AI and Veo 3.1 by Google. Its text-to-image option was ranked 7th. In addition to its open-source release, Lightricks offers API access to LTX-2, allowing developers to generate videos from text and image prompts through a hosted service without running the model locally. === Open Source Release (2026) === In January 2026, Lightricks officially released the full open-source version of LTX-2, making the model’s complete codebase, weights, and associated tooling publicly available. In March 2026 the company released LTX-2.3, which was accompanied by a desktop video editor enabling the entire model to run locally on consumer hardware. == Technical features == === Advancements over LTX Video === LTX-2 builds upon the LTX Video architecture with several major improvements: Unified audio-video generation producing synchronized dialogue, ambience, and motion Native 4K rendering 50-fps output for cinematic motion Three operational modes (Fast, Pro, Ultra) More efficient diffusion pipelines enabling high fidelity on consumer GPUs === Core capabilities === Text-to-video generation Image-to-video generation Multimodal audiovisual synthesis High-resolution spatial and temporal coherence Configurable quality/performance settings Open-source distribution of weights and datasets == Reception == Initial reception to LTX-2 was broadly positive, with several technology and media outlets highlighting its open-source approach and multimodal capabilities. Open Source For You described LTX-2 as “one of the first AI video systems to combine 4K output, synchronized audio, and an open model release,” noting that it positioned Lightricks as a significant competitor to proprietary systems such as OpenAI's Sora and Google's Veo. IEA Green said that the model “could rewrite the AI filmmaking game,” emphasizing that its 50-fps rendering and unified audio-video generation made it suitable for professional studios and independent creators alike. AI News characterized LTX-2 as a “major step forward in the democratization of cinematic-quality video generation,” praising its consumer-grade hardware efficiency and multi-tier generation modes, while also noting ongoing challenges in long-form temporal stability. FinancialContent reported strong interest among creative agencies, attributing the attention to Lightricks’ decision to release model weights and datasets, which reviewers said enabled “a level of transparency not typically seen in commercial AI video models.” === Benchmarks and rankings === Upon release, LTX-2 ranked third for image-to-video creation in the Artificial Analysis benchmark, behind Kling 3.5 and Veo 3.1, while its text-to-video option ranked seventh. As of early 2026, it was the highest-ranked open-source model in the benchmark. === Limitations === Some early reviewers also pointed out quality limitations. The Ray3 technical review noted occasional inconsistencies in lip-sync and motion tracking during long scenes, though it stated these were “in line with the challenges faced by all current AI video diffusion models” and expected to improve with continued iteration. Like other diffusion-based video generators, LTX-2 can produce artifacts in complex multi-person scenes and may struggle with precise text rendering within generated video.

    Read more →
  • DABUS

    DABUS

    DABUS (Device for the Autonomous Bootstrapping of Unified Sentience) is an artificial intelligence (AI) system created by Stephen Thaler. It reportedly conceived of two novel products — a food container constructed using fractal geometry, which enables rapid reheating, and a flashing beacon for attracting attention in an emergency. The filing of patent applications designating DABUS as inventor has led to decisions by patent offices and courts on whether a patent can be granted for an invention reportedly made by an AI system. == History in different jurisdictions == === Australia === On 17 September 2019, Thaler filed an application to patent a "Food container and devices and methods for attracting enhanced attention," naming DABUS as the inventor. On 21 September 2020, IP Australia found that section 15(1) of the Patents Act 1990 (Cth) is inconsistent with an artificial intelligence machine being treated as an inventor, and Thaler's application had lapsed. Thaler sought judicial review, and on 30 July 2021, the Federal Court set aside IP Australia's decision and ordered IP Australia to reconsider the application. On 13 April 2022, the Full Court of the Federal Court set aside that decision, holding that only a natural person can be an inventor for the purposes of the Patents Act 1990 (Cth) and the Patents Regulations 1991 (Cth), and that such an inventor must be identified for any person to be entitled to a grant of a patent. On 11 November 2022, Thaler was refused special leave to appeal to the High Court. === European Patent Office === On 17 October 2018 and 7 November 2018, Thaler filed two European patent applications with the European Patent Office. The first claimed invention was a "Food Container" and the second was "Devices and Methods for Attracting Enhanced Attention." On 27 January 2020, the EPO rejected the applications on the grounds that the application listed an AI system named DABUS, and not a human, as the inventor, based on Article 81 and Rule 19(1) of the European Patent Convention (EPC). On 21 December 2021, the Board of Appeal of the EPO dismissed Thaler's appeal from the EPO's primary decision. The Board of Appeal confirmed that "under the EPC the designated inventor has to be a person with legal capacity. This is not merely an assumption on which the EPC was drafted. It is the ordinary meaning of the term inventor." === United Kingdom === Similar applications were filed by Thaler to the United Kingdom Intellectual Property Office on 17 October and 7 November 2018. The Office asked Thaler to file statements of inventorship and of right of grant to a patent (Patent Form 7) in respect of each invention within 16 months of the filing date. Thaler filed those forms naming DABUS as the inventor and explaining in some detail why he believed that machines should be regarded as inventors in the circumstances. His application was rejected on the grounds that: (1) naming a machine as inventor did not meet the requirements of the Patents Act 1977; and (2) the IPO was not satisfied as to the manner in which Thaler had acquired rights that would otherwise vest in the inventor. Thaler was not satisfied with the decision and asked for a hearing before an official known as the "hearing officer". By a decision dated 4 December 2019 the hearing officer rejected Thaler's appeal. Thaler appealed against the hearing officer's decision to the Patents Court (a specialist court within the Chancery Division of the High Court of England and Wales that determines patent disputes). On 21 September 2020, Mr Justice Marcus Smith upheld the decision of the hearing officer. On 21 September 2021, Thaler's further appeal to the Court of Appeal was dismissed by Arnold LJ and Laing LJ (Birss LJ dissenting). On 20 December 2023, the UK Supreme Court dismissed a further appeal by Thaler. In its judgment, the court held that an "inventor" under the Patents Act 1977 must be a natural person. === United States === The patent applications on the inventions were refused by the USPTO, which held that only natural persons can be named as inventors in a patent application. Thaler first fought this result by filing a complaint under the Administrative Procedure Act alleging that the decision was "arbitrary, capricious, an abuse of discretion and not in accordance with the law; unsupported by substantial evidence, and in excess of Defendants’ statutory authority." A month later on August 19, 2019, Thaler filed a petition with the USPTO as allowed in 37 C.F.R. § 1.181 stating that DABUS should be the inventor. The judge and Thaler agreed in this case that Thaler himself is unable to receive the patent on behalf of DABUS. In their August 5, 2022, Thaler decision, the US Court of Appeals for the Federal Circuit affirmed that only a natural person could be an inventor, which means that the AI that invents any other type of invention is not addressed by the "who" mentioned in the legislation. === New Zealand === On January 31, 2022, the Intellectual Property Office of New Zealand (IPONZ) decided that a patent application (776029) filed by Stephen Thaler was void, on the basis that no inventor was identified on the patent application. IPONZ determined that DABUS could not be "an actual devisor of the invention" as required by the Patents Act 2013, and that this must be a natural person as held by the previous patent offices above. The High Court of New Zealand confirmed the decision in 2023. === South Africa === On 24 June 2021, the South African Companies and Intellectual Property Commission (CIPC) accepted Dr Thaler's Patent Cooperation Treaty, for a patent in respect of inventions generated by DABUS. In July 2021, the CIPC released a notice of issuance for the patent. It is the first patent granted for an AI invention. === Switzerland === On June 26, 2025, the Swiss Federal Administrative Court ruled that artificial intelligence systems such as DABUS cannot be listed as inventors in patent applications. The court upheld the existing practice of the Swiss Federal Institute of Intellectual Property (IPI), which requires that only natural persons can be recognized as inventors under Swiss patent law. The case concerned a patent application, which sought to designate DABUS as the sole inventor of a food container designed with a fractal geometry to enhance heat distribution. The IPI had rejected the application, arguing that both the absence of a human inventor and the attribution of inventorship to an AI system were inadmissible. While the court dismissed Thaler's main request, it accepted a subsidiary request: if a human applicant recognizes and files a patent based on an AI-generated invention, that person may be considered the inventor. As a result, the application may proceed with Thaler listed as the inventor. The decision (B-2532/2024) can still be appealed to the Swiss Federal Supreme Court.

    Read more →
  • Hallucination (artificial intelligence)

    Hallucination (artificial intelligence)

    In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called bullshitting, confabulation, or delusion) is a response generated by AI that contains false or misleading information presented as fact. This term draws a loose analogy with human psychology, where a hallucination typically involves false percepts. For example, a chatbot powered by large language models (LLMs), like ChatGPT, may embed plausible-sounding random falsehoods within its generated content. Detecting and mitigating errors and hallucinations pose significant challenges for practical deployment and reliability of LLMs in high-stakes scenarios, such as chip design, supply chain logistics, and medical diagnostics. Some software engineers and statisticians have criticized the specific term "AI hallucination" for unreasonably anthropomorphizing computers. Symbolic artificial intelligence models generally do not produce hallucinations, unlike large language models. == Term == === Origin === Since the 1980s, the term "hallucination" has been used in computer vision with a positive connotation to describe the process of adding detail to an image. For example, the task of generating high-resolution face images from low-resolution inputs is called face hallucination. The first documented use of the term "hallucination" in this sense is in the PhD thesis of Eric Mjolsness in 1986. A notable work is the face hallucination algorithm by Simon Baker and Takeo Kanade published in 1999. In the 2000s, hallucinations were described in statistical machine translation as a failure mode. Since the 2010s, the term has undergone a semantic shift to signify the generation of factually incorrect or misleading outputs by AI systems in tasks like machine translation and object detection. In 2015, hallucinations were identified in visual semantic role labeling tasks by Saurabh Gupta and Jitendra Malik. In 2015, computer scientist Andrej Karpathy used the term "hallucinated" in a blog post to describe his recurrent neural network (RNN) language model generating an incorrect citation link. In 2017, Google researchers used the term to describe the responses generated by neural machine translation (NMT) models when they are not related to the source text, and in 2018, the term was used in computer vision to describe instances where non-existent objects are erroneously detected because of adversarial attacks. In July 2021, Meta warned during its release of BlenderBot 2 that the system is prone to "hallucinations", which Meta defined as "confident statements that are not true". Following OpenAI's ChatGPT release in beta version in November 2022, some users complained that such chatbots often seem to pointlessly embed plausible-sounding random falsehoods within their generated content. Many news outlets, including The New York Times, started to use the term "hallucinations" to describe these models' frequently incorrect or inconsistent responses. In 2023, the Cambridge dictionary updated its definition of hallucination to include this new sense specific to the field of AI. Some researchers have highlighted a lack of consistency in how the term is used, but also identified several alternative terms in the literature, such as confabulations, fabrications, and factual errors. === Definitions and alternatives === Uses, definitions and characterizations of the term "hallucination" in the context of LLMs include: "a tendency to invent facts in moments of uncertainty" (OpenAI, May 2023) "a model's logical mistakes" (OpenAI, May 2023) "fabricating information entirely, but behaving as if spouting facts" (CNBC, May 2023) "making up information" (The Verge, February 2023) "probability distributions" (in scientific contexts) Journalist Benj Edwards, in Ars Technica, writes that the term "hallucination" is controversial, but that some form of metaphor remains necessary; Edwards suggests "confabulation" as an analogy for processes that involve "creative gap-filling". In July 2024, a White House report on fostering public trust in AI research mentioned hallucinations only in the context of reducing them. Notably, when acknowledging David Baker's Nobel Prize-winning work with AI-generated proteins, the Nobel committee avoided the term entirely, instead referring to "imaginative protein creation". Hicks, Humphries, and Slater, in their article in Ethics and Information Technology, argue that the output of LLMs is "bullshit" under Harry Frankfurt's definition of the term, and that the models are "in an important way indifferent to the truth of their outputs", with true statements only accidentally true, and false ones accidentally false. Some researchers also use the derogatory term "botshit", often referring to uncritical use of AI. === Criticism === In the scientific community, some researchers avoid the term "hallucination", seeing it as potentially misleading. It has been criticized by Usama Fayyad, executive director of the Institute for Experimental Artificial Intelligence at Northeastern University, on the grounds that it misleadingly personifies large language models and is vague. Mary Shaw said, "The current fashion for calling generative AI's errors 'hallucinations' is appalling. It anthropomorphizes the software, and it spins actual errors as somehow being idiosyncratic quirks of the system even when they're objectively incorrect." In Salon, statistician Gary Smith argues that LLMs "do not understand what words mean" and consequently that the term "hallucination" unreasonably anthropomorphizes the machine. Murray Shanahan argues that anthropomorphic framing of LLM capabilities, including terms like "hallucination", encourages users and researchers to attribute cognitive processes to systems that operate through statistical pattern completion, and advocates for more careful linguistic practices when discussing LLM behavior. Kristina Šekrst argues that applying psychological vocabulary to LLM outputs obscures the difference between the appearance of mental properties and their genuine presence. Förster & Skop assert that tech companies use the hallucination metaphor to anthropomorphize models and deflect responsibility for non-factual outputs. Some see the AI outputs not as illusory but as prospective—that is, having some chance of being true, similar to early-stage scientific conjectures. The term has also been criticized for its association with psychedelic drug experiences. == In natural language generation == In natural language generation, there are several reasons why natural language models hallucinate: === Hallucination from data === Hallucinations can stem from incomplete, inaccurate or unrepresentative data sets. === Modeling-related causes === The pre-training of generative pretrained transformers (GPT) involves predicting the next word. It incentivizes GPT models to "give a guess" about what the next word is, even when they lack information. Some researchers take an anthropomorphic perspective and posit that hallucinations arise from a tension between novelty and usefulness. For instance, Amabile and Pratt define human creativity as the production of novel and useful ideas. By extension, a focus on novelty in machine creativity can lead to the production of original but inaccurate responses—that is, falsehoods—whereas a focus on usefulness may result in memorized content lacking originality. By 2022, newspapers such as The New York Times expressed concern that, as the adoption of bots based on large language models continued to grow, unwarranted user confidence in bot output could lead to problems. === Interpretability research === In 2025, interpretability research by Anthropic on the LLM Claude identified internal circuits that cause it to decline to answer questions unless it knows the answer. By default, the circuit is active and the LLM doesn't answer. When the LLM has sufficient information, these circuits are inhibited and the LLM answers the question. Hallucinations were found to occur when this inhibition happens incorrectly, such as when Claude recognizes a name but lacks sufficient information about that person, causing it to generate plausible but untrue responses. === Examples === On 15 November 2022, researchers from Meta AI published Galactica, designed to "store, combine and reason about scientific knowledge". Content generated by Galactica came with the warning: "Outputs may be unreliable! Language Models are prone to hallucinate text." In one case, when asked to draft a paper on creating avatars, Galactica cited a fictitious paper from a real author who works in the relevant area. Meta withdrew Galactica on 17 November due to offensiveness and inaccuracy. OpenAI's ChatGPT, released in beta version to the public on November 30, 2022, was based on the foundation model GPT-3.5 (a revision of GPT-3). Professor Ethan Mollick of Wharton called it an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kuba

    Read more →
  • Sample complexity

    Sample complexity

    The sample complexity of a machine learning algorithm represents the number of training-samples that it needs in order to successfully learn a target function. More precisely, the sample complexity is the number of training-samples that we need to supply to the algorithm, so that the function returned by the algorithm is within an arbitrarily small error of the best possible function, with probability arbitrarily close to 1. There are two variants of sample complexity: The weak variant fixes a particular input-output distribution; The strong variant takes the worst-case sample complexity over all input-output distributions. The No free lunch theorem, discussed below, proves that, in general, the strong sample complexity is infinite, i.e. that there is no algorithm that can learn the globally-optimal target function using a finite number of training samples. However, if we are only interested in a particular class of target functions (e.g., only linear functions) then the sample complexity is finite, and it depends linearly on the VC dimension on the class of target functions. == Definition == Let X {\displaystyle X} be a space which we call the input space, and Y {\displaystyle Y} be a space which we call the output space, and let Z {\displaystyle Z} denote the product X × Y {\displaystyle X\times Y} . For example, in the setting of binary classification, X {\displaystyle X} is typically a finite-dimensional vector space and Y {\displaystyle Y} is the set { − 1 , 1 } {\displaystyle \{-1,1\}} . Fix a hypothesis space H {\displaystyle {\mathcal {H}}} of functions h : X → Y {\displaystyle h\colon X\to Y} . A learning algorithm over H {\displaystyle {\mathcal {H}}} is a computable map from Z {\displaystyle Z} to H {\displaystyle {\mathcal {H}}} . In other words, it is an algorithm that takes as input a finite sequence of training samples and outputs a function from X {\displaystyle X} to Y {\displaystyle Y} . Typical learning algorithms include empirical risk minimization, without or with Tikhonov regularization. Fix a loss function L : Y × Y → R ≥ 0 {\displaystyle {\mathcal {L}}\colon Y\times Y\to \mathbb {R} _{\geq 0}} , for example, the square loss L ( y , y ′ ) = ( y − y ′ ) 2 {\displaystyle {\mathcal {L}}(y,y')=(y-y')^{2}} , where h ( x ) = y ′ {\displaystyle h(x)=y'} . For a given distribution ρ {\displaystyle \rho } on X × Y {\displaystyle X\times Y} , the expected risk of a hypothesis (a function) h ∈ H {\displaystyle h\in {\mathcal {H}}} is E ( h ) := E ρ [ L ( h ( x ) , y ) ] = ∫ X × Y L ( h ( x ) , y ) d ρ ( x , y ) {\displaystyle {\mathcal {E}}(h):=\mathbb {E} _{\rho }[{\mathcal {L}}(h(x),y)]=\int _{X\times Y}{\mathcal {L}}(h(x),y)\,d\rho (x,y)} In our setting, we have h = A ( S n ) {\displaystyle h={\mathcal {A}}(S_{n})} , where A {\displaystyle {\mathcal {A}}} is a learning algorithm and S n = ( ( x 1 , y 1 ) , … , ( x n , y n ) ) ∼ ρ n {\displaystyle S_{n}=((x_{1},y_{1}),\ldots ,(x_{n},y_{n}))\sim \rho ^{n}} is a sequence of vectors which are all drawn independently from ρ {\displaystyle \rho } . Define the optimal risk E H ∗ = inf h ∈ H E ( h ) . {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}={\underset {h\in {\mathcal {H}}}{\inf }}{\mathcal {E}}(h).} Set h n = A ( S n ) {\displaystyle h_{n}={\mathcal {A}}(S_{n})} , for each sample size n {\displaystyle n} . h n {\displaystyle h_{n}} is a random variable and depends on the random variable S n {\displaystyle S_{n}} , which is drawn from the distribution ρ n {\displaystyle \rho ^{n}} . The algorithm A {\displaystyle {\mathcal {A}}} is called consistent if E ( h n ) {\displaystyle {\mathcal {E}}(h_{n})} probabilistically converges to E H ∗ {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}} . In other words, for all ϵ , δ > 0 {\displaystyle \epsilon ,\delta >0} , there exists a positive integer N {\displaystyle N} , such that, for all sample sizes n ≥ N {\displaystyle n\geq N} , we have Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] < δ . {\displaystyle \Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]<\delta .} The sample complexity of A {\displaystyle {\mathcal {A}}} is then the minimum N {\displaystyle N} for which this holds, as a function of ρ , ϵ {\displaystyle \rho ,\epsilon } , and δ {\displaystyle \delta } . We write the sample complexity as N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} to emphasize that this value of N {\displaystyle N} depends on ρ , ϵ {\displaystyle \rho ,\epsilon } , and δ {\displaystyle \delta } . If A {\displaystyle {\mathcal {A}}} is not consistent, then we set N ( ρ , ϵ , δ ) = ∞ {\displaystyle N(\rho ,\epsilon ,\delta )=\infty } . If there exists an algorithm for which N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is finite, then we say that the hypothesis space H {\displaystyle {\mathcal {H}}} is learnable. In others words, the sample complexity N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} defines the rate of consistency of the algorithm: given a desired accuracy ϵ {\displaystyle \epsilon } and confidence δ {\displaystyle \delta } , one needs to sample N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} data points to guarantee that the risk of the output function is within ϵ {\displaystyle \epsilon } of the best possible, with probability at least 1 − δ {\displaystyle 1-\delta } . In probably approximately correct (PAC) learning, one is concerned with whether the sample complexity is polynomial, that is, whether N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is bounded by a polynomial in 1 / ϵ {\displaystyle 1/\epsilon } and 1 / δ {\displaystyle 1/\delta } . If N ( ρ , ϵ , δ ) {\displaystyle N(\rho ,\epsilon ,\delta )} is polynomial for some learning algorithm, then one says that the hypothesis space H {\displaystyle {\mathcal {H}}} is PAC-learnable. This is a stronger notion than being learnable. == Unrestricted hypothesis space: infinite sample complexity == One can ask whether there exists a learning algorithm so that the sample complexity is finite in the strong sense, that is, there is a bound on the number of samples needed so that the algorithm can learn any distribution over the input-output space with a specified target error. More formally, one asks whether there exists a learning algorithm A {\displaystyle {\mathcal {A}}} , such that, for all ϵ , δ > 0 {\displaystyle \epsilon ,\delta >0} , there exists a positive integer N {\displaystyle N} such that for all n ≥ N {\displaystyle n\geq N} , we have sup ρ ( Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] ) < δ , {\displaystyle \sup _{\rho }\left(\Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]\right)<\delta ,} where h n = A ( S n ) {\displaystyle h_{n}={\mathcal {A}}(S_{n})} , with S n = ( ( x 1 , y 1 ) , … , ( x n , y n ) ) ∼ ρ n {\displaystyle S_{n}=((x_{1},y_{1}),\ldots ,(x_{n},y_{n}))\sim \rho ^{n}} as above. The No Free Lunch Theorem says that without restrictions on the hypothesis space H {\displaystyle {\mathcal {H}}} , this is not the case, i.e., there always exist "bad" distributions for which the sample complexity is arbitrarily large. Thus, in order to make statements about the rate of convergence of the quantity sup ρ ( Pr ρ n [ E ( h n ) − E H ∗ ≥ ε ] ) , {\displaystyle \sup _{\rho }\left(\Pr _{\rho ^{n}}[{\mathcal {E}}(h_{n})-{\mathcal {E}}_{\mathcal {H}}^{}\geq \varepsilon ]\right),} one must either constrain the space of probability distributions ρ {\displaystyle \rho } , e.g. via a parametric approach, or constrain the space of hypotheses H {\displaystyle {\mathcal {H}}} , as in distribution-free approaches. == Restricted hypothesis space: finite sample-complexity == The latter approach leads to concepts such as VC dimension and Rademacher complexity which control the complexity of the space H {\displaystyle {\mathcal {H}}} . A smaller hypothesis space introduces more bias into the inference process, meaning that E H ∗ {\displaystyle {\mathcal {E}}_{\mathcal {H}}^{}} may be greater than the best possible risk in a larger space. However, by restricting the complexity of the hypothesis space it becomes possible for an algorithm to produce more uniformly consistent functions. This trade-off leads to the concept of regularization. It is a theorem from VC theory that the following three statements are equivalent for a hypothesis space H {\displaystyle {\mathcal {H}}} : H {\displaystyle {\mathcal {H}}} is PAC-learnable. The VC dimension of H {\displaystyle {\mathcal {H}}} is finite. H {\displaystyle {\mathcal {H}}} is a uniform Glivenko-Cantelli class. This gives a way to prove that certain hypothesis spaces are PAC learnable, and by extension, learnable. === An example of a PAC-learnable hypothesis space === X = R d , Y = { − 1 , 1 } {\displaystyle X=\mathbb {R} ^{d},Y=\{-1,1\}} , and let H {\displaystyle {\mathcal {H}}} be the space of affine functions on X {\displaystyle X} , that is, functions of the form x ↦ ⟨ w , x ⟩ + b {\displaystyle x\mapsto \langl

    Read more →
  • Randonautica

    Randonautica

    Randonautica (a portmanteau of "random" + "nautica") is an app launched on February 22, 2020 founded by Auburn Salcedo and Joshua Lengfelder. It randomly generates coordinates that encourages the user to explore their local area and report what is found. According to its creators, the app is "an attractor of strange things," letting one choose specific coordinates based on a specific theme. It gained controversy after a report of two teenagers coincidentally finding a corpse while using the application. == Overview == The app, which creators claim to be inspired by chaos theory and Guy Debord's Theory of the Dérive, offers its users three types of coordinates to choose from: an attractor, a void, or an anomaly. The app has a cult following on YouTube and TikTok and there is a subreddit made by the creators for users of the app. == History == 29-year-old circus performer Joshua Lengfelder discovered a bot called Fatum Project in a fringe science chat group on Telegram in January 2019. According to The New York Times, "He absorbed the project’s theories about how random exploration could break people out of their predetermined realities, and how people could influence random outcomes with their minds." Lengfelder then created a Telegram bot using Fatum Project's code, generating coordinates. He then created the subreddit r/randonauts in March. In October, developer Simon Nishi McCorkindale made the bot's webpage. With the help of Auburn Salcedo, chief executive of a TV agency, both created Randonauts LLC. Salcedo became the chief operating officer while Lengfelder was the CEO. The app, called Randonautica, was launched on February 22, 2020. Later the same year the app and back-end got completely overhauled by a new team of developers and got a more visual and friendlier design and logo. In April 2022 Lengfelder exited Randonauts LLC and Auburn Salcedo became CEO. == Reception == The app has as many as 10.8 million users as of July 2020, gaining popularity amid the COVID-19 pandemic in the United States as restrictions have been lightened. Emma Chamberlain made a YouTube video about the app that helped increase its following. i-D reported that the hashtag #randonautica has gained 176.5 million views on TikTok, although it has not marketed itself yet. === Controversy === With the app's popularity, users started reporting coincidences which many find unsettling. The majority of reports were from TikTok and Reddit, as well as Telegram. The most notable controversy involved a group of people heading to a beach in Duwamish Head, Puget Sound, West Seattle per the app, where they found a bag with two dead bodies, a 27-year-old male and a 36-year-old female, as reported by the Seattle Police homicide detectives. In August 2020, police arrested and charged their landlord, Michael Lee Dudley, in connection with the murders. In March 2021, Dudley was denied bail while other people were under suspicion of aiding Dudley in the dismemberment and disposal of the bodies, but no one else had been charged. This has caused speculation that the app has an intended, puzzle-like theme. However, Lengfelder stated that it is "a shocking coincidence." Salcedo called the videos fake, and that "It’s so hard to manage, because people are really taking creative liberties after seeing how much traction the app is getting in that fear factor." In 2022, Michael Dudley was convicted of second degree murder for killing both victims, who were identified as Jessica Lewis and Austin Wenner. He was sentenced to 46 years in prison the following year. In their questions page, Randonautica's creators have said that if the app generates coordinates inside a private property, it is a violation of their terms and conditions to trespass. In addition, Randonautica has also received allegations that the app is used for human trafficking, which its creators have denied, saying that data collected by the app are anonymous. It also ensured that the app is not designed to violate religious customs, saying that "the app is simply a tool. Just as a knife can be used either to prepare dinner or to cut somebody."

    Read more →
  • Prompt engineering

    Prompt engineering

    Prompt engineering is the process of structuring natural language inputs (known as prompts) to produce specified outputs from a generative artificial intelligence (GenAI) model. Context engineering is the related area of software engineering that focuses on the management of non-prompt contexts supplied to the GenAI model, such as metadata, API tools, and tokens. It can also be defined as the practice of designing and refining input instructions given to a generative AI model to produce more accurate, relevant, or useful outputs. Effective prompt engineering involves understanding how a model interprets language, and may include techniques such as few-shot prompting, chain-of-thought prompting, and role assignment. It is increasingly considered a skill for working with large language models (LLMs) in both research and professional contexts. During the 2020s AI boom, prompt engineering became regarded as a business capability across corporations and industries. Employees with the title prompt engineer were hired to create prompts that would increase productivity and efficacy, although the individual title has since lost traction amid AI models that produce better prompts than humans and corporate training in prompting for general employees. Common prompting techniques include multi-shot, chain-of-thought, and tree-of-thought prompting, as well as the use of assigning roles to the model. Automated prompt generation methods, such as retrieval-augmented generation (RAG), provide for greater accuracy and a wider scope of functions for prompt engineers. Prompt injection is a type of cybersecurity attack that targets machine learning models through malicious prompts. == Terminology == The Oxford English Dictionary defines prompt engineering as "The action or process of formulating and refining prompts for an artificial intelligence program, algorithm, etc., in order to optimize its output or to achieve a desired outcome; the discipline or profession concerned with this." In 2023, prompt ("an instruction given to an artificial intelligence program, algorithm, etc., which determines or influences the content it generates") was the runner-up to Oxford's word of the year. === Prompt === A prompt is some natural language text that describes and prescribes the task that an artificial intelligence (AI) should perform. A prompt for a text-to-text language model can be a query, a command, or a longer statement referencing context, instructions, and conversation history. The process of prompt engineering may involve designing clear queries, refining wording, providing relevant context, specifying the style of output, and assigning a character for the AI to mimic in order to guide the model toward more accurate, useful, and consistent responses. When communicating with a text-to-image or a text-to-audio model, a typical prompt contains a description of a desired output such as "a high-quality photo of an astronaut riding a horse" or "Lo-fi slow BPM electro chill with organic samples". Prompt engineering may be applied to text-to-image models to achieve a desired subject, style, layout, lighting, and aesthetic. === Techniques === Common terms used to describe various specific prompt engineering techniques include chain-of-thought, tree-of-thought, and retrieval-augmented generation (RAG). A 2024 survey of the field identified over 50 distinct text-based prompting techniques, 40 multimodal variants, and a vocabulary of 33 terms used across prompting research, highlighting a present lack of standardised terminology for prompt engineering. Vibe coding is an AI-assisted software development method where a user prompts an LLM with a description of what they want and lets it generate or edit the code. In 2025, "vibe coding" was the Collins Dictionary word of the year. === Context engineering === Context engineering is a related process that focuses on the context elements that accompany user prompts, which include system instructions, retrieved knowledge, tool definitions, conversation summaries, and task metadata. Context engineering is performed to improve reliability, provenance and token efficiency in production LLM systems. The concept emphasises operational practices such as token budgeting, provenance tags, versioning of context artifacts, observability (logging which context was supplied), and context regression tests to ensure that changes to supplied context do not silently alter system behaviour. == Rationale == Research has found that the performance of large language models (LLMs) is highly sensitive to choices such as the ordering of examples, the quality of demonstration labels, and even small variations in phrasing. In some cases, reordering examples in a prompt produced accuracy shifts of more than 40 percent. === In-context learning === A model's ability to temporarily learn from prompts is known as in-context learning. In-context learning is an emergent ability of large language models. It is an emergent property of model scale, meaning that breaks in scaling laws occur, leading to its efficacy increasing at a different rate in larger models than in smaller models. Unlike training and fine-tuning, which produce lasting changes, in-context learning is temporary. Training models to perform in-context learning can be viewed as a form of meta-learning, or "learning to learn". === Prompting to estimate model sensitivity === Research consistently demonstrates that LLMs are highly sensitive to subtle variations in prompt formatting, structure, and linguistic properties. Some studies have shown up to 76 accuracy points across formatting changes in few-shot settings. Linguistic features significantly influence prompt effectiveness—such as morphology, syntax, and lexico-semantic changes—which meaningfully enhance task performance across a variety of tasks. Clausal syntax, for example, improves consistency and reduces uncertainty in knowledge retrieval. This sensitivity persists even with larger model sizes, additional few-shot examples, or instruction tuning. To address sensitivity of models and make them more robust, several evaluative methods have been proposed. FormatSpread facilitates systematic analysis by evaluating a range of plausible prompt formats, offering a more comprehensive performance interval. Similarly, PromptEval estimates performance distributions across diverse prompts, enabling robust metrics such as performance quantiles and accurate evaluations under constrained budgets. == Prompting techniques == === Multi-shot === A prompt may include a few examples for a model to learn from in context, an approach called few-shot learning. For example, the prompt may ask the model to complete "maison → house, chat → cat, chien →", with the expected response being dog. === Chain-of-thought === Chain-of-thought (CoT) prompting is a technique that allows large language models (LLMs) to solve a problem as a series of intermediate steps before giving a final answer. In 2022, Google Brain reported that chain-of-thought prompting improves reasoning ability by inducing the model to answer a multi-step problem with steps of reasoning that mimic a train of thought. Chain-of-thought techniques were developed to help LLMs handle multi-step reasoning tasks, such as arithmetic or commonsense reasoning questions. When applied to PaLM, a 540 billion parameter language model, according to Google, CoT prompting significantly aided the model, allowing it to perform comparably with task-specific fine-tuned models on several tasks, achieving state-of-the-art results at the time on the GSM8K mathematical reasoning benchmark. It is possible to fine-tune models on CoT reasoning datasets to enhance this capability further and stimulate better interpretability. As originally proposed by Google, each CoT prompt is accompanied by a set of input/output examples—called exemplars—to demonstrate the desired model output, making it a few-shot prompting technique. However, according to a later paper from researchers at Google and the University of Tokyo, simply appending the words "Let's think step-by-step" was also effective, which allowed for CoT to be employed as a zero-shot technique. ==== Self-consistency ==== Self-consistency performs several chain-of-thought rollouts, then selects the most commonly reached conclusion out of all the rollouts. === Tree-of-thought === Tree-of-thought prompting generalizes chain-of-thought by generating multiple lines of reasoning in parallel, with the ability to backtrack or explore other paths. It can use tree search algorithms like breadth-first, depth-first, or beam. === Text-to-image prompting === In 2022, text-to-image models like DALL-E 2, Stable Diffusion, and Midjourney were released to the public. These models take text prompts as input and use them to generate images. Early text-to-image models typically do not understand negation, grammar and sentence structure in the same way as large language models, and may thus requi

    Read more →
  • Matrix regularization

    Matrix regularization

    In the field of statistical learning theory, matrix regularization generalizes notions of vector regularization to cases where the object to be learned is a matrix. The purpose of regularization is to enforce conditions, for example sparsity or smoothness, that can produce stable predictive functions. For example, in the more common vector framework, Tikhonov regularization optimizes over min x ‖ A x − y ‖ 2 + λ ‖ x ‖ 2 {\displaystyle \min _{x}\left\|Ax-y\right\|^{2}+\lambda \left\|x\right\|^{2}} to find a vector x {\displaystyle x} that is a stable solution to the regression problem. When the system is described by a matrix rather than a vector, this problem can be written as min X ‖ A X − Y ‖ 2 + λ ‖ X ‖ 2 , {\displaystyle \min _{X}\left\|AX-Y\right\|^{2}+\lambda \left\|X\right\|^{2},} where the vector norm enforcing a regularization penalty on x {\displaystyle x} has been extended to a matrix norm on X {\displaystyle X} . Matrix regularization has applications in matrix completion, multivariate regression, and multi-task learning. Ideas of feature and group selection can also be extended to matrices, and these can be generalized to the nonparametric case of multiple kernel learning. == Basic definition == Consider a matrix W {\displaystyle W} to be learned from a set of examples, S = ( X i t , y i t ) {\displaystyle S=(X_{i}^{t},y_{i}^{t})} , where i {\displaystyle i} goes from 1 {\displaystyle 1} to n {\displaystyle n} , and t {\displaystyle t} goes from 1 {\displaystyle 1} to T {\displaystyle T} . Let each input matrix X i {\displaystyle X_{i}} be ∈ R D T {\displaystyle \in \mathbb {R} ^{DT}} , and let W {\displaystyle W} be of size D × T {\displaystyle D\times T} . A general model for the output y {\displaystyle y} can be posed as y i t = ⟨ W , X i t ⟩ F , {\displaystyle y_{i}^{t}=\left\langle W,X_{i}^{t}\right\rangle _{F},} where the inner product is the Frobenius inner product. For different applications the matrices X i {\displaystyle X_{i}} will have different forms, but for each of these the optimization problem to infer W {\displaystyle W} can be written as min W ∈ H E ( W ) + R ( W ) , {\displaystyle \min _{W\in {\mathcal {H}}}E(W)+R(W),} where E {\displaystyle E} defines the empirical error for a given W {\displaystyle W} , and R ( W ) {\displaystyle R(W)} is a matrix regularization penalty. The function R ( W ) {\displaystyle R(W)} is typically chosen to be convex and is often selected to enforce sparsity (using ℓ 1 {\displaystyle \ell ^{1}} -norms) and/or smoothness (using ℓ 2 {\displaystyle \ell ^{2}} -norms). Finally, W {\displaystyle W} is in the space of matrices H {\displaystyle {\mathcal {H}}} with Frobenius inner product ⟨ … ⟩ F {\displaystyle \langle \dots \rangle _{F}} . == General applications == === Matrix completion === In the problem of matrix completion, the matrix X i t {\displaystyle X_{i}^{t}} takes the form X i t = e t ⊗ e i ′ , {\displaystyle X_{i}^{t}=e_{t}\otimes e_{i}',} where ( e t ) t {\displaystyle (e_{t})_{t}} and ( e i ′ ) i {\displaystyle (e_{i}')_{i}} are the canonical basis in R T {\displaystyle \mathbb {R} ^{T}} and R D {\displaystyle \mathbb {R} ^{D}} . In this case the role of the Frobenius inner product is to select individual elements w i t {\displaystyle w_{i}^{t}} from the matrix W {\displaystyle W} . Thus, the output y {\displaystyle y} is a sampling of entries from the matrix W {\displaystyle W} . The problem of reconstructing W {\displaystyle W} from a small set of sampled entries is possible only under certain restrictions on the matrix, and these restrictions can be enforced by a regularization function. For example, it might be assumed that W {\displaystyle W} is low-rank, in which case the regularization penalty can take the form of a nuclear norm. R ( W ) = λ ‖ W ‖ ∗ = λ ∑ i | σ i | , {\displaystyle R(W)=\lambda \left\|W\right\|_{}=\lambda \sum _{i}\left|\sigma _{i}\right|,} where σ i {\displaystyle \sigma _{i}} , with i {\displaystyle i} from 1 {\displaystyle 1} to min D , T {\displaystyle \min D,T} , are the singular values of W {\displaystyle W} . === Multivariate regression === Models used in multivariate regression are parameterized by a matrix of coefficients. In the Frobenius inner product above, each matrix X {\displaystyle X} is X i t = e t ⊗ x i {\displaystyle X_{i}^{t}=e_{t}\otimes x_{i}} such that the output of the inner product is the dot product of one row of the input with one column of the coefficient matrix. The familiar form of such models is Y = X W + b {\displaystyle Y=XW+b} Many of the vector norms used in single variable regression can be extended to the multivariate case. One example is the squared Frobenius norm, which can be viewed as an ℓ 2 {\displaystyle \ell ^{2}} -norm acting either entrywise, or on the singular values of the matrix: R ( W ) = λ ‖ W ‖ F 2 = λ ∑ i ∑ j | w i j | 2 = λ Tr ⁡ ( W ∗ W ) = λ ∑ i σ i 2 . {\displaystyle R(W)=\lambda \left\|W\right\|_{F}^{2}=\lambda \sum _{i}\sum _{j}\left|w_{ij}\right|^{2}=\lambda \operatorname {Tr} \left(W^{}W\right)=\lambda \sum _{i}\sigma _{i}^{2}.} In the multivariate case the effect of regularizing with the Frobenius norm is the same as the vector case; very complex models will have larger norms, and, thus, will be penalized more. === Multi-task learning === The setup for multi-task learning is almost the same as the setup for multivariate regression. The primary difference is that the input variables are also indexed by task (columns of Y {\displaystyle Y} ). The representation with the Frobenius inner product is then X i t = e t ⊗ x i t . {\displaystyle X_{i}^{t}=e_{t}\otimes x_{i}^{t}.} The role of matrix regularization in this setting can be the same as in multivariate regression, but matrix norms can also be used to couple learning problems across tasks. In particular, note that for the optimization problem min W ‖ X W − Y ‖ 2 2 + λ ‖ W ‖ 2 2 {\displaystyle \min _{W}\left\|XW-Y\right\|_{2}^{2}+\lambda \left\|W\right\|_{2}^{2}} the solutions corresponding to each column of Y {\displaystyle Y} are decoupled. That is, the same solution can be found by solving the joint problem, or by solving an isolated regression problem for each column. The problems can be coupled by adding an additional regularization penalty on the covariance of solutions min W , Ω ‖ X W − Y ‖ 2 2 + λ 1 ‖ W ‖ 2 2 + λ 2 Tr ⁡ ( W T Ω − 1 W ) {\displaystyle \min _{W,\Omega }\left\|XW-Y\right\|_{2}^{2}+\lambda _{1}\left\|W\right\|_{2}^{2}+\lambda _{2}\operatorname {Tr} \left(W^{T}\Omega ^{-1}W\right)} where Ω {\displaystyle \Omega } models the relationship between tasks. This scheme can be used to both enforce similarity of solutions across tasks, and to learn the specific structure of task similarity by alternating between optimizations of W {\displaystyle W} and Ω {\displaystyle \Omega } . When the relationship between tasks is known to lie on a graph, the Laplacian matrix of the graph can be used to couple the learning problems. == Spectral regularization == Regularization by spectral filtering has been used to find stable solutions to problems such as those discussed above by addressing ill-posed matrix inversions (see for example Filter function for Tikhonov regularization). In many cases the regularization function acts on the input (or kernel) to ensure a bounded inverse by eliminating small singular values, but it can also be useful to have spectral norms that act on the matrix that is to be learned. There are a number of matrix norms that act on the singular values of the matrix. Frequently used examples include the Schatten p-norms, with p = 1 or 2. For example, matrix regularization with a Schatten 1-norm, also called the nuclear norm, can be used to enforce sparsity in the spectrum of a matrix. This has been used in the context of matrix completion when the matrix in question is believed to have a restricted rank. In this case the optimization problem becomes: min ‖ W ‖ ∗ subject to W i , j = Y i j . {\displaystyle \min \left\|W\right\|_{}~~{\text{ subject to }}~~W_{i,j}=Y_{ij}.} Spectral Regularization is also used to enforce a reduced rank coefficient matrix in multivariate regression. In this setting, a reduced rank coefficient matrix can be found by keeping just the top n {\displaystyle n} singular values, but this can be extended to keep any reduced set of singular values and vectors. == Structured sparsity == Sparse optimization has become the focus of much research interest as a way to find solutions that depend on a small number of variables (see e.g. the Lasso method). In principle, entry-wise sparsity can be enforced by penalizing the entry-wise ℓ 0 {\displaystyle \ell ^{0}} -norm of the matrix, but the ℓ 0 {\displaystyle \ell ^{0}} -norm is not convex. In practice this can be implemented by convex relaxation to the ℓ 1 {\displaystyle \ell ^{1}} -norm. While entry-wise regularization with an ℓ 1 {\displaystyle \ell ^{1}} -norm will find solutions with a small number of nonzero elements, applying an ℓ 1 {

    Read more →
  • Emergent algorithm

    Emergent algorithm

    An emergent algorithm is an algorithm that exhibits emergent behavior. In essence an emergent algorithm implements a set of simple building block behaviors that when combined exhibit more complex behaviors. One example of this is the implementation of fuzzy motion controllers used to adapt robot movement in response to environmental obstacles. An emergent algorithm has the following characteristics: it achieves predictable global effects it does not require global visibility it does not assume any kind of centralized control it is self-stabilizing Other examples of emergent algorithms and models include cellular automata, artificial neural networks and swarm intelligence systems (ant colony optimization, bees algorithm, etc.).

    Read more →
  • Pixelmator Pro

    Pixelmator Pro

    Pixelmator Pro is a photo, video, and vector graphic editor developed by Apple for macOS and iPadOS as part of its Pixelmator and pro apps platforms and as a part of their Apple Creator Studio suite of applications. Pixelmator Pro relies heavily on technologies from Apple platforms such as Metal, CoreML, Core Image, AVFoundation, GCD, and SwiftUI. == Features == GPU accelerated with Metal 50+ standard image editing tools Layer-based image editor Video editing support Vector graphic support (including SVG support) AI-powered editing features such as background removal ML Super Resolution and Smart Replace Supports a variety of media formats (JPEG, RAW, Apple ProRAW, PSD, PNG, GIF, MP4, HEIF, etc) == Reception == Pixelmator Pro was generally well-received by reviewers who praised its deep use of machine learning, fully macOS-native design, and relatively affordable one-time purchase compared to subscription software such as Adobe Photoshop. Some reviewers criticized that some features are hard to find or hard to use. It was awarded Apple's Mac App of the Year in 2018. Pixelmator Pro does not have support for panorama stitching. == Acquisition by Apple == On November 1, 2024, the Pixelmator Team announced that they were to be acquired by Apple, subject to regulatory approval. Their site promises "There will be no material changes to the Pixelmator Pro, Pixelmator for iOS, and Photomator apps at this time." The acquisition was completed in February 2025. On January 13, 2026, Apple announced that a new version of Pixelmator Pro with AI features would be included in its new Apple Creator Studio subscription, the app would be brought to the iPad and the Mac app would be redesigned with Liquid Glass. == Version history == == Applescript == In 2020 Pixelmator Pro added the ability to leverage Apple's automation language 'AppleScript' to automate many tasks in version 1.8 (Lynx). This enabled simple and advanced automation activities such as image resize, crop, color adjustments, format change, moving layers around, and more advanced actions like removing background, Gaussian blur, text replacement, shadows, color replacement, etc.

    Read more →
  • Alexander Y. Tetelbaum

    Alexander Y. Tetelbaum

    Alexander Y. Tetelbaum (born August 16, 1948) is a Ukrainian American computer scientist, inventor, and academic who has contributed to electronic design automation (EDA) and artificial intelligence (AI) since the late 1960s; and holds 46 U.S. patents in EDA and related fields. Tetelbaum is the founding president of International Solomon University, the first Jewish university in Ukraine, established during a period of renewed efforts to address antisemitism in Ukraine. == Early life and education == He graduated from a Kyiv mathematical high school with a silver medal in 1966. Tetelbaum enrolled at the Kyiv Polytechnic Institute (KPI), now National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute" in 1966, graduating in 1972 with an MS in Electronics with honors. He earned his PhD in Electrical and Computer Engineering from KPI in 1975, with a dissertation on electronic design automation, and his Doctor of Engineering Science in 1986. == Academic career == Tetelbaum began his academic career at KPI in 1973 as a junior scientist, becoming a professor in the Computer and Electrical Engineering Department in 1980. Later, he founded and served as president of International Solomon University in Kyiv from 1991 to 1996, the first Jewish university in Ukraine. The university became a major academic center for computer science and Jewish studies in the post-Soviet era. He was a visiting and adjunct professor at Michigan State University from 1993 to 1996. == Professional career == Tetelbaum worked as an engineer at the Kiev Institute of Cybernetics from 1972 to 1973, and later, he led the Design Automation Lab at Kyiv Polytechnic Institute from 1975 to 1987. In the United States, he served as EDA manager at Silicon Graphics Corporation from 1996 to 1998 and principal engineer at LSI Corporation from 1998 to 2012. He founded and served as CEO of Abelite Design Automation, Inc., from 2012 to 2022. == Contributions in computer science == Tetelbaum has contributed to electronic design automation (EDA) and artificial intelligence (AI) since the 1960s. His early work included methods for EDA, particularly physical design automation and mathematical optimization; and he developed force-directed placement and topological routing methods. Tetelbaum generalized Rent's rule for hierarchical systems and large blocks, proposing a graph-based framework that extends applicability to arbitrary partition sizes with improved accuracy. Additional IEEE and related conference contributions from the mid-1990s include: "Path Search for Complicated Function", 1995 IEEE International Symposium on Circuits and Systems "A Performance-driven Placement Approach of Standard Cells" (International Conference on Intelligent Systems, 1995) "Framework of a New Methodology for Behavioral to Physical Design Linkage" (38th Midwest Symposium on Circuits and Systems, 1996) Statistical timing design and variations Test Methodologies These and other works and patents contributed to timing-driven placement, crosstalk reduction, clock tree synthesis, and interconnect optimization in VLSI design. == Patents == Tetelbaum holds 46 U.S. patents in EDA and related fields. Notable examples include: For the full list of patents, see Justia Patents or Google Patents. == Publications == === Early publications in the Soviet Union === Before the appearance of American books on electronic design automation (EDA), Tetelbaum published several scientific books and monographs on the subject in Russian/Ukrainian. Electronic Design Automation, Kiev: Znanie Publisher, 1975. Planar Design of Electronic Circuits, Kiev: Znanie Publisher, 1977. Formal Design of Computer Systems, Moscow: Sovetskoe Radio, 1979. CAD of Electronic Equipment: Topological Approach, Kiev: Vyssha Shkola, 1980; 2nd ed. 1981. Automated Design of Electronic Circuits (1981) CAD of VLSI Circuits, Kiev: Vyssha Shkola, 1983. Topological Algorithms of Multilayer Printed Circuit Boards Routing, Moscow: Radio i Svyaz, 1983. CAD of VLSI Circuits on Master Slice Chips, Moscow: Radio i Svyaz, 1988. Increasing the Effectiveness of CAD Systems, Kiev: UMKVO, 1991. === Scientific Monographs (English) === Minimum Number of Timing Signoff Corners (2022) Interviewing AI (2026) The AI Debate (2026) New Nostradamus Predictions: 2026: The Next Decade & Beyond (2035–2050+) (2026) For a consolidated record of Tetelbaum's publications, see Alexander Y. Tetelbaum, Wikidata Q4720205. === Other publications === Tetelbaum also published educational books on problem-solving methods: Yes-No Puzzles-Games Puzzle Games for Kids Solving Non-Standard Problems Solving Non-Standard Very Hard Problems Additionally, Tetelbaum published three thrillers: Omerta Operations Executive Director Eruption Yacht Finally, he published his memoir and an entertaining book: Unfinished Equations Artificially Intelligent Humor

    Read more →
  • Attention (machine learning)

    Attention (machine learning)

    In machine learning, attention is a method that determines the importance of each component in a sequence relative to the other components in that sequence. In natural language processing, importance is represented by "soft" weights assigned to each word in a sentence. More generally, attention encodes vectors called token embeddings across a fixed-width sequence that can range from tens to millions of tokens in size. Unlike "hard" weights, which are computed during the backwards training pass, "soft" weights exist only in the forward pass and therefore change with every step of the input. Earlier designs implemented the attention mechanism in a serial recurrent neural network (RNN) language translation system, but a more recent design, namely the transformer, removed the slower sequential RNN and relied more heavily on the faster parallel attention scheme. Inspired by ideas about attention in humans, the attention mechanism was developed to address the weaknesses of using information from the hidden layers of recurrent neural networks. Recurrent neural networks favor information contained in words at the end of a sentence and thus deemed more recent, thereby tending to attenuate the significance and associated predictive weight assigned to information earlier in the sentence. Attention allows a token equal access to any part of a sentence directly, rather than only through the previous state. == History == Additional surveys of the attention mechanism in deep learning are provided by Niu et al. and Soydaner. The major breakthrough came with self-attention, where each element in the input sequence attends to all others, enabling the model to capture global dependencies. This idea was central to the Transformer architecture, which replaced recurrence with attention mechanisms. As a result, Transformers became the foundation for models like BERT, T5 and generative pre-trained transformers (GPT). == Overview == The modern era of machine attention was revitalized by grafting an attention mechanism (Fig 1. orange) to an Encoder-Decoder. Figure 2 shows the internal step-by-step operation of the attention block (A) in Fig 1. === Interpreting attention weights === In translating between languages, alignment is the process of matching words from the source sentence to words of the translated sentence. Networks that perform verbatim translation without regard to word order would show the highest scores along the (dominant) diagonal of the matrix. The off-diagonal dominance shows that the attention mechanism is more nuanced. Consider an example of translating I love you to French. On the first pass through the decoder, 94% of the attention weight is on the first English word I, so the network offers the word je. On the second pass of the decoder, 88% of the attention weight is on the third English word you, so it offers t'. On the last pass, 95% of the attention weight is on the second English word love, so it offers aime. In the I love you example, the second word love is aligned with the third word aime. Stacking soft row vectors together for je, t', and aime yields an alignment matrix: Sometimes, alignment can be multiple-to-multiple. For example, the English phrase look it up corresponds to cherchez-le. Thus, "soft" attention weights work better than "hard" attention weights (setting one attention weight to 1, and the others to 0), as we would like the model to make a context vector consisting of a weighted sum of the hidden vectors, rather than "the best one", as there may not be a best hidden vector. == Variants == Many variants of attention implement soft weights, such as fast weight programmers, or fast weight controllers (1992). A "slow" neural network outputs the "fast" weights of another neural network through outer products. The slow network learns by gradient descent. It was later renamed as "linearized self-attention". Bahdanau-style attention, also referred to as additive attention, Luong-style attention, which is known as multiplicative attention, Early attention mechanisms similar to modern self-attention were proposed using recurrent neural networks. However, the highly parallelizable self-attention was introduced in 2017 and successfully used in the Transformer model, positional attention and factorized positional attention. For convolutional neural networks, attention mechanisms can be distinguished by the dimension on which they operate, namely: spatial attention, channel attention, or combinations. These variants recombine the encoder-side inputs to redistribute those effects to each target output. Often, a correlation-style matrix of dot products provides the re-weighting coefficients. In the figures below, W is the matrix of context attention weights, similar to the formula in Overview section above. == Optimizations == === Flash attention === The size of the attention matrix is proportional to the square of the number of input tokens. Therefore, when the input is long, calculating the attention matrix requires a lot of GPU memory. Flash attention is an implementation that reduces the memory needs and increases efficiency without sacrificing accuracy. It achieves this by partitioning the attention computation into smaller blocks that fit into the GPU's faster on-chip memory, reducing the need to store large intermediate matrices and thus lowering memory usage while increasing computational efficiency. === FlexAttention === FlexAttention is an attention kernel developed by Meta that allows users to modify attention scores prior to softmax and dynamically chooses the optimal attention algorithm. == Applications == Attention is widely used in natural language processing, computer vision, and speech recognition. In NLP, it improves context understanding in tasks like question answering and summarization. In vision, visual attention helps models focus on relevant image regions, enhancing object detection and image captioning. === Attention maps as explanations for vision transformers === From the original paper on vision transformers (ViT), visualizing attention scores as a heat map (called saliency maps or attention maps) has become an important and routine way to inspect the decision making process of ViT models. One can compute the attention maps with respect to any attention head at any layer, while the deeper layers tend to show more semantically meaningful visualization. Attention rollout is a recursive algorithm to combine attention scores across all layers, by computing the dot product of successive attention maps. Because vision transformers are typically trained in a self-supervised manner, attention maps are generally not class-sensitive. When a classification head is attached to the ViT backbone, class-discriminative attention maps (CDAM) combines attention maps and gradients with respect to the class [CLS] token. Some class-sensitive interpretability methods originally developed for convolutional neural networks can be also applied to ViT, such as GradCAM, which back-propagates the gradients to the outputs of the final attention layer. Using attention as basis of explanation for the transformers in language and vision is not without debate. While some pioneering papers analyzed and framed attention scores as explanations, higher attention scores do not always correlate with greater impact on model performances. == Mathematical representation == === Standard scaled dot-product attention === For matrices: Q ∈ R m × d k , K ∈ R n × d k {\displaystyle Q\in \mathbb {R} ^{m\times d_{k}},K\in \mathbb {R} ^{n\times d_{k}}} and V ∈ R n × d v {\displaystyle V\in \mathbb {R} ^{n\times d_{v}}} , the scaled dot-product, or QKV attention, is defined as: Attention ( Q , K , V ) = softmax ( Q K T d k ) V ∈ R m × d v {\displaystyle {\text{Attention}}(Q,K,V)={\text{softmax}}\left({\frac {QK^{T}}{\sqrt {d_{k}}}}\right)V\in \mathbb {R} ^{m\times d_{v}}} where T {\displaystyle {}^{T}} denotes transpose and the softmax function is applied independently to every row of its argument. The matrix Q {\displaystyle Q} contains m {\displaystyle m} queries, while matrices K , V {\displaystyle K,V} jointly contain an unordered set of n {\displaystyle n} key-value pairs. Value vectors in matrix V {\displaystyle V} are weighted using the weights resulting from the softmax operation, so that the rows of the m {\displaystyle m} -by- d v {\displaystyle d_{v}} output matrix are confined to the convex hull of the points in R d v {\displaystyle \mathbb {R} ^{d_{v}}} given by the rows of V {\displaystyle V} . To understand the permutation invariance and permutation equivariance properties of QKV attention, let A ∈ R m × m {\displaystyle A\in \mathbb {R} ^{m\times m}} and B ∈ R n × n {\displaystyle B\in \mathbb {R} ^{n\times n}} be permutation matrices; and D ∈ R m × n {\displaystyle D\in \mathbb {R} ^{m\times n}} an arbitrary matrix. The softmax function is permutation equivariant in the sense that: softmax ( A D B ) = A softmax ( D ) B {\displays

    Read more →