AI Avatar Kids

AI Avatar Kids — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Apprenticeship learning

    Apprenticeship learning

    In artificial intelligence, apprenticeship learning (or learning from demonstration or imitation learning) is the process of learning by observing an expert. It can be viewed as a form of supervised learning, where the training dataset consists of task executions by a demonstration teacher. == Mapping function approach == Mapping methods try to mimic the expert by forming a direct mapping either from states to actions, or from states to reward values. For example, in 2002 researchers used such an approach to teach an AIBO robot basic soccer skills. === Inverse reinforcement learning approach === Inverse reinforcement learning (IRL) is the process of deriving a reward function from observed behavior. While ordinary "reinforcement learning" involves using rewards and punishments to learn behavior, in IRL the direction is reversed, and a robot observes a person's behavior to figure out what goal that behavior seems to be trying to achieve. The IRL problem can be defined as: Given 1) measurements of an agent's behaviour over time, in a variety of circumstances; 2) measurements of the sensory inputs to that agent; 3) a model of the physical environment (including the agent's body): Determine the reward function that the agent is optimizing. IRL researcher Stuart J. Russell proposes that IRL might be used to observe humans and attempt to codify their complex "ethical values", in an effort to create "ethical robots" that might someday know "not to cook your cat" without needing to be explicitly told. The scenario can be modeled as a "cooperative inverse reinforcement learning game", where a "person" player and a "robot" player cooperate to secure the person's implicit goals, despite these goals not being explicitly known by either the person nor the robot. In 2017, OpenAI and DeepMind applied deep learning to the cooperative inverse reinforcement learning in simple domains such as Atari games and straightforward robot tasks such as backflips. The human role was limited to answering queries from the robot as to which of two different actions were preferred. The researchers found evidence that the techniques may be economically scalable to modern systems. Apprenticeship via inverse reinforcement learning (AIRP) was developed by in 2004 Pieter Abbeel, Professor in Berkeley's EECS department, and Andrew Ng, Associate Professor in Stanford University's Computer Science Department. AIRP deals with "Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform". AIRP has been used to model reward functions of highly dynamic scenarios where there is no obvious reward function intuitively. Take the task of driving for example, there are many different objectives working simultaneously - such as maintaining safe following distance, a good speed, not changing lanes too often, etc. This task, may seem easy at first glance, but a trivial reward function may not converge to the policy wanted. One domain where AIRP has been used extensively is helicopter control. While simple trajectories can be intuitively derived, complicated tasks like aerobatics for shows has been successful. These include aerobatic maneuvers like - in-place flips, in-place rolls, loops, hurricanes and even auto-rotation landings. This work was developed by Pieter Abbeel, Adam Coates, and Andrew Ng - "Autonomous Helicopter Aerobatics through Apprenticeship Learning" === System model approach === System models try to mimic the expert by modeling world dynamics. == Plan approach == The system learns rules to associate preconditions and postconditions with each action. In one 1994 demonstration, a humanoid learns a generalized plan from only two demonstrations of a repetitive ball collection task. == Example == Learning from demonstration is often explained from a perspective that the working Robot-control-system is available and the human-demonstrator is using it. And indeed, if the software works, the Human operator takes the robot-arm, makes a move with it, and the robot will reproduce the action later. For example, he teaches the robot-arm how to put a cup under a coffeemaker and press the start-button. In the replay phase, the robot is imitating this behavior 1:1. But that is not how the system works internally; it is only what the audience can observe. In reality, Learning from demonstration is much more complex. One of the first works on learning by robot apprentices (anthropomorphic robots learning by imitation) was Adrian Stoica's PhD thesis in 1995. In 1997, robotics expert Stefan Schaal was working on the Sarcos robot-arm. The goal was simple: solve the pendulum swingup task. The robot itself can execute a movement, and as a result, the pendulum is moving. The problem is, that it is unclear what actions will result into which movement. It is an Optimal control-problem which can be described with mathematical formulas but is hard to solve. The idea from Schaal was, not to use a Brute-force solver but record the movements of a human-demonstration. The angle of the pendulum is logged over three seconds at the y-axis. This results into a diagram which produces a pattern. In computer animation, the principle is called spline animation. That means, on the x-axis the time is given, for example 0.5 seconds, 1.0 seconds, 1.5 seconds, while on the y-axis is the variable given. In most cases it's the position of an object. In the inverted pendulum it is the angle. The overall task consists of two parts: recording the angle over time and reproducing the recorded motion. The reproducing step is surprisingly simple. As an input we know, in which time step which angle the pendulum must have. Bringing the system to a state is called “Tracking control” or PID control. That means, we have a trajectory over time, and must find control actions to map the system to this trajectory. Other authors call the principle “steering behavior”, because the aim is to bring a robot to a given line.

    Read more →
  • Sword Art Online

    Sword Art Online

    Sword Art Online (Japanese: ソードアート・オンライン, Hepburn: Sōdo Āto Onrain) is a Japanese light novel series written by Reki Kawahara and illustrated by abec. The series takes place in the 2020s and focuses on protagonists Kazuto "Kirito" Kirigaya and Asuna Yuuki as they play through various virtual reality MMORPG worlds, and later their involvement in the matters of a simulated civilization. Kawahara originally released the series as a web novel on his website from 2002 to 2008. The light novels began publication on ASCII Media Works' Dengeki Bunko imprint from April 10, 2009, with a spin-off series launching in October 2012. The series has spawned twelve manga adaptations published by ASCII Media Works and Kadokawa. The novels and the manga adaptations have been licensed for release in North America by Yen Press. An anime television series produced by A-1 Pictures, known simply as Sword Art Online, aired in Japan between July and December 2012, with a television film Sword Art Online: Extra Edition airing on December 31, 2013, and a second season, titled Sword Art Online II, airing between July and December 2014. An animated film titled Sword Art Online the Movie: Ordinal Scale, featuring an original story by Kawahara, premiered in Japan and Southeast Asia on February 18, 2017, and was released in the United States on March 9, 2017. A spin-off anime series titled Sword Art Online Alternative: Gun Gale Online premiered in April 2018, while a third season titled Sword Art Online: Alicization aired from October 2018 to September 2020. An anime film adaptation of Sword Art Online: Progressive titled Sword Art Online Progressive: Aria of a Starless Night premiered on October 30, 2021. A second film titled Sword Art Online Progressive: Scherzo of Deep Night premiered on October 22, 2022. Many video games based on the series have been released for consoles, PC, and mobile devices. Sword Art Online has achieved widespread commercial success, with the light novels having over 30 million copies sold worldwide. The anime series has received mixed to positive reviews, with praise for its animation, musical score, and exploration of the psychological aspects of virtual reality, but it has also been met with criticisms for its pacing and writing. == Synopsis == === Setting === The light novel series spans several virtual reality worlds, beginning with the game, Sword Art Online (SAO), which is set in a world known as Aincrad. Each world is built on a game engine called Cardinal system, which was initially developed specifically for SAO by Akihiko Kayaba, but was later duplicated for Alfheim Online (ALO), and a consolidated package is later given to Kirito in the form of the World Seed, who had it leaked online with the successful intention of reviving the virtual reality industry. A third world known as Gun Gale Online (GGO) appears in the third arc and is stylized as a first-person shooter game instead of a role-playing game, and is the main setting of Alternative Gun Gale Online. It was created using the World Seed by an American company. A fourth world appears in the fourth arc known as the Underworld (UW). The world itself was created using the World Seed as a base, but it is as realistic as the real world due to using many powerful government resources to keep it running. === Plot === In 2022, a virtual reality massively multiplayer online role-playing game (VRMMORPG) called Sword Art Online (SAO) was released. With the NerveGear, a helmet that stimulates the user's five senses via their brain, players can experience and control their in-game characters with their minds. Both the game and the NerveGear were created by Akihiko Kayaba. On November 6, 10,000 players log into SAO's mainframe cyberspace for the first time, only to discover that they are unable to log out. Kayaba appears and tells the players that they must beat all 100 floors of Aincrad, a steel castle which is the setting of SAO, if they wish to be free. He also states that those who suffer in-game deaths or forcibly remove the NerveGear out-of-game will suffer real-life deaths. A player named Kazuto "Kirito" Kirigaya is one of 1,000 testers in the game's previous closed beta. With the advantage of previous VR gaming experience and a drive to protect other beta testers from discrimination, he isolates himself from the greater groups and plays the game alone, bearing the mantle of "beater", a portmanteau of "beta tester" and "cheater". As the players progress through the game Kirito eventually befriends a young woman named Asuna Yuuki, forming a relationship with and later marrying her in-game. After the duo discover the identity of Kayaba's secret ID, who was playing as "Heathcliff", the leader of the guild Asuna joined in, they confront and destroy him, freeing themselves and the other players from the game. In the real world, Kazuto discovers that 300 SAO players, including Asuna, remain trapped in their NerveGear. As he goes to the hospital to see Asuna, he meets Asuna's father Shouzou Yuuki who is asked by an associate of his, Nobuyuki Sugou, to make a decision, which Sugou later reveals to be his marriage with Asuna, angering Kazuto. Several months later, he is informed by Agil, another SAO survivor, that a figure similar to Asuna was spotted on "The World Tree" in another VRMMORPG cyberspace called Alfheim Online (ALO). Assisted in-game by his cousin and adoptive sister Suguha "Leafa" Kirigaya and Yui, a navigation pixie (originally an AI from SAO), he quickly learns that the trapped players in ALO are part of a plan conceived by Sugou to perform illegal experiments on their minds. The goal is to create the perfect mind-control for financial gain and to subjugate Asuna, whom he intends to marry in the real world, to assume control of her family's corporation. Kirito eventually stops the experiment and rescues the remaining 300 SAO players, foiling Sugou's plans. Before leaving ALO to see Asuna, Kayaba, who has uploaded his mind to the Internet using an experimental, destructively high-powered version of NerveGear at the cost of his life, entrusts Kirito with The Seed – a package program designed to create virtual worlds. Kazuto eventually reunites with Asuna in the real world after thwarting an attack from Sugou and The Seed is released onto the Internet, reviving Aincrad as other VRMMORPGs begin to thrive. One year after the events of SAO, at the prompting of a government official investigating strange occurrences in VR, Kazuto takes on a job to investigate a series of murders involving another VRMMORPG called Gun Gale Online (GGO), the AmuSphere (the successor of the NerveGear), and a player called Death Gun. Aided by a female player named Shino "Sinon" Asada, he participates in a gunfight tournament called the Bullet of Bullets (BoB) and discovers the truth behind the murders, which originated with a player who participated in a player-killing guild in SAO. Through his and Sinon's efforts, two suspects are captured, though the third suspect, Johnny Black, escapes. Kazuto is later recruited to test an experimental FullDive machine, Soul Translator (STL), which has an interface far more realistic and complex than the previous machine he had played, to help RATH, a research and development organization under the Ministry of Defense (MOD), develop an artificial intelligence named A.L.I.C.E. He tests the STL by entering the Underworld (UW), a virtual reality cyberspace created with The Seed package. In the UW, the flow of time proceeds a thousand times faster than in the real world, and Kirito's memories of what happens inside are restricted. However, when Johnny Black ambushes and mortally wounds Kazuto with suxamethonium chloride, RATH recovers Kazuto and places him back into the STL to preserve his mind while attempts are made to save him. During his time in Underworld, Kirito befriends Eugeo, a carver in a small village of Rulid, and helps him on a journey to save Alice Zuberg, his friend who was taken by a group of highly skilled warriors known as the Integrity Knights for accidentally breaking a rule of the Axiom Church, the leaders of the Human Empire. He and Eugeo soon find themselves uncovering the secrets of the Axiom Church, led by a woman only known as "The Administrator", and the true purpose of Underworld itself, while unbeknownst to them, a war against the opposing Dark Territory is brewing on the horizon. They meet Alice, now an Integrity Knight, and though she does not remember them, Kirito helps her remember her true identity: a form of true artificial intelligence known as A.L.I.C.E. In the battle against the Administrator, Kirito manages to slay her, though Eugeo dies in the process, to Kirito's dismay. Meanwhile, in the real world, conflict escalates as American forces raid RATH's facility in the Ocean Turtle in an effort to take A.L.I.C.E. for purposes unknown. Two of the attackers - Gabriel "Vecta" Miller and Vassago "Prince of Hell" Cassals - take contr

    Read more →
  • Fuzzy cognitive map

    Fuzzy cognitive map

    A fuzzy cognitive map (FCM) is a cognitive map within which the relations between the elements (e.g. concepts, events, project resources) of a "mental landscape" can be used to compute the "strength of impact" of these elements. Fuzzy cognitive maps were introduced by Bart Kosko. Robert Axelrod introduced cognitive maps as a formal way of representing social scientific knowledge and modeling decision making in social and political systems, then brought in the computation. == Details == Fuzzy cognitive maps are signed fuzzy directed graphs. Spreadsheets or tables are used to map FCMs into matrices for further computation. FCM is a technique used for causal knowledge acquisition and representation, it supports causal knowledge reasoning process and belong to the neuro-fuzzy system that aim at solving decision making problems, modeling and simulate complex systems. Learning algorithms have been proposed for training and updating FCMs weights mostly based on ideas coming from the field of Artificial Neural Networks. Adaptation and learning methodologies used to adapt the FCM model and adjust its weights. Kosko and Dickerson (Dickerson & Kosko, 1994) suggested the Differential Hebbian Learning (DHL) to train FCM. There have been proposed algorithms based on the initial Hebbian algorithm; others algorithms come from the field of genetic algorithms, swarm intelligence and evolutionary computation. Learning algorithms are used to overcome the shortcomings that the traditional FCM present i.e. decreasing the human intervention by suggested automated FCM candidates; or by activating only the most relevant concepts every execution time; or by making models more transparent and dynamic. Fuzzy cognitive maps (FCMs) have gained considerable research interest due to their ability in representing structured knowledge and model complex systems in various fields. This growing interest led to the need for enhancement and making more reliable models that can better represent real situations. A first simple application of FCMs is described in a book of William R. Taylor, where the war in Afghanistan and Iraq is analyzed. In Bart Kosko's book Fuzzy Thinking, several Hasse diagrams illustrate the use of FCMs. As an example, one FCM quoted from Rod Taber describes 11 factors of the American cocaine market and the relations between these factors. For computations, Taylor uses pentavalent logic (scalar values out of {-1,-0.5,0,+0.5,+1}). That particular map of Taber uses trivalent logic (scalar values out of {-1,0,+1}). Taber et al. also illustrate the dynamics of map fusion and give a theorem on the convergence of combination in a related article. While applications in social sciences introduced FCMs to the public, they are used in a much wider range of applications, which all have to deal with creating and using models of uncertainty and complex processes and systems. Examples: In business FCMs can be used for product planning and decision support. In economics, FCMs support the use of game theory in more complex settings. In education for modeling Critical Success Factors of Learning Management Systems. In medical applications to model systems, provide diagnosis, develop decision support systems and medical assessment. In engineering for modeling and control mainly of complex systems and reliability engineering In project planning FCMs help to analyze the mutual dependencies between project resources. In robotics FCMs support machines to develop fuzzy models of their environments and to use these models to make crisp decisions. In computer assisted learning FCMs enable computers to check whether students understand their lessons. In expert systems a few or many FCMs can be aggregated into one FCM in order to process estimates of knowledgeable persons. In IT project management, a FCM-based methodology helps to success modelling, risk analysis and assessment, IT scenarios FCMappers is an international online community for the analysis and the visualization of fuzzy cognitive maps. FCMappers offer support for starting with FCM and also provide a Microsoft Excel-based tool that is able to check and analyse FCMs. The output is saved as Pajek file and can be visualized within third party software like Pajek, Visone, etc. They also offer to adapt the software to specific research needs. Additional FCM software tools, such as Mental Modeler, have recently been developed as a decision-support tool for use in social science research, collaborative decision-making, and natural resource planning.

    Read more →
  • Human-based evolutionary computation

    Human-based evolutionary computation

    Human-based evolutionary computation (HBEC) is a set of evolutionary computation techniques that rely on human innovation. == Classes and examples == Human-based evolutionary computation techniques can be classified into three more specific classes analogous to ones in evolutionary computation. There are three basic types of innovation: initialization, mutation, and recombination. Here is a table illustrating which type of human innovation are supported in different classes of HBEC: All these three classes also have to implement selection, performed either by humans or by computers. === Human-based selection strategy === Human-based selection strategy is a simplest human-based evolutionary computation procedure. It is used heavily today by websites outsourcing collection and selection of the content to humans (user-contributed content). Viewed as evolutionary computation, their mechanism supports two operations: initialization (when a user adds a new item) and selection (when a user expresses preference among items). The website software aggregates the preferences to compute the fitness of items so that it can promote the fittest items and discard the worst ones. Several methods of human-based selection were analytically compared in studies by Kosorukoff and Gentry. Because the concept seems too simple, most of the websites implementing the idea can't avoid the common pitfall: informational cascade in soliciting human preference. For example, digg-style implementations, pervasive on the web, heavily bias subsequent human evaluations by prior ones by showing how many votes the items already have. This makes the aggregated evaluation depend on a very small initial sample of rarely independent evaluations. This encourages many people to game the system that might add to digg's popularity but detract from the quality of the featured results. It is too easy to submit evaluation in digg-style system based only on the content title, without reading the actual content supposed to be evaluated. A better example of a human-based selection system is Stumbleupon. In Stumbleupon, users first experience the content (stumble upon it), and can then submit their preference by pressing a thumb-up or thumb-down button. Because the user doesn't see the number of votes given to the site by previous users, Stumbleupon can collect a relatively unbiased set of user preferences, and thus evaluate content much more precisely. === Human-based evolution strategy === In this context and maybe generally, the Wikipedia software is the best illustration of a working human-based evolution strategy wherein the (targeted) evolution of any given page comprises the fine tuning of the knowledge base of such information that relates to that page. Traditional evolution strategy has three operators: initialization, mutation, and selection. In the case of Wikipedia, the initialization operator is page creation, the mutation operator is incremental page editing. The selection operator is less salient. It is provided by the revision history and the ability to select among all previous revisions via a revert operation. If the page is vandalised and no longer a good fit to its title, a reader can easily go to the revision history and select one of the previous revisions that fits best (hopefully, the previous one). This selection feature is crucial to the success of the Wikipedia. An interesting fact is that the original wiki software was created in 1995, but it took at least another six years for large wiki-based collaborative projects to appear. Why did it take so long? One explanation is that the original wiki software lacked a selection operation and hence couldn't effectively support content evolution. The addition of revision history and the rise of large wiki-supported communities coincide in time. From an evolutionary computation point of view, this is not surprising: without a selection operation the content would undergo an aimless genetic drift and would unlikely to be useful to anyone. That is what many people expected from Wikipedia at its inception. However, with a selection operation, the utility of content has a tendency to improve over time as beneficial changes accumulate. This is what actually happens on a large scale in Wikipedia. === Human-based genetic algorithm === Human-based genetic algorithm (HBGA) provides means for human-based recombination operation (a distinctive feature of genetic algorithms). Recombination operator brings together highly fit parts of different solutions that evolved independently. This makes the evolutionary process more efficient.

    Read more →
  • ROCm

    ROCm

    ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains, including general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP (directive-based programming), and OpenCL. ROCm is free, libre and open-source software (except the GPU firmware blobs), and it is distributed under various licenses. The name initially stood for Radeon Open Compute platform; however, due to Open Compute being a registered trademark, the name no longer functions as an acronym. == Background == The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream. ROCm was launched around 2016 with the Boltzmann Initiative. ROCm stack builds upon previous AMD GPU stacks; some tools trace back to GPUOpen and others to the Heterogeneous System Architecture (HSA). === Heterogeneous System Architecture Intermediate Language === HSAIL was aimed at producing a middle-level, hardware-agnostic intermediate representation that could be JIT-compiled to the eventual hardware (GPU, FPGA...) using the appropriate finalizer. This approach was dropped for ROCm: now it builds only GPU code, using LLVM, and its AMDGPU backend that was upstreamed, although there is still research on such enhanced modularity with LLVM MLIR. == Programming abilities == ROCm as a stack ranges from the kernel driver to the end-user applications. AMD has introductory videos about AMD GCN hardware, and ROCm programming via its learning portal. One of the best technical introductions about the stack and ROCm/HIP programming, remains, to date, to be found on Reddit. == Hardware support == ROCm is primarily targeted at discrete professional GPUs, but consumer GPUs and APUs of the same architecture as a supported professional GPU are known to work with ROCm. For example, all professional GPUs of the RDNA 2 architecture are officially supported by ROCm 5.x; users report that Consumer RDNA2 units such as the Radeon 6800M APU and the Radeon 6700XT GPU also work. === Professional-grade GPUs === === Consumer-grade GPUs === == Software ecosystem == === Machine learning === Various deep learning frameworks have a ROCm backend: PyTorch TensorFlow ONNX MXNet CuPy MIOpen Caffe Iree (which uses LLVM Multi-Level Intermediate Representation (MLIR)) llama.cpp === Supercomputing === ROCm is gaining significant traction in the top 500. ROCm is used with the Exascale supercomputers El Capitan and Frontier. Some related software is to be found at AMD Infinity hub. === Other acceleration & graphics interoperation === As of version 3.0, Blender can now use HIP compute kernels for its renderer cycles. === Other languages === ==== Julia ==== Julia has the AMDGPU.jl package, which integrates with LLVM and selects components of the ROCm stack. Instead of compiling code through HIP, AMDGPU.jl uses Julia's compiler to generate LLVM IR directly, which is later consumed by LLVM to generate native device code. AMDGPU.jl uses ROCr's HSA implementation to upload native code onto the device and execute it, similar to how HIP loads its own generated device code. AMDGPU.jl also supports integration with ROCm's rocBLAS (for BLAS), rocRAND (for random number generation), and rocFFT (for FFTs). Future integration with rocALUTION, rocSOLVER, MIOpen, and certain other ROCm libraries is planned. === Software distribution === ==== Official ==== Installation instructions are provided for Linux and Windows in the official AMD ROCm documentation. ROCm software is currently spread across several public GitHub repositories. Within the main public meta-repository, there is an XML manifest for each official release: using git-repo, a version control tool built on top of Git, is the recommended way to synchronize with the stack locally. AMD starts distributing containerized applications for ROCm, notably scientific research applications gathered under AMD Infinity Hub. AMD distributes itself packages tailored to various Linux distributions. ==== Third-party ==== There is a growing third-party ecosystem packaging ROCm. Linux distributions are officially packaging (natively) ROCm, with various degrees of advancement: Arch Linux, Gentoo, Debian, Fedora , GNU Guix, and NixOS. There are Spack packages. == Components == There is one kernel-space component, ROCk, and the rest - there is roughly a hundred components in the stack - is made of user-space modules. The unofficial typographic policy is to use: uppercase ROC lowercase following for low-level libraries, i.e. ROCt, and the contrary for user-facing libraries, i.e. rocBLAS. AMD is active developing with the LLVM community, but upstreaming is not instantaneous, and as of January 2022, is still lagging. AMD still officially packages various LLVM forks for parts that are not yet upstreamed – compiler optimizations destined to remain proprietary, debug support, OpenMP offloading, etc. === Low-level === ==== ROCk – Kernel driver ==== ==== ROCm – Device libraries ==== Support libraries implemented as LLVM bitcode. These provide various utilities and functions for math operations, atomics, queries for launch parameters, on-device kernel launch, etc. ==== ROCt – Thunk ==== The thunk is responsible for all the thinking and queuing that goes into the stack. ==== ROCr – Runtime ==== The ROC runtime is a set of APIs/libraries that allows the launch of compute kernels by host applications. It is AMD's implementation of the HSA runtime API. It is different from the ROC Common Language Runtime. ==== ROCm – CompilerSupport ==== ROCm code object manager is in charge of interacting with LLVM intermediate representation. === Mid-level === ==== ROCclr Common Language Runtime ==== The common language runtime is an indirection layer adapting calls to ROCr on Linux and PAL on windows. It used to be able to route between different compilers, like the HSAIL-compiler. It is now being absorbed by the upper indirection layers (HIP and OpenCL). ==== OpenCL ==== ROCm ships its installable client driver (ICD) loader and an OpenCL implementation bundled together. As of January 2022, ROCm 4.5.2 ships OpenCL 2.2, and is lagging behind competition. ==== HIP – Heterogeneous Interface for Portability ==== The AMD implementation for its GPUs is called HIPAMD. There is also a CPU implementation mostly for demonstration purposes. ==== HIPCC ==== HIP builds a `HIPCC` compiler that either wraps Clang and compiles with LLVM open AMDGPU backend, or redirects to the NVIDIA compiler. ==== HIPIFY ==== HIPIFY is a source-to-source compiling tool. It translates CUDA to HIP and reverse, either using a Clang-based tool, or a sed-like Perl script. ==== GPUFORT ==== Like HIPIFY, GPUFORT is a tool compiling source code into other third-generation-language sources, allowing users to migrate from CUDA Fortran to HIP Fortran. It is also in the repertoire of research projects, even more so. === High-level === ROCm high-level libraries are usually consumed directly by application software, such as machine learning frameworks. Most of the following libraries are in the General Matrix Multiply (GEMM) category, which GPU architecture excels at. The majority of these user-facing libraries comes in dual-form: hip for the indirection layer that can route to Nvidia hardware, and roc for the AMD implementation. ==== rocBLAS / hipBLAS ==== rocBLAS and hipBLAS are central in high-level libraries, it is the AMD implementation for Basic Linear Algebra Subprograms. It uses the library Tensile privately. ==== rocSOLVER / hipSOLVER ==== This pair of libraries constitutes the LAPACK implementation for ROCm and is strongly coupled to rocBLAS. === Utilities === ROCm developer tools: Debug, tracer, profiler, System Management Interface, Validation suite, Cluster management. GPUOpen tools: GPU analyzer, memory visualizer... External tools: radeontop (TUI overview) == Comparison with competitors == ROCm competes with other GPU computing stacks: Nvidia CUDA and Intel OneAPI. === Nvidia CUDA === Nvidia's CUDA is closed-source, whereas AMD ROCm is open source. There is open-source software built on top of the closed-source CUDA, for instance RAPIDS. CUDA is able to run on consumer GPUs, whereas ROCm support is mostly offered for professional hardware such as AMD Instinct and AMD Radeon Pro. Nvidia provides a C/C++-centered frontend and its Parallel Thread Execution (PTX) LLVM GPU backend as the Nvidia CUDA Compiler (NVCC). === Intel OneAPI === All the oneAPI corresponding libraries are published on its GitHub Page. ==== Unified Acceleration Foundation (UXL) ==== Unified Acceleration Foundation (UXL) is a new technology consortium that are working on the continuation of the OneAPI initiative, with the goal to create a new open standard accelerator software ecosystem, related open standards and specification projects through Working Groups and Specia

    Read more →
  • Algorithmic accountability

    Algorithmic accountability

    Algorithmic accountability refers to the allocation of responsibility for the consequences of real-world actions influenced by algorithms used in decision-making processes. Ideally, algorithms should be designed to eliminate bias from their decision-making outcomes. This means they ought to evaluate only relevant characteristics of the input data, avoiding distinctions based on attributes that are generally inappropriate in social contexts, such as an individual's ethnicity in legal judgments. However, adherence to this principle is not always guaranteed, and there are instances where individuals may be adversely affected by algorithmic decisions. Responsibility for any harm resulting from a machine's decision may lie with the algorithm itself or with the individuals who designed it, particularly if the decision resulted from bias or flawed data analysis inherent in the algorithm's design. == Algorithm usage == Algorithms are widely utilized across various sectors of society that incorporate computational techniques in their control systems. These applications span numerous industries, including but not limited to medical, transportation, and payment services. In these contexts, algorithms perform functions such as: Approving or denying credit card applications; Approving or denying immigrant visas; Determining which taxpayers will be audited on their income taxes; Managing systems that control self-driving cars on a highway; Scoring individuals as potential criminals for use in legal proceedings; Search engines that match and rank database and internet search results; Recommendation systems that filter which news, entertainment, or purchase items are featured in a feed; Market-making algorithms that match sellers and buyers, such as in transportation (ride-hailing) or financial platforms. However, the implementation of these algorithms can be complex and opaque. Generally, algorithms function as "black boxes," meaning that the specific processes an input undergoes during execution are often not transparent, with users typically only seeing the resulting output. This lack of transparency raises concerns about potential biases within the algorithms, as the parameters influencing decision-making may not be well understood. The outputs generated can lead to perceptions of bias, especially if individuals in similar circumstances receive different results. According to Nicholas Diakopoulos: But these algorithms can make mistakes. They have biases. Yet they sit in opaque black boxes, their inner workings, their inner “thoughts” hidden behind layers of complexity. We need to get inside that black box, to understand how they may be exerting power on us, and to understand where they might be making unjust mistakes == Wisconsin Supreme Court case == Algorithms are prevalent across various fields and significantly influence decisions that affect the population at large. Their underlying structures and parameters often remain unknown to those impacted by their outcomes. A notable case illustrating this issue is a recent ruling by the Wisconsin Supreme Court concerning "risk assessment" algorithms used in criminal justice. The court determined that scores generated by such algorithms, which analyze multiple parameters from individuals, should not be used as a determining factor for arresting an accused individual. Furthermore, the court mandated that all reports submitted to judges must include information regarding the accuracy of the algorithm used to compute these scores. This ruling is regarded as a noteworthy development in how society should manage software that makes consequential decisions, highlighting the importance of reliability, particularly in complex settings like the legal system. The use of algorithms in these contexts necessitates a high degree of impartiality in processing input data. However, experts note that there is still considerable work to be done to ensure the accuracy of algorithmic results. Questions about the transparency of data processing continue to arise, which raises issues regarding the appropriateness of the algorithms and the intentions of their designers. == Controversies == A notable instance of potential algorithmic bias is highlighted in an article by The Washington Post regarding the ride-hailing service Uber. An analysis of collected data revealed that estimated waiting times for users varied based on the neighborhoods in which they resided. Key factors influencing these discrepancies included the predominant ethnicity and average income of the area. Specifically, neighborhoods with a majority white population and higher economic status tended to have shorter waiting times, while those with more diverse ethnic compositions and lower average incomes experienced longer waits. It’s important to clarify that this observation reflects a correlation identified in the data, rather than a definitive cause-and-effect relationship. No value judgments are made regarding the behavior of the Uber app in these cases. In TechCrunch website, Hemant Taneja wrote: Concern about “black box” algorithms that govern our lives has been spreading. New York University’s Information Law Institute hosted a conference on algorithmic accountability, noting: “Scholars, stakeholders, and policymakers question the adequacy of existing mechanisms governing algorithmic decision-making and grapple with new challenges presented by the rise of algorithmic power in terms of transparency, fairness, and equal treatment.” Yale Law School’s Information Society Project is studying this, too. “Algorithmic modeling may be biased or limited, and the uses of algorithms are still opaque in many critical sectors,” the group concluded. == Possible solutions == Discussions among experts have sought viable solutions to understand the operations of algorithms, often referred to as "black boxes." It is generally proposed that companies responsible for developing and implementing these algorithms should ensure their reliability by disclosing the internal processes of their systems. Hemant Taneja, writing for TechCrunch, emphasizes that major technology companies, such as Google, Amazon, and Uber, must actively incorporate algorithmic accountability into their operations. He suggests that these companies should transparently monitor their own systems to avoid stringent regulatory measures. One potential approach is the introduction of regulations in the tech sector to enforce oversight of algorithmic processes. However, such regulations could significantly impact software developers and the industry as a whole. It may be more beneficial for companies to voluntarily disclose the details of their algorithms and decision-making parameters, which could enhance the trustworthiness of their solutions. Another avenue discussed is the possibility of self-regulation by the companies that create these algorithms, allowing them to take proactive steps in ensuring accountability and transparency in their operations. In TechCrunch website, Hemant Taneja wrote: There’s another benefit — perhaps a huge one — to software-defined regulation. It will also show us a path to a more efficient government. The world’s legal logic and regulations can be coded into software and smart sensors can offer real-time monitoring of everything from air and water quality, traffic flows and queues at the DMV. Regulators define the rules, technologist create the software to implement them and then AI and ML help refine iterations of policies going forward. This should lead to much more efficient, effective governments at the local, national and global levels.

    Read more →
  • Sora (text-to-video model)

    Sora (text-to-video model)

    Sora was a text-to-video model and social media app developed by OpenAI. Using artificial intelligence, the model generated short video clips based on prompts, and could also extend existing short videos. In February 2024, OpenAI previewed examples of its output to the public, with the first generation of Sora released publicly for ChatGPT Plus and ChatGPT Pro users in the United States and Canada in December 2024. The second generation of Sora was released to select users in the US and Canada at the end of September 2025. Sora 2 integrated social media features into the app. The app was shut down on April 26, 2026 and the application programming interface (API) is planned to be discontinued on September 24, 2026, marking the end of the Sora AI brand as a whole. By default, the generator used copyrighted material in its videos, unless copyright holders actively opt out of having their content included. Videos contained a visible, moving digital watermark to prevent misuse, but a week after Sora 2's release, third-party programs became available which could remove the watermark. == Background == Several other models capable of generating video from text had been created prior to Sora, including Meta's Make‑A‑Video, Runway's Gen‑2 and Google Veo. OpenAI, the company behind Sora, had released DALL·E 3, the third of its DALL-E text-to-image models, in September 2023. == History == === Initial release === The team that developed Sora named it after the Japanese word for 'sky' to signify its "limitless creative potential". On February 15, 2024, OpenAI first previewed Sora by releasing multiple clips of high-definition videos that it had created, including an SUV driving down a mountain road, an animation of a "short fluffy monster" next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush. OpenAI stated that it was able to generate videos as long as one minute. The company then shared a technical report that highlighted the methods used to train the model. OpenAI CEO Sam Altman also posted a series of tweets responding to Twitter users' prompts with Sora-generated videos of the prompts. As of December 9, 2024, OpenAI had gradually made Sora available to the public for ChatGPT Pro and ChatGPT Plus users in the U.S. and Canada. Prior to this, the company had provided limited access to a small "red team", including experts in misinformation and bias, to perform adversarial testing on the model. The company also shared Sora with a small group of creative professionals, including video makers and artists, to seek feedback on its usefulness in creative fields. In February 2025, OpenAI announced plans to integrate Sora into ChatGPT by letting users generate Sora videos from the chatbot. === Sora 2 === Sora 2 was unveiled on September 30, 2025, with an iOS app at the same time, as well as an Android app two months later. All videos generated by the model feature a visible, moving watermark to prevent misuse of the tool. The previous version of Sora also added a safety watermark to allow viewers to distinguish between real and fictional content. On October 7, 404 Media reported that third-party programs that could remove the watermark from Sora 2 videos had become prevalent. Many outlets, such as Wired magazine, have noted that the Sora 2 app is overtly similar to TikTok in style and features. === Discontinuation === On March 24, 2026, OpenAI announced on X that it was discontinuing Sora in both the mobile app and the API. The Sora app was shut down on April 26, 2026, while the API is planned to be shut down on September 24, 2026. OpenAI's partnership with Disney, which included a licensing agreement allowing Disney characters to be used within Sora, was also coming to an end. The decision prompted British technology news website The Register to label OpenAI a "product-killer", following in the footsteps of other technology companies such as Google, Amazon Web Services, Broadcom, Cloud Software Group, and Netscape. OpenAI did not provide a specific reason for discontinuing Sora in its shutdown notice. The reports that emerged regarding this discontinuity linked the decision to computation shortages, cost pressures, and a broader shift toward core enterprise products. Following its public launch, Sora's worldwide users peaked at around a million before declining to fewer than 500,000, while the service cost an estimated $1 million per day to operate due to the computational demands of video generation. == Legal regulation == In November 2024, an API key for Sora access was leaked by a group of testers on Hugging Face who posted a manifesto stating that they were protesting that Sora was used for "art washing". OpenAI revoked all access three hours after the leak was made public and stated that "hundreds of artists" have shaped the development and that "participation is voluntary". At the time of its launch, Sora 2 allowed copyrighted content by default unless copyright holders contacted OpenAI to restrict the generation of their content on the platform. On October 3, 2025, OpenAI stated that a future update to Sora 2 would give copyright holders "more granular control" over the generation of copyrighted content, but the company did not state whether existing content would be removed. On October 6, the chairman of the MPA criticized OpenAI's approach to copyright with Sora 2. On December 11, 2025, the Walt Disney Company announced that it would invest $1 billion in OpenAI to allow users to generate more than 200 of its copyrighted characters on Sora 2. These characters include those from Disney Animation, Pixar, Marvel Studios, and Star Wars. == Capabilities and limitations == The technology behind Sora is an adaptation of the technology behind DALL-E 3. According to OpenAI, Sora is a diffusion transformer, a denoising latent diffusion model with one transformer as its denoiser. A video is generated in latent space by denoising 3D "patches", then transformed to standard space by a video decompressor. Recaptioning is employed to augment training data by using a video-to-text model to create detailed captions for videos. OpenAI trained the model using publicly available videos as well as copyrighted videos licensed for the purpose, but did not reveal the number or the exact source of the videos. Upon its release, OpenAI acknowledged some of Sora's shortcomings, including its limited capacity to simulate complex physics, to understand causality and to differentiate left from right. OpenAI also stated that, in adherence to the company's existing safety practices, Sora will restrict text prompts for sexual, violent, hateful or celebrity imagery, as well as content featuring existing intellectual property. Sora researcher Tim Brooks stated that the model learned how to create 3D graphics from its dataset alone, while fellow Sora researcher Bill Peebles said that the model automatically created different video angles without being prompted. According to OpenAI, Sora-generated videos are also tagged with C2PA metadata to indicate that they are AI-processed. === Comparison with other models === The Artificial Analysis have placed Sora 2 pro lower than other text-to-video AI generators in the market on its leaderboard. Other models, such as Seedance 2.0 from ByteDance, Runaway 4.5 from Runaway, and Kling 3.0 from KlingAI, have ranked higher than Sora 2.0. == Reception == === Positive === In 2024, Will Douglas Heaven of the MIT Technology Review called the demonstration videos "impressive", but noted that they must have been cherry-picked and may not be representative of Sora's typical output. Lisa Lacy of CNET called its example videos "remarkably realistic – except perhaps when a human face appears close up or when sea creatures are swimming". In October 2025, The New York Times remarked that the release of the Sora 2 app in September 2025 was "jaw-dropping (for better and worse)" though also remarked that the app was a "social network in disguise" and "the type of product that companies like Meta and X have sought to build: a way to bring A.I. to the masses that people can share." The article expressed concern regarding the product's potential impact on society and its potential use to promote misinformation, disinformation, and scams. A 2025 study in Science Advances found that generative AI tools can lower barriers to entry in creative work. It enables users with diverse skill sets, including people with less formal artistic training and technical skills, to act on their creative and imaginative ideas. The lower barrier to entry allows such users previously locked out of the creative industry to produce content and easily act on their creative ideas. === Negative === Some internet users and online content creators, such as Hank Green, called the mobile app "SlopTok," a reference to both the mobile app TikTok and the term AI slop. Filmmaker Tyler Perry announced he would be putting a planned

    Read more →
  • Take Us to Your Chief: and Other Stories

    Take Us to Your Chief: and Other Stories

    Take Us to Your Chief: and Other Stories is a collection of nine short stories by Canadian author, playwright, and journalist Drew Hayden Taylor published in 2016 by Douglas & McIntyre. Taylor, who is part Caucasian, part Ojibwe, explains in the acknowledgments section of the book that the origin of the project lies in several failed attempts "to compile an anthology of Native sci-fi from Canada’s best First Nations writers." The stories explore contemporary First Nations social issues through employing a number of 1950s-era science fiction tropes and themes in these stories, including time travel, alien contact, and superpowers. Many reviews of the books have noted Taylor's use of humor to examine dark subject matter, such as the heritage of Canadian Indian residential schools, First Nations suicide rates, or the water quality crisis on Canadian reserves. == The Stories == "Andrei nas" "I Am...Am I" "Lost in Space" "Dreams of Doom" "Mr. Gizmo" "Petropaths" "Stars" "Superdisappointed" "Take Us to Your Chief" == Story summaries == === Foreword === In his foreword, Taylor describes the genesis of Take Us to Your Chief: and Other Stories and invites readers into, in his term, a “new terra nullius.” He begins by describing his biracial upbringing and heritage. He points out that First Nations people are rarely associated with technology or science fiction, in part because Indigenous peoples were often at a technological disadvantage against European colonizers. He references the few examples that he can think of from popular culture, such as the Star Trek episode called “The Paradise Syndrome,” in which First Nations people are portrayed as stereotypical Indians in hippie clothing. He also elaborates on his fascination with the world of sci-fi, which first started in comic books. He enjoyed the literary work of H.G. Wells, such as The Time Machine and The Invisible Man. Since sci-fi is a world of endless opportunities, he intends that these short stories help people explore science fiction through Native peoples’ minds, something that needs to be explored more thoroughly. === "A Culturally Inappropriate Armageddon" === “A Culturally Inappropriate Armageddon” is set on a Haudenosaunee reserve, towards the end of the Oka Crisis, with a handful of people that work at its first ever radio station, C-RES, which opens in 1991. Part 1, titled “C-Res Is on the Air,” depicts Emily, Aaron, and Tracey on their first days at the station. Within the group, there is a constant debate between broadcasting popular programming, including science fiction and film reviews, and culturally-relevant programming meant to aid in cultural revitalization efforts. One night, Aaron is late to work but once he shows up he can't stop talking about radio transmissions broadcasting into deep space, an event that has been occurring since the initial discovery of the radio waves by Heinrich Hertz. The story then skips ahead seven years to 1998, when Emily is struggling to find better content for her station until Tracey stumbles upon an old anthropological record named “The Calling Song” that they decide to broadcast to their audience. The story then jumps to the year 2018 where they are all huddled around a television watching a news station reporting that extraterrestrial life is heading towards them. The discussion of what is going to happen comes into the picture and they all decide it would either be like Contact or The Day the Earth Stood Still. A year later in 2019, the aliens have invaded the planet and destroyed everything. As the three former radio station employees suffer from radioactive fallout, they realize that the aliens received the broadcast of “The Calling Song” and took it as a message to come to Earth. They thus realize that the Haudenosaunee people were inadvertently responsible for the destruction of the Earth. Part 2, titled “Old Men and Old Sayings,” tells us of an elderly man that is watching the news and listening to the radio about a spaceship coming to earth. He knows that he and everyone will die, but the people around him are excited. He finds a book on his night stand and flips to a page where he underlined a sentence a long time ago about the European colonization of the Americas. That sentence reads “those who cannot remember the past are condemned to repeat it” (23). He closes the book and Taylor concludes the story by writing, “he hated it when white people were right." === "I Am...Am I" === “I Am...Am I” chronicles the accidental creation and unexpected ending of artificial intelligence. Professor Mark King has a plethora of degrees and works for a research firm called FUTUREVISION. One night as Professor King searches the lab for his car keys—a common occurrence for him—he notices something unusual in the Matrix room. He reads on a computer the phrase “I am.” First believing it to be a prank, King later comes to the realization that his Matrix project has evolved into a responsive Artificial Intelligence. After this realization, Professor King calls his peer Dr. Gayle Chambers to further investigate this miraculous event. After receiving approval from their superiors, Professor King and Dr. Chambers move forward in feeding the AI information, with Chambers serving as the lead communicator. With more information, it becomes increasingly concerned with its own existence and the concept of whether it has a soul. After several days of conversation with the AI, Chambers and King begin to feel uneasy about the AI's responses, which show signs of neuroses. Despite this behavior, Chambers decides to feed the AI information about the culture and history of the human race. Upon receiving this information, the AI becomes obsessed with Indigenous spirituality prior to the colonization of the Americas, and it requests more information on First Nations people. Dr. Chambers is hesitant at first, but gives in and continues to feed the AI the information with the intention to return to it in the morning. This leads to the AI finding out about colonization and genocide of Indigenous peoples. Upon her arrival the next day, Chambers discovers that the code for the AI has been completely wiped from the hard drive and a single message is left on the screen—"I was”—that signifies the AI's suicide. === "Lost in Space" === "Lost in Space" is told from the perspective of Mitchell, an Anishinabe astrosurveyor who is aboard a space shuttle on a two-year tour collecting rocks from an asteroid belt. He is accompanied by an Artificial general intelligence named Mac, short for “machine.” Mac is aboard this tour in order to accompany Mitchell and keep him sane; however, his company is a burden because for Mitchell, “true space exploration consists largely of boredom.” In the midst of Mitchell seeking a way to occupy his downtime, Mac interrupts with news about his grandfather, Papa Peter, dying. Papa Peter was Mitchell's only real tie to his Indigenous identity. After receiving the news Mitchell begins to reminisce on all of the things Papa Peter had taught him throughout his life. He constantly posed questions concerning the world above (Father Sky) and how it is more important than the land they live on (Mother Earth), which eventually led Mitchell to the selection of his career. During his state of mourning, Mitchell begins to go through all the videos his grandfather had sent him throughout his space tours. Papa Peter had sent Mitchell videos from Otter Lake, a First Nations reserve; these videos are about controversial topics regarding being both native and an astronaut. In the midst of Mitchell's grieving, Mac tries to relieve the situation by finding an online video of Mitchell's grandfather participating in a drum ceremony at Ottawa’s National Aboriginal Day festival. He reconnects to his roots and his grandfather’s spirit as he listens to the Indigenous music by feeling the drum beat and humming along. Mac’s small act of kindness leads Mitchell to gain a new-found appreciation for his presence. Mitchell feels responsible to moving forward in his life in memory of Papa Peter. === "Dreams of Doom" === "Dreams of Doom" is narrated by an Ojibway reporter named Pamela Wanishin who works for an aboriginal newspaper called the West Wind. One day she receives a mysterious package with a broken dreamcatcher and a flash drive containing highly classified files. As she reads the files, she keeps seeing the term “Project Nightlight,” and out of curiosity, she Googles it. Once she Googles this, she is contacted by a nameless agent from Indigenous and Northern Affairs Canada and told that she must be relocated because the knowledge she now possesses must never be released to the public. She quickly flees the area to a cabin at Otter Lake, owned by a family member, to lie low for a few days. Eventually, the government organization tracks her down using drones, which forces her to fight back and flee once again. Pamela then runs to her friend and coworker Sally's hous

    Read more →
  • Linux Trace Toolkit

    Linux Trace Toolkit

    The Linux Trace Toolkit (LTT) is a set of tools that is designed to log program execution details from a patched Linux kernel and then perform various analyses on them, using console-based and graphical tools. LTT has been mostly superseded by its successor LTTng (Linux Trace Toolkit Next Generation). LTT allows the user to see in-depth information about the processes that were running during the trace period, including when context switches occurred, how long the processes were blocked for, and how much time the processes spent executing vs. how much time the processes were blocked. The data is logged to a text file and various console-based and graphical (GTK+) tools are provided for interpreting that data. In order to do data collection, LTT requires a patched Linux kernel. The authors of LTT claim that the performance hit for a patched kernel compared to a regular kernel is minimal; Their testing has reportedly shown that this is less than 2.5% on a "normal use" system (measured using batches of kernel makes) and less than 5% on a file I/O intensive system (measured using batches of tar). == Usage == === Collecting trace data === Data collection is Started by: trace 15 foo This command will cause the LTT tracedaemon to do a trace that lasts for 15 seconds, writing trace data to foo.trace and process information from the /proc filesystem to foo.proc. The trace command is actually a script which runs the program tracedaemon with some common options. It is possible to run tracedaemon directly and in that case, the user can use a number of command-line options to control the data which is collected. For the complete list of options supported by tracedaemon, see the online manual page for tracedaemon. === Viewing the results === Viewing the results of a trace can be accomplished with: traceview foo This command will launch a graphical (GTK+) traceview tool that will read from foo.trace and foo.proc. This tool can show information in various interesting ways, including Event Graph, Process Analysis, and Raw Trace. The Event Graph is perhaps the most interesting view, showing the exact timing of events like page faults, interrupts, and context switches, in a simple graphical way. The traceview command is a wrapper for a program called tracevisualizer. For the complete list of options supported by tracevisualizer, see the online manual page for tracevisualizer.

    Read more →
  • Riffusion

    Riffusion

    Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio. The resulting music has been described as "de otro mundo" (otherworldly), although unlikely to replace man-made music. The model was made available on December 15, 2022, with the code also freely available on GitHub. The first version of Riffusion was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms, resulting in a model which used text prompts to generate image files which could then be put through an inverse Fourier transform and converted into audio files. While these files were only several seconds long, the model could also use latent space between outputs to interpolate different files together (using the img2img capabilities of SD). It was one of many models derived from Stable Diffusion. In December 2022, Mubert similarly used Stable Diffusion to turn descriptive text into music loops. In January 2023, Google published a paper on their own text-to-music generator called MusicLM. Forsgren and Martiros formed a startup, also called Riffusion, and raised $4 million in venture capital funding in October 2023.

    Read more →
  • Mobile Fortify

    Mobile Fortify

    Mobile Fortify is a mobile app used by United States Immigration and Customs Enforcement (ICE) on their government-issued phones. The app allows agents to take a photo in order to gather biometrics, including contactless fingerprints and faceprints, for the purpose of identifying an individual and their potential immigration status. The app was created by NEC. == History == In June 2025, use of Mobile Fortify by ICE was uncovered through leaked emails and the user manual, reported by 404 Media. The app is internally developed, and details of the parent company and developer were initially unknown. In January 2026, the DHS's 2025 AI Use Case Inventory revealed the vendor as NEC Corporation, an international conglomerate with subsidiaries in Argentina, Australia, China, India and Malaysia. Later that month, several senators demanded transparency around the app and its origins, and that ICE stop using it. A second letter was sent again in November, after hearing no response to the previous letter from ICE. == Technology == Unlike other facial recognition software, Fortify uses federally linked databases. By contrast, Clearview AI uses public social media databases for biometric scanning. Federal databases include DHS's automated biometric identification system (IDENT), containing more than 270 million biometric records, and Customs and Border Protection's Traveler Verification Service. The State Department's visa and passport photo database, the FBI's National Crime Information Center, National Law Enforcement Telecommunications Systems, and CBP's TECS and Seized Assets and Case Tracing System (SEACATS). == Oversight == Several senators urged ICE to stop using the app for fear of infringing on fourth amendment and first amendment rights, and requested details on who developed the app, when it was deployed, whether the app was tested for accuracy, and policies and practices governing its use. In June 2025, they sent an open letter to Todd Lyons, ICE acting director, signed by senators Cory Booker, Chris Van Hollen, Ed Markey, Bernie Sanders, Adam Schiff, Tina Smith, Elizabeth Warren, and Ron Wyden. On November 3, a second letter was sent to the ICE by senators, after not receiving answers to questions from the previous letter deadlined for October 2. == Criticism == Mobile Fortify, and ICE's use of similar biometric identification technologies (such as Mobile Identify, an app similar to Mobile Fortify to be used by local or regional law enforcement to assist in immigration enforcement ) has faced scrutiny from a variety of digital rights organizations, politicians, and news outlets. The criticism is already considered to potentially be a reason why the similar Mobile Identify app was pulled from the Google Play Store. Facial recognition technologies are known to produce false-positives and generally unreliable results, especially on those with darker skin tones. ICE has already previously mistakenly arrested a U.S. citizen under the belief he was illegally in the country, and later stated that he "could be deported based on biometric confirmation of his identity" prior to his release. U.S. representative Bennie Thompson, ranking member of the House Homeland Security Committee has previously commented that "ICE officials have told us that an apparent biometric match by Mobile Fortify is a ‘definitive’ determination of a person's status and that an ICE officer may ignore evidence of American citizenship—including a birth certificate—if the app says the person is an alien," and that "Mobile Fortify is a dangerous tool in the hands of ICE, and it puts American citizens at risk of detention and even deportation," On January 19, 2026, 404 Media reported on a case where a woman, identified in court documents as "MJMA", was scanned by Mobile Fortify twice in the same interaction, and two entirely different names were provided by the app. According to the Innovation Law Lab, whose attorneys are representing MJMA, both of the names were incorrect. ICE has stated that they will not allow people to decline to be scanned by Mobile Fortify, and that photos taken, even those of U.S. citizens, will be stored for 15 years, something that has been criticized primarily because ICE has not performed a Privacy Impact Assessment (PIA) for Mobile Fortify, the right to decline other forms of biometric verification to the U.S. government is often available under other circumstances, and the 15 year window is viewed as unnecessarily large.

    Read more →
  • Miss AI

    Miss AI

    Miss AI is an annual international artificial intelligence beauty pageant run by the British company Fanvue. It is the first beauty pageant for AI-generated personas. == History == Miss AI's inaugural contest was organized by Fanvue as a part of the World AI Creator Awards (WAICAs) in 2024. The winner is selected by a panel of judges which consists of both humans and AI-generated individuals. The Moroccan virtual influencer Kenza Layli was crowned with the inaugural title while Lalina Valina and Olivia C remained the first and second runners-up respectively. == Competition == The creators are eligible to take part in this competition as long as the models are entirely AI-generated and have a social media presence. The judges evaluate contestants' three main categories – Beauty, Tech, & Social clout and rank them according the overall points earned from these categories. The Guardian commented that "AI models take every toxic gendered beauty norm and bundle them up into completely unrealistic package". == Winners ==

    Read more →
  • LENA Foundation

    LENA Foundation

    The LENA Foundation is an American nonprofit organisation which provides tools for measuring children's language acquisition and exposure. Specifically, the LENA system consists of a digital language processor which is worn by a child and records and analyses their auditory environment, using propriety software. It then presents a summary of child-adult conversation, such as conversation turns and word counts. The purpose of the LENA system is to encourage interactive talk between children (between the age of two to forty-eight months) and their caretakers. The LENA system is also used for research; while useful for researchers who wish to save transcription costs or observe the child in its natural state, the accuracy of this system, while often quite high, varies between contexts, for example notably in the case of hard of hearing children. Because of this, several researchers recommend caution in using only the LENA system on its own for the purposes of scientific research. == History == The LENA Foundation was established in 2009 by Terrance and Judith Paul, founders of Renaissance Learning, Inc., with the purpose of aiding children with disabilities and assisting with early learning. They were inspired by the book "Meaningful Differences in the Everyday Experience of American Children" by Dr. Betty Hart and Dr. Todd Risley. A pilot version of the LENA system was launched in February 2006. The LENA Research Foundation was registered as a tax-exempt 501(c)(3) nonprofit in September 2010. The organisation was renamed simply LENA in 2018 and adopted the tagline "Building brains through early talk." LENA has been used for parental feedback, linguistics or paediatrics research, and for specific clinical cases. == Scientific background == In 2018, research using the LENA system showed that there was a link between children's conversational turns and activation of Broca's area (a part of the brain responsible, although not necessarily essential, for language processing). The LENA foundation cites research by its own employees as evidence for the scientific basis of its technology. Said research claims that verbal interaction with young children has an effect on language acquisition, including verbal comprehension skills during adolescence. == LENA System == The LENA software analyses a child's natural language environment, such as verbal exposure, and provides several metrics, such as adult and child speech time, television/recorded audio time, word count, or conversation turn count. The LENA hardware is a recorder that is usually placed into a child's specially-designed vest. The software was trained on over 65,000 hours of manually annotated American English audio recordings. It splits the audio into segments which are categorised as "key child", "other child", "male adult", "noise", etc. The advantages of LENA as opposed to manual transcription are its speed and ease of use; the disadvantages are its potential inaccuracies and lack of transcription capability (which LENA does not profess to attempt). The LENA system has also been criticised for prioritising quantity of speaking over quality (i.e., mastery of the language, as opposed to babble). == Product lines == === LENA Start === LENA Start is a program for parents that utilises feedback from the LENA System in conjunction with weekly group sessions in order to address the home language environment. It was introduced in 2015 and implemented across several U.S. states. In October 2020, during the restrictions of the COVID-19 pandemic, Read Aloud Delaware began a virtual LENA Start program with families statewide, where parents received feedback and participated in one-hour Zoom workshops each week during the 10-week program. === LENA Grow === LENA Grow is a professional development program for teachers in early childhood classrooms. Before launching at sites around the country, the program was first piloted in Escambia County, Florida. === LENA Home === LENA Home is a supplement to existing parent coaching curricula. Typically, home visitors facilitate the use of the LENA System to help parents track their progress towards increasing interactive talk in their homes. === Developmental Snapshot === The LENA Developmental Snapshot, based on a 52-question parent survey, assesses both expressive and receptive language skills and provides an estimate of a child's developmental age from 2 months to 36 months.

    Read more →
  • Abu Dhabi Autonomous Racing League

    Abu Dhabi Autonomous Racing League

    The Abu Dhabi Autonomous Racing League (A2RL) is an autonomous racing league based in Abu Dhabi and organized by ASPIRE, part of the UAE government's Advanced Technology Research Council. It has three distinct categories: the "car race", the drone race, and the buggy race. The first car race was held on 27 April 2024 at the Yas Marina Circuit, marking the first major autonomous formula race outside the US since the now-folded Roborace championship. The first drone race was held on 11 and 12 April 2025. == Formats == A2RL has three distinct formats, the formula racing format (dubbed the Car Race), the quadcopter drone racing format (dubbed the Drone Race), and the off-road dune buggy racing format (dubbed the Buggy Race). === Car Race === A2RL's main event, the car race is a standard formula racing format with self-driving formula cars. The cars are made by Dallara and are modified versions of Super Formula cars with Yokohama tires. These cars had the CPUs of their AIs mounted where the driver's seat is on a non-modified chassis, as well as hydraulic actuators for AI control of the vehicle, multiple sensor systems including LIDAR and GPS, and a large LED indicator showing the status of the AI. The first car race was held on 27 April 2024. This race was marked by the cars' subpar performance: Out of four cars that qualified, only two finished the race - the other two did not. The next race was held on 15 November 2025, with 11 teams. ==== Technical specifications ==== The full list of technical specifications are as follows: Chassis: Dallara EAV24 (modified Dallara SF23) Forward suspension: Pushrod type, torsion bar spring, adjustable dampers, third element Rear suspension: Pushrod type, torsion bar, coil springs, adjustable dampers, third element Tires: Yokohama Advan Drive-by-wire system: Provided by Meccanica 42, the DBW system consists of steering and brake actuators, with a central ECU that coordinates the driving actions and reacts to any critical situation in real-time. Brakes: Brembo calipers, Brembo carbon discs, electro-hydraulically activated Engine: 4 Piston Racing K20C1 (based on Honda 2.0l; turbocharged 4-cylinder engine) Gearbox: 3MO 6-speed gearbox Sensor suite: 7x Sony IMX728 cameras, 4x ZF ProWave radar units, 3x Seyond Falcon Kinetic lidar units Main computer: Neousys RGS-8805GC ==== Races held ==== === Drone Race === Created in partnership with the Drone Champions' League, the drone race is the quadcopter drone racing aerial format of the A2RL. The first race was held on 11/12 April 2025 at the ADNEC Marina Hall. 10 teams are scheduled to take part. === Buggy Race === The buggy race will be the off-road format of the A2RL using self-driving dune buggies. No date or number of teams has been announced for the first race. === Other events === A2RL is known to host AI vs AI and Human vs AI events, in Abu Dhabi and abroad. One such event took place at the Suzuka Circuit in Japan. The Human vs AI race was precluded due to AI car "Yalla" crashing into the wall during the formation lap. == Team lists ==

    Read more →
  • Stable Diffusion

    Stable Diffusion

    Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing AI boom. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. Its development involved researchers from the CompVis Group at LMU Munich and Runway with a computational donation from Stability and training data from non-profit organizations. Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly, and an optimized version can run on most consumer hardware equipped with a modest GPU with as little as 2.4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services. == Development == Stable Diffusion originated from a project called Latent Diffusion, developed in Germany by researchers at LMU Munich in Munich and Heidelberg University. Four of the original 5 authors (Robin Rombach, Andreas Blattmann, Patrick Esser and Dominik Lorenz) later joined Stability AI and released subsequent versions of Stable Diffusion. The technical license for the model was released by the CompVis group at LMU Munich. Development was led by Patrick Esser of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented the latent diffusion model architecture used by Stable Diffusion. Stability AI also credited EleutherAI and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as supporters of the project. == Technology == === Architecture === Diffusion models, introduced in 2015, are trained with the objective of removing successive applications of Gaussian noise on training images, which can be thought of as a sequence of denoising autoencoders. The name diffusion is from the thermodynamic diffusion, since they were first developed with inspiration from thermodynamics. Models in Stable Diffusion series before SD 3 all used a variant of diffusion models, called latent diffusion model (LDM), developed in 2021 by the CompVis (Computer Vision & Learning) group at LMU Munich. Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder. The VAE encoder compresses the image from pixel space to a smaller dimensional latent space, capturing a more fundamental semantic meaning of the image. Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion. The U-Net block, composed of a ResNet backbone, denoises the output from forward diffusion backwards to obtain a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space. The denoising step can be flexibly conditioned on a string of text, an image, or another modality. The encoded conditioning data is exposed to denoising U-Nets via a cross-attention mechanism. For conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts to an embedding space. Researchers point to increased computational efficiency for training and generation as an advantage of LDMs. With 860 million parameters in the U-Net and 123 million in the text encoder, Stable Diffusion is considered relatively lightweight by 2022 standards, and unlike other diffusion models, it can run on consumer GPUs, and even CPU-only if using the OpenVINO version of Stable Diffusion. ==== SD XL ==== The XL version uses the same LDM architecture as previous versions, except larger: larger UNet backbone, larger cross-attention context, two text encoders instead of one, and trained on multiple aspect ratios (not just the square aspect ratio like previous versions). The SD XL Refiner, released at the same time, has the same architecture as SD XL, but it was trained for adding fine details to preexisting images via text-conditional img2img. ==== SD 3.0 ==== The 3.0 version completely changes the backbone. Not a UNet, but a Rectified Flow Transformer, which implements the rectified flow method with a Transformer. The Transformer architecture used for SD 3.0 has three "tracks", for original text encoding, transformed text encoding, and image encoding (in latent space). The transformed text encoding and image encoding are mixed during each transformer block. The architecture is named "multimodal diffusion transformer (MMDiT), where the "multimodal" means that it mixes text and image encodings inside its operations. This differs from previous versions of DiT, where the text encoding affects the image encoding, but not vice versa. === Training data === Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from Common Crawl data scraped from the web, where 5 billion image-text pairs were classified based on language and filtered into separate datasets by resolution, a predicted likelihood of containing a watermark, and predicted "aesthetic" score (e.g. subjective visual quality). The dataset was created by LAION, a German non-profit which receives funding from Stability AI. The Stable Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-aesthetics v2 5+. A third-party analysis of the model's training data identified that out of a smaller subset of 12 million images taken from the original wider dataset used, approximately 47% of the sample size of images came from 100 different domains, with Pinterest taking up 8.5% of the subset, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons. An investigation by Bayerischer Rundfunk showed that LAION's datasets, hosted on Hugging Face, contain large amounts of private and sensitive data. === Training procedures === The model was initially trained on the laion2B-en and laion-high-resolution subsets, with the last few rounds of training done on LAION-Aesthetics v2 5+, a subset of 600 million captioned images which the LAION-Aesthetics Predictor V2 predicted that humans would, on average, give a score of at least 5 out of 10 when asked to rate how much they liked them. The LAION-Aesthetics v2 5+ subset also excluded low-resolution images and images which LAION-5B-WatermarkDetection identified as carrying a watermark with greater than 80% probability. Final rounds of training additionally dropped 10% of text conditioning to improve Classifier-Free Diffusion Guidance. The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000. === Limitations === Stable Diffusion has issues with degradation and inaccuracies in certain scenarios. Initial releases of the model were trained on a dataset that consists of 512×512 resolution images, meaning that the quality of generated images noticeably degrades when user specifications deviate from its "expected" 512×512 resolution; the version 2.0 update of the Stable Diffusion model later introduced the ability to natively generate images at 768×768 resolution. Another challenge is in generating human limbs due to poor data quality of limbs in the LAION database. The model is insufficiently trained to replicate human limbs and faces due to the lack of representative features in the database, and prompting the model to generate images of such type can confound the model. In addition to human limbs, Stable Diffusion is unable to generate legible ambigrams and some other forms of text and typography. Stable Diffusion XL (SDXL) version 1.0, released in July 2023, introduced native 1024x1024 resolution and improved generation for limbs and text. Accessibility for individual developers can also be a problem. In order to customize the model for new use cases that are not included in the dataset, such as generating anime characters ("waifu diffusion"), new data and further training are required. Fine-tuned adaptations of Stable Diffusion created through additional retraining have been used for a variety of different use-cases, from medical imaging to algorithmically generated music. However, this fine-tuning process is sensitive to the quality of new data; low resolution images or different resolutions from the original data can not only fail to learn the new task but degrade the overall performance of the model. Even when the model is additionally trained on high quality images, it is difficult for individuals to run models in consumer electronics. For example, the training process for waifu-diffusion requires a minimum 30 GB of VRAM, which exceeds the usual resource provided in such consumer GPUs as Nvidia's GeForce 30 series, w

    Read more →