AI Data Bias

AI Data Bias — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Butler in a Box

    Butler in a Box

    Butler in a Box was an early voice-controlled home automation device developed in 1983 by magician Gus Searcy and programmer Franz Kavan. The device allowed users to control various home electronics, such as lights and phones, using voice commands. It predated modern smart speakers and virtual assistants by several decades. == History == The idea for the Butler in a Box originated in 1983 when Searcy was asked by friends why he couldn't simply command lights to turn on and off if he could pull rabbits out of hats, given his background as a professional magician. Searcy partnered with former IBM programmer Kavan to develop the device, with their first prototype being named "Sidney". The Butler in a Box combined remote control technology with voice recognition to enable control of home devices. However, it faced challenges due to the technological limitations of the era and its high price point of nearly $1,500 (equivalent to around $3,700 in 2021). == Features and functionality == Users could activate the Butler in a Box by speaking a wake word, typically a traditional butler name, and the device would address the user as "boss". It was capable of performing tasks such as: Turning lights on and off, controlling individual zones if lights were connected to remote control modules Making and receiving phone calls Setting timers Pairing with sensors to function as a security alarm system However, the device required extensive voice training for each user, a time-consuming process compared to modern voice recognition. Additionally, settings and trained commands would be lost if power was out for over 3 hours due to the volatile memory technology used at the time. == Reception and legacy == While innovative for its time, the Butler in a Box did not achieve widespread commercial success due to its high price and the technical limitations of the 1980s. Nevertheless, it served as an important early step in the development of home automation and showcased the potential for voice-controlled technology to enhance accessibility and convenience in the home. Decades later, products like Amazon Alexa, Google Home, and Apple's Siri would make voice-controlled smart home devices commonplace and affordable, building on the groundwork laid by early attempts like the Butler in a Box.

    Read more →
  • Digital organism

    Digital organism

    A digital organism is a self-replicating computer program that mutates and evolves. Digital organisms are used as a tool to study the dynamics of Darwinian evolution, and to test or verify specific hypotheses or mathematical models of evolution. The study of digital organisms is closely related to the area of artificial life. == History == Digital organisms can be traced back to the game Darwin, developed in 1961 at Bell Labs, in which computer programs had to compete with each other by trying to stop others from executing . A similar implementation that followed this was the game Core War. In Core War, it turned out that one of the winning strategies was to replicate as fast as possible, which deprived the opponent of all computational resources. Programs in the Core War game were also able to mutate themselves and each other by overwriting instructions in the simulated "memory" in which the game took place. This allowed competing programs to embed damaging instructions in each other that caused errors (terminating the process that read it), "enslaved processes" (making an enemy program work for you), or even change strategies mid-game and heal themselves. Steen Rasmussen at Los Alamos National Laboratory took the idea from Core War one step further in his core world system by introducing a genetic algorithm that automatically wrote programs. However, Rasmussen did not observe the evolution of complex and stable programs. It turned out that the programming language in which core world programs were written was very brittle, and more often than not mutations would completely destroy the functionality of a program. The first to solve the issue of program brittleness was Thomas S. Ray with his Tierra system, which was similar to core world. Ray made some key changes to the programming language such that mutations were much less likely to destroy a program. With these modifications, he observed for the first time computer programs that did indeed evolve in a meaningful and complex way. Later, Chris Adami, Titus Brown, and Charles Ofria started developing their Avida system, which was inspired by Tierra but again had some crucial differences. In Tierra, all programs lived in the same address space and could potentially execute or otherwise interfere with each other's code. In Avida, on the other hand, each program lives in its own address space. Because of this modification, experiments with Avida became much cleaner and easier to interpret than those with Tierra. With Avida, digital organism research has begun to be accepted as a valid contribution to evolutionary biology by a growing number of evolutionary biologists. Evolutionary biologist Richard Lenski of Michigan State University has used Avida extensively in his work. Lenski, Adami, and their colleagues have published in journals such as Nature and the Proceedings of the National Academy of Sciences (USA). In 1996, Andy Pargellis created a Tierra-like system called Amoeba that evolved self-replication from a randomly seeded initial condition. More recently REvoSim - a software package based around binary digital organisms - has allowed evolutionary simulations of large populations that can be run for geological timescales.

    Read more →
  • Veo (text-to-video model)

    Veo (text-to-video model)

    Veo, or Google Veo, is a text-to-video model developed by Google DeepMind and announced in May 2024. As a generative AI model, it creates videos based on user prompts. Veo 3, released in May 2025, can also generate accompanying audio. == Development == In May 2024, a multimodal video generation model called Veo was announced at Google I/O 2024. Google claimed that it could generate 1080p videos over a minute long. In December 2024, Google released Veo 2, available via VideoFX. It supports 4K resolution video generation and has an improved understanding of physics. In April 2025, Google announced that Veo 2 became available for advanced users on the Gemini app. In May 2025, Google released Veo 3, which not only generates videos but also creates synchronized audio — including dialogue, sound effects, and ambient noise — to match the visuals. Google also announced Flow, a video-creation tool powered by Veo and Imagen. Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film. This was rebranded as Google Flow at the 2026 Google I/O keynote, along with the announcement of Google Flow Music. == Capabilities == Google Veo can be purchased at multiple subscription tiers and through Google "AI credits". The software itself can be run by two different consoles, Google Gemini and Google Flow. Gemini being geared towards shorter, quicker, and faster projects, using the Gemini AI chat model, with Google Flow, which is essentially a movie editor allowing users to create longer projects with continuity, using the same characters and actors. Users can create a maximum of eight seconds per clip. According to Gizmodo Veo 3 users were directing the model to generate low-quality content, such as man on the street interviews or haul videos of people unboxing products. 404 Media reported that the tool tended to repeat the same joke in response to different prompts. Commentators speculated that Google had trained the service on YouTube videos or Reddit posts. Google itself had not stated the source of its training content. In July 2025, Media Matters for America reported that racist and antisemitic videos generated using Veo 3 were being uploaded to TikTok. Ryan Whitwam of Ars Technica commented, "In a perfect world, Veo 3 would refuse to create these videos, but vagueness in the prompt and the AI's inability to understand the subtleties of racist tropes (i.e., the use of monkeys instead of humans in some videos) make it easy to skirt the rules."

    Read more →
  • Hyperion Cantos

    Hyperion Cantos

    The Hyperion Cantos is a series of science fiction novels by Dan Simmons. The title was originally used for the collection of the first pair of books in the series, Hyperion and The Fall of Hyperion, and later came to refer to the overall storyline, including Endymion, The Rise of Endymion, and a number of short stories. More narrowly, inside the fictional storyline, after the first volume, the Hyperion Cantos is an epic poem written by the character Martin Silenus covering in verse form the events of the first two books. Of the four novels, Hyperion received the Hugo and Locus Awards in 1990; The Fall of Hyperion won the Locus and British Science Fiction Association Awards in 1991; and The Rise of Endymion received the Locus Award in 1998. All four novels were also nominated for various science fiction awards. == Works == === Hyperion (1989) === First published in 1989, Hyperion has the structure of a frame story, similar to Geoffrey Chaucer's Canterbury Tales and Giovanni Boccaccio's Decameron. The story weaves the interlocking tales of a diverse group of travelers sent on a pilgrimage to the Time Tombs on Hyperion. The travelers have been sent by the Hegemony (the government of the human star systems), the All Thing, and the Church of the Final Atonement, alternately known as the Shrike Church, to make a request of the Shrike. As they progress in their journey, each of the pilgrims tells their tale. === The Fall of Hyperion (1990) === This book concludes the story begun in Hyperion. It abandons the storytelling frame structure of the first novel, and is instead presented primarily as a series of dreams by John Keats. === Endymion (1996) === The story commences 274 years after the events in the previous novel. Few main characters from the first two books are present in the later two. The main character is Raul Endymion, an ex-soldier who receives a death sentence after an unfair trial. He is rescued by Martin Silenus and asked to perform a series of rather extraordinarily difficult tasks. The main task is to rescue and protect the daughter of Brawne Lamia (one of the main characters of Hyperion), Aenea, a messiah coming from the time period just after the first books via time travel. The Catholic Church has become a dominant force in the human universe and views Aenea as a potential threat to their power. The group of Aenea, Endymion, and A. Bettik (an android) evades the Church's forces on several worlds through use of the Consul's spaceship, ending the story on Earth. === The Rise of Endymion (1997) === This final novel in the series finishes the story begun in Endymion, expanding on the themes in Endymion, as Raul and Aenea battle the Church and meet their respective destinies. === Short stories === The series also includes three short stories: "Remembering Siri" (1983, included almost verbatim in Hyperion) "The Death of the Centaur" (1990) "Orphans of the Helix" (1999) == Development == The Hyperion universe originated when Simmons was an elementary school teacher, as an extended tale he told at intervals to his young students; this is recorded in "The Death of the Centaur", and its introduction. It then inspired his short story "Remembering Siri", which eventually became the nucleus around which Hyperion and The Fall of Hyperion formed. After the quartet was published came the short story "Orphans of the Helix". "Orphans" is currently the final work in the Cantos, both chronologically and internally. The original Hyperion Cantos has been described as a novel published in two volumes, published separately at first for reasons of length. In his introduction to "Orphans of the Helix", Simmons elaborates: Some readers may know that I've written four novels set in the "Hyperion Universe"—Hyperion, The Fall of Hyperion, Endymion, and The Rise of Endymion. A perceptive subset of those readers—perhaps the majority—know that this so-called epic actually consists of two long and mutually dependent tales, the two Hyperion stories combined and the two Endymion stories combined, broken into four books because of the realities of publishing. == Influences == Much of the appeal of the series stems from its extensive use of references and allusions from a wide array of thinkers such as Teilhard de Chardin, John Muir, Norbert Wiener, and to the poetry of John Keats, the famous 19th-century English Romantic poet, Norse mythology, and the monk Ummon. A large number of technological elements are acknowledged by Simmons to be inspired by elements of Out of Control: The New Biology of Machines, Social Systems, and the Economic World. The Hyperion series has many echoes of Jack Vance, explicitly acknowledged in one of the later books. The title of the first novel, "Hyperion", is taken from one of Keats's poems, the unfinished epic Hyperion. Similarly, the title of the third novel is from Keats' poem Endymion. Quotes from actual Keats poems and the fictional Cantos of Martin Silenus are interspersed throughout the novels. Simmons goes so far as to have two artificial reincarnations of John Keats ("cybrids": artificial intelligences in human bodies) play a major role in the series. == Setting == Much of the action in the series takes place on the planet Hyperion. It is described as having one-fifth less gravity than Earth standard. Hyperion has a number of peculiar indigenous flora and fauna, notably Tesla trees, which are essentially large electricity-spewing trees. It is also a "labyrinthine" planet, which means that it is home to ancient subterranean labyrinths of unknown purpose. Most importantly, Hyperion is the location of the Time Tombs, large artifacts surrounded by "anti-entropic" fields that allow them to move backward through time. In the fictional universe of the Hyperion Cantos, the Hegemony of Man encompasses over 200 planets. Faster than light communications technology, Fatlines, are said to operate through tachyon bursts. However, in later books it is revealed that they operate through the Void Which Binds. The Farcaster network was given to humanity by the TechnoCore and again it was another use of the Void Which Binds that allowed this instantaneous travel between worlds. The Hawking Drive was developed by human scientists, allowing the faster than light travel which led to the Hegira (from the Arabic word هجرة Hijra, meaning 'migration'). The Gideon drive, a Core-provided starship drive, allows for near-instantaneous travel between any two points in human-occupied space. The drive's use kills any human on board a Gideon-propelled starship; thus, the technology is only of use with remote probes or when used in conjunction with the Pax's resurrection technology. The resurrection creche can regenerate someone carrying a cruciform from their remains. Treeships are living trees that are propelled by ergs (spider-like solid-state alien being that emits force fields) through space. === The Shrike === The region of the Tombs is also the home of the Shrike, a menacing half-mechanical, half-organic four-armed creature that features prominently in the series. It appears in all four Hyperion Cantos books and is an enigma in the initial two; its purpose is not revealed until the second book, but is still left nebulous. The Shrike appears to act both autonomously and as a servant of some unknown force or entity. In the first two Hyperion books, it exists solely in the area around the Time Tombs on the planet Hyperion. Its portrayal is changed significantly in the last two books, Endymion and The Rise of Endymion. In these novels, the Shrike appears effectively unfettered and protects the heroine Aenea against assassins of the opposing TechnoCore. Surrounded in mystery, the object of fear, hatred, and even worship by members of the Church of the Final Atonement (the Shrike Cult), the Shrike's origins are described as uncertain. It is portrayed as composed of razorwire, thorns, blades, and cutting edges, having fingers like scalpels and long, curved toe blades. It has the ability to control the flow of time, and may thus appear to travel infinitely fast. The Shrike may kill victims in a flash or it may transport them to an eternity of impalement upon an enormous artificial 'Tree of Thorns,' or 'Tree of Pain' in Hyperion's distant future. The Tree of Thorns is described as an unimaginably large, metallic tree, alive with the agonized writhing of countless human victims of all ages and races. It is also hinted in the second book that the Tree of Thorns is actually a simulation generated by a mystical interface which connects to human brains via a strong and pulsing (as if it were alive) cord. The name Shrike seems a reference to birds of the shrike family, a family of birds that impales their victims on thorns, spines, or twigs. === Worlds and Systems === In the fictional universe of the Hyperion Cantos, the Hegemony of Man encompasses over 200 planets. The following planets appear or are specifically mentioned in the Hyperion Cantos. Planets of

    Read more →
  • Capture the flag (cybersecurity)

    Capture the flag (cybersecurity)

    In computer security, Capture the Flag (CTF) is an exercise in which participants attempt to find text strings, called "flags", which are secretly hidden in purposefully vulnerable programs or websites. They can be used for both competitive or educational purposes. In two main variations of CTFs, participants either steal flags from other participants (attack/defense-style CTFs) or from organizers (jeopardy-style challenges). A mixed competition combines these two styles. Competitions can include hiding flags in hardware devices, they can be both online or in-person, and can be advanced or entry-level. The game is inspired by the traditional outdoor sport with the same name. CTFs are used as a tool for developing and refining cybersecurity skills, making them popular in both professional and academic settings. == Overview == Capture the Flag (CTF) is a cybersecurity competition that is used to test and develop computer security skills. It was first developed in 1996 at DEF CON, the largest cybersecurity conference in the United States which is hosted annually in Las Vegas, Nevada. The conference hosts a weekend of cybersecurity competitions, including their flagship CTF. Two popular CTF formats are jeopardy and attack-defense. Both formats test participant’s knowledge in cybersecurity, but differ in objective. In the Jeopardy format, participating teams must complete as many challenges of varying point values from a various categories such as cryptography, web exploitation, and reverse engineering. In the attack-defense format, competing teams must defend their vulnerable computer systems while attacking their opponent's systems. The exercise involves a diverse array of tasks, including exploitation and cracking passwords, but there is little evidence showing how these tasks translate into cybersecurity knowledge held by security experts. Recent research has shown that the Capture the Flag tasks mainly covered technical knowledge but lacked social topics like social engineering and awareness on cybersecurity. == Educational applications == CTFs have been shown to be an effective way to improve cybersecurity education through gamification. There are many examples of CTFs designed to teach cybersecurity skills to a wide variety of audiences, including PicoCTF, organized by the Carnegie Mellon CyLab, which is oriented towards high school students, and Arizona State University supported pwn.college. Beyond educational CTF events and resources, CTFs has been shown to be a highly effective way to instill cybersecurity concepts in the classroom. CTFs have been included in undergraduate computer science classes such as Introduction to Information Security at the National University of Singapore. CTFs are also popular in military academies. They are often included as part of the curriculum for cybersecurity courses, with the NSA organized Cyber Exercise culminating in a CTF competition between the US service academies and military colleges. == Competitions == Many CTF organizers register their competition with the CTFtime platform. This allows the tracking of the position of teams over time and across competitions. These include "Plaid Parliament of Pwning", "More Smoked Leet Chicken", "Dragon Sector", "dcua", "Eat, Sleep, Pwn, Repeat", "perfect blue", "organizers" and "Blue Water". Overall the "Plaid Parliament of Pwning" and "Dragon Sector" have both placed first worldwide the most with three times each. === Community competitions === Every year there are dozens of CTFs organized in a variety of formats. Many CTFs are associated with cybersecurity conferences such as DEF CON, various editions of SANS Institute's NetWars, HITCON, and BSides. The DEF CON CTF, an attack-defence CTF, is notable for being one of the oldest CTF competitions to exist, and has been variously referred to as the "World Series", "Superbowl", and "Olympics", of hacking by media outlets. The NYU Tandon hosted Cybersecurity Awareness Worldwide (CSAW) CTF is one of the largest open-entry competitions for students learning cybersecurity from around the world. In 2021, it hosted over 1200 teams during the qualification round. In addition to conference organized CTFs, many CTF clubs and teams organize CTF competitions. Many CTF clubs and teams are associated with universities, such as the CMU associated Plaid Parliament of Pwning, which hosts PlaidCTF, and the ASU associated Shellphish. Some community CTFs are online and open to all participants. The SANS Institute Holiday Hack Challenge and TryHackMe Advent of Cyber. === Government-supported competitions === Governmentally supported CTF competitions include the DARPA Cyber Grand Challenge and ENISA European Cybersecurity Challenge. In 2023, the US Space Force-sponsored Hack-a-Sat CTF competition included, for the first time, a live orbital satellite for participants to exploit. === Corporate-supported competitions === Corporations and other organizations sometimes use CTFs as a training or evaluation exercise, with benefits similar to those in educational settings. In addition to internal CTF exercises, some corporations such as Google and Tencent host publicly accessible CTF competitions. == In popular culture == In Mr. Robot, a qualification round for the DEF CON CTF competition is depicted in the season 3 opener "eps3.0_power-saver-mode.h". The logo for DEF CON can be seen in the background. In The Undeclared War, a CTF is depicted in the opening scene of the series as a recruitment exercise used by GCHQ. Go Go Squid!, a Chinese television series, is based around training for and competing in highly stylized CTF competitions .

    Read more →
  • Vague set

    Vague set

    In mathematics, vague sets are an extension of fuzzy sets. In a fuzzy set, each object is assigned a single value in the interval [0,1] reflecting its grade of membership. This single value does not allow a separation of evidence for membership and evidence against membership. Gau et al. proposed the notion of vague sets, where each object is characterized by two different membership functions: a true membership function and a false membership function. This kind of reasoning is also called interval membership, as opposed to point membership in the context of fuzzy sets. == Mathematical definition == A vague set V {\displaystyle V} is characterized by its true membership function t v ( x ) {\displaystyle t_{v}(x)} its false membership function f v ( x ) {\displaystyle f_{v}(x)} with 0 ≤ t v ( x ) + f v ( x ) ≤ 1 {\displaystyle 0\leq t_{v}(x)+f_{v}(x)\leq 1} The grade of membership for x is not a crisp value anymore, but can be located in [ t v ( x ) , 1 − f v ( x ) ] {\displaystyle [t_{v}(x),1-f_{v}(x)]} . This interval can be interpreted as an extension to the fuzzy membership function. The vague set degenerates to a fuzzy set, if 1 − f v ( x ) = t v ( x ) {\displaystyle 1-f_{v}(x)=t_{v}(x)} for all x. The uncertainty of x is the difference between the upper and lower bounds of the membership interval; it can be computed as ( 1 − f v ( x ) ) − t v ( x ) {\displaystyle (1-f_{v}(x))-t_{v}(x)} .

    Read more →
  • Argument Web

    Argument Web

    The Argument Web is a large-scale Web of interconnected arguments created by individuals as they express their opinions and interact with the opinions of others. The Argument Web aims to make online debate intuitive for participants such as mediators, students, academics, broadcasters and bloggers, to create a Web infrastructure that allows for the storage, automatic retrieval and analysis of linked argument data, and to improve the quality of online argument and debate. The Argument Web can be described as a portion of a larger Semantic Web. == AIFdb == AIFdb is a database implementation or ‘reification’ of the Argument Interchange Format (AIF), which allows for the storage and retrieval of AIF compliant argument structures. This database solution was provided as a foundation for an open, integrated Argument Web. It offers an extensive range of web services for interacting with stored argument data, while also offering search and argument visualisation features that are all consistent with the formal ontology of AIF. At a basic level, the AIFdb web services allow for the insertion and querying of basic components of an AIF argument, such as nodes, edges and schemes. Building upon this basis, it also facilitates more complex interactions with these AIF argument structures. Such complex queries could make it possible, for example, to determine all the statements made by a particular person in support a given I-Node. While, at its highest level of interaction, AIFdb can handle the import and export of many standard file formats, including SVG, DOT, RDF/XML and other formats of argument theory tools, like Carneades, Rationale and Araucaria. == Argument blogging == ArguBlogging is software which allows its users to select portions of hypertext on webpages in their Web browsers and to agree or disagree with the selected content, posting their arguments to their blogs with linked argument data. It is implemented as a bookmarklet, adding functionality to Web browsers and interoperating with blogging platforms such as Blogger and Tumblr.

    Read more →
  • DEAP (software)

    DEAP (software)

    Distributed Evolutionary Algorithms in Python (DEAP) is an evolutionary computation framework for rapid prototyping and testing of ideas. It incorporates the data structures and tools required to implement most common evolutionary computation techniques such as genetic algorithm, genetic programming, evolution strategies, particle swarm optimization, differential evolution, traffic flow and estimation of distribution algorithm. It is developed at Université Laval since 2009. == Example == The following code gives a quick overview how the Onemax problem optimization with genetic algorithm can be implemented with DEAP.

    Read more →
  • Micro stuttering

    Micro stuttering

    Micro stuttering is a visual artifact in real-time computer graphics in which the time intervals between consecutively displayed frames are uneven, even though the average frame rate reported by benchmarking software appears adequate. Tools such as 3DMark typically compute frame rates over intervals of one second or more, which can conceal momentary drops in the instantaneous frame rate that the viewer perceives as hitching or jerking of on-screen motion. At low frame rates the effect is visible as a stutter in moving images, degrading the experience in interactive applications such as video games. In severe cases a lower but more consistent frame rate can appear smoother than a higher but more erratic one. The term gained prominence in the late 2000s in discussions of multi-GPU rendering (see History), but micro stuttering also affects single-GPU systems. Common causes on modern hardware include real-time shader compilation, asset streaming from storage, VRAM exhaustion, and driver bugs. == Causes == === Shader compilation === A common cause of micro stuttering on modern PCs is real-time shader compilation. Shaders are small programs that instruct the GPU on how to render visual effects such as lighting, shadows, and reflections. On consoles, developers can pre-compile all shaders for the known, fixed hardware. On PCs, the variety of GPU architectures means shaders must often be compiled at run time, either when the game launches or during gameplay itself. When the rendering engine encounters a shader that has not yet been compiled, the CPU must finish the compilation before the GPU can draw the affected object. This causes a spike in frame time that the player perceives as a hitch. The problem has been particularly associated with games built on Unreal Engine 4 running under DirectX 12, because DX12 shifts more shader management responsibility to the application. Several techniques exist to reduce shader compilation stutter. Pipeline State Object (PSO) pre-caching records the shader permutations used at runtime so that they can be compiled in advance on subsequent launches. Asynchronous shader compilation moves the work to background CPU threads to avoid blocking the main rendering thread. Platform-level services such as Steam's shader pre-caching distribute previously compiled shaders to users with matching GPU hardware. The Steam Deck, which contains a single fixed GPU, benefits from pre-compiled shader caches because all units share the same hardware configuration. === Other causes === Micro stuttering on single-GPU systems can have several additional causes. CPU bottlenecks or scheduling interruptions from background tasks can prevent the processor from preparing frames at regular intervals. Asset streaming during gameplay (loading textures, geometry, or audio from storage) can produce hitches sometimes called traversal stutter; the use of solid-state drives and technologies such as DirectStorage has reduced but not eliminated this. VRAM exhaustion forces data to be swapped between video memory and system memory over the PCI Express bus, which is slower. Graphics driver bugs can also introduce stutter; Nvidia released hotfix driver 551.46 in February 2024 to correct intermittent micro stuttering when V-Sync was enabled. == Measurement == Micro stuttering drew attention to the limitations of average frame rate as a performance metric. In 2013, Scott Wasson at The Tech Report published a series of articles advocating frame time analysis, in which the delivery time of every individual frame is recorded and plotted rather than collapsed into a single frames-per-second figure. This approach was adopted by other hardware review publications in the following years. GPU reviews now routinely report 1% low and 0.1% low frame rates alongside the average. The 1% low is the average frame rate of the slowest 1% of frames in a sample; it serves as an indicator of worst-case smoothness. A large gap between the average and the 1% low suggests poor frame pacing. Tools for capturing per-frame timing data include FRAPS, PresentMon, OCAT, CapFrameX, and MSI Afterburner with RivaTuner Statistics Server. == Mitigation == === Frame pacing === Frame pacing is a software technique that regulates the timing of frame delivery to produce even intervals between displayed frames. Game engines, GPU drivers, and platform libraries all implement frame pacing strategies to varying degrees. On mobile platforms, Google provides the Android Frame Pacing library (Swappy) as part of the Android Game Development Kit. In December 2025, the Khronos Group published the VK_EXT_present_timing Vulkan extension, giving developers explicit control over presentation timing in a cross-platform graphics API for the first time. === Variable refresh rate === Variable refresh rate (VRR) display technologies allow a monitor's refresh rate to change to match the GPU's frame output. Implementations include Nvidia G-Sync (2013), AMD FreeSync (2015), and the VESA Adaptive-Sync standard built into DisplayPort 1.2a and later. VRR eliminates the screen tearing that results from a mismatch between frame rate and refresh rate, and avoids the frame-holding behaviour of V-Sync that can itself cause stutter. It is effective at smoothing moderate frame rate fluctuations but cannot compensate for large sudden spikes in frame time such as those caused by shader compilation or heavy asset streaming. VRR support has become standard in gaming monitors, televisions (via HDMI 2.1), and the Xbox Series X/S and PlayStation 5 consoles. === Frame generation === Beginning with DLSS 3 on the GeForce RTX 40 series in 2022, Nvidia introduced AI-based frame generation, which uses dedicated optical flow hardware and a neural network to create new frames between traditionally rendered ones. AMD followed with FSR 3 in 2023, using an algorithmic approach, and the AI-based FSR 4 for the Radeon RX 9000 series in 2025. DLSS 4, released in January 2025 for the GeForce RTX 50 series, can generate up to three frames per rendered frame using a technique called Multi Frame Generation. Frame generation increases the displayed frame rate but introduces its own frame pacing concerns. If the underlying rendered frames are unevenly timed, the interpolated frames can make the unevenness more apparent rather than less. DLSS 4 addresses this with hardware-level flip metering on the GPU's display engine, which controls the timing of frame presentation more precisely than the CPU-based pacing used in DLSS 3. Both vendors pair frame generation with latency-reduction features (Nvidia Reflex and AMD Anti-Lag+) to offset the additional input latency that results from inserting synthetic frames into the pipeline. === Frame rate limiters === Capping the frame rate below the display's maximum refresh rate, using tools such as RivaTuner Statistics Server, in-game limiters, or driver-level settings, is a common way to improve frame pacing. Preventing the GPU from running ahead of the display reduces variability in frame delivery times and can produce a smoother result than an uncapped but more irregular frame rate. == History == === Multi-GPU configurations === Micro stuttering was first widely documented in the late 2000s as a side effect of multi-GPU configurations using Alternate Frame Rendering (AFR), in which consecutive frames are assigned to alternating GPUs. Because each GPU may take a different amount of time to complete its assigned frame — due to varying scene complexity, driver scheduling, or inter-GPU communication overhead — the resulting frame delivery is irregular even when the average frame rate is high. Both Nvidia SLI and AMD CrossFireX were affected, with dual-GPU setups exhibiting the worst frame pacing irregularities. In 2012 benchmarks using Battlefield 3, dual Radeon HD 7970 cards in CrossFire showed 85% variation in frame delivery times compared with 7% for a single card, while dual GeForce GTX 680 cards in SLI showed only 7% variation compared with 5% for a single card. Multi-GPU micro stuttering became a significant factor in the eventual decline and discontinuation of consumer multi-GPU gaming. Nvidia restricted SLI to a handful of enthusiast-class cards from the GeForce 10 series onward, then replaced it with NVLink on the GeForce RTX 20 series, which saw limited gaming adoption. AMD ceased active CrossFire development around 2017. By the mid-2020s, neither vendor's current consumer GPUs support multi-GPU rendering for games. Other factors that contributed to the decline include DirectX 12 placing multi-GPU support in the hands of game developers rather than driver authors, the incompatibility of temporal anti-aliasing and other temporal rendering techniques with AFR, and the increasing size, power draw, and cost of individual GPUs. The third-party utility RadeonPro could reduce CrossFire micro stuttering through dynamic V-Sync and frame pacing adjustments, and AMD later introduced a driver-level frame paci

    Read more →
  • Sword Art Online

    Sword Art Online

    Sword Art Online (Japanese: ソードアート・オンライン, Hepburn: Sōdo Āto Onrain) is a Japanese light novel series written by Reki Kawahara and illustrated by abec. The series takes place in the 2020s and focuses on protagonists Kazuto "Kirito" Kirigaya and Asuna Yuuki as they play through various virtual reality MMORPG worlds, and later their involvement in the matters of a simulated civilization. Kawahara originally released the series as a web novel on his website from 2002 to 2008. The light novels began publication on ASCII Media Works' Dengeki Bunko imprint from April 10, 2009, with a spin-off series launching in October 2012. The series has spawned twelve manga adaptations published by ASCII Media Works and Kadokawa. The novels and the manga adaptations have been licensed for release in North America by Yen Press. An anime television series produced by A-1 Pictures, known simply as Sword Art Online, aired in Japan between July and December 2012, with a television film Sword Art Online: Extra Edition airing on December 31, 2013, and a second season, titled Sword Art Online II, airing between July and December 2014. An animated film titled Sword Art Online the Movie: Ordinal Scale, featuring an original story by Kawahara, premiered in Japan and Southeast Asia on February 18, 2017, and was released in the United States on March 9, 2017. A spin-off anime series titled Sword Art Online Alternative: Gun Gale Online premiered in April 2018, while a third season titled Sword Art Online: Alicization aired from October 2018 to September 2020. An anime film adaptation of Sword Art Online: Progressive titled Sword Art Online Progressive: Aria of a Starless Night premiered on October 30, 2021. A second film titled Sword Art Online Progressive: Scherzo of Deep Night premiered on October 22, 2022. Many video games based on the series have been released for consoles, PC, and mobile devices. Sword Art Online has achieved widespread commercial success, with the light novels having over 30 million copies sold worldwide. The anime series has received mixed to positive reviews, with praise for its animation, musical score, and exploration of the psychological aspects of virtual reality, but it has also been met with criticisms for its pacing and writing. == Synopsis == === Setting === The light novel series spans several virtual reality worlds, beginning with the game, Sword Art Online (SAO), which is set in a world known as Aincrad. Each world is built on a game engine called Cardinal system, which was initially developed specifically for SAO by Akihiko Kayaba, but was later duplicated for Alfheim Online (ALO), and a consolidated package is later given to Kirito in the form of the World Seed, who had it leaked online with the successful intention of reviving the virtual reality industry. A third world known as Gun Gale Online (GGO) appears in the third arc and is stylized as a first-person shooter game instead of a role-playing game, and is the main setting of Alternative Gun Gale Online. It was created using the World Seed by an American company. A fourth world appears in the fourth arc known as the Underworld (UW). The world itself was created using the World Seed as a base, but it is as realistic as the real world due to using many powerful government resources to keep it running. === Plot === In 2022, a virtual reality massively multiplayer online role-playing game (VRMMORPG) called Sword Art Online (SAO) was released. With the NerveGear, a helmet that stimulates the user's five senses via their brain, players can experience and control their in-game characters with their minds. Both the game and the NerveGear were created by Akihiko Kayaba. On November 6, 10,000 players log into SAO's mainframe cyberspace for the first time, only to discover that they are unable to log out. Kayaba appears and tells the players that they must beat all 100 floors of Aincrad, a steel castle which is the setting of SAO, if they wish to be free. He also states that those who suffer in-game deaths or forcibly remove the NerveGear out-of-game will suffer real-life deaths. A player named Kazuto "Kirito" Kirigaya is one of 1,000 testers in the game's previous closed beta. With the advantage of previous VR gaming experience and a drive to protect other beta testers from discrimination, he isolates himself from the greater groups and plays the game alone, bearing the mantle of "beater", a portmanteau of "beta tester" and "cheater". As the players progress through the game Kirito eventually befriends a young woman named Asuna Yuuki, forming a relationship with and later marrying her in-game. After the duo discover the identity of Kayaba's secret ID, who was playing as "Heathcliff", the leader of the guild Asuna joined in, they confront and destroy him, freeing themselves and the other players from the game. In the real world, Kazuto discovers that 300 SAO players, including Asuna, remain trapped in their NerveGear. As he goes to the hospital to see Asuna, he meets Asuna's father Shouzou Yuuki who is asked by an associate of his, Nobuyuki Sugou, to make a decision, which Sugou later reveals to be his marriage with Asuna, angering Kazuto. Several months later, he is informed by Agil, another SAO survivor, that a figure similar to Asuna was spotted on "The World Tree" in another VRMMORPG cyberspace called Alfheim Online (ALO). Assisted in-game by his cousin and adoptive sister Suguha "Leafa" Kirigaya and Yui, a navigation pixie (originally an AI from SAO), he quickly learns that the trapped players in ALO are part of a plan conceived by Sugou to perform illegal experiments on their minds. The goal is to create the perfect mind-control for financial gain and to subjugate Asuna, whom he intends to marry in the real world, to assume control of her family's corporation. Kirito eventually stops the experiment and rescues the remaining 300 SAO players, foiling Sugou's plans. Before leaving ALO to see Asuna, Kayaba, who has uploaded his mind to the Internet using an experimental, destructively high-powered version of NerveGear at the cost of his life, entrusts Kirito with The Seed – a package program designed to create virtual worlds. Kazuto eventually reunites with Asuna in the real world after thwarting an attack from Sugou and The Seed is released onto the Internet, reviving Aincrad as other VRMMORPGs begin to thrive. One year after the events of SAO, at the prompting of a government official investigating strange occurrences in VR, Kazuto takes on a job to investigate a series of murders involving another VRMMORPG called Gun Gale Online (GGO), the AmuSphere (the successor of the NerveGear), and a player called Death Gun. Aided by a female player named Shino "Sinon" Asada, he participates in a gunfight tournament called the Bullet of Bullets (BoB) and discovers the truth behind the murders, which originated with a player who participated in a player-killing guild in SAO. Through his and Sinon's efforts, two suspects are captured, though the third suspect, Johnny Black, escapes. Kazuto is later recruited to test an experimental FullDive machine, Soul Translator (STL), which has an interface far more realistic and complex than the previous machine he had played, to help RATH, a research and development organization under the Ministry of Defense (MOD), develop an artificial intelligence named A.L.I.C.E. He tests the STL by entering the Underworld (UW), a virtual reality cyberspace created with The Seed package. In the UW, the flow of time proceeds a thousand times faster than in the real world, and Kirito's memories of what happens inside are restricted. However, when Johnny Black ambushes and mortally wounds Kazuto with suxamethonium chloride, RATH recovers Kazuto and places him back into the STL to preserve his mind while attempts are made to save him. During his time in Underworld, Kirito befriends Eugeo, a carver in a small village of Rulid, and helps him on a journey to save Alice Zuberg, his friend who was taken by a group of highly skilled warriors known as the Integrity Knights for accidentally breaking a rule of the Axiom Church, the leaders of the Human Empire. He and Eugeo soon find themselves uncovering the secrets of the Axiom Church, led by a woman only known as "The Administrator", and the true purpose of Underworld itself, while unbeknownst to them, a war against the opposing Dark Territory is brewing on the horizon. They meet Alice, now an Integrity Knight, and though she does not remember them, Kirito helps her remember her true identity: a form of true artificial intelligence known as A.L.I.C.E. In the battle against the Administrator, Kirito manages to slay her, though Eugeo dies in the process, to Kirito's dismay. Meanwhile, in the real world, conflict escalates as American forces raid RATH's facility in the Ocean Turtle in an effort to take A.L.I.C.E. for purposes unknown. Two of the attackers - Gabriel "Vecta" Miller and Vassago "Prince of Hell" Cassals - take contr

    Read more →
  • Vidby

    Vidby

    Vidby AG (stylized in lower-case) is a start-up based in Rotkreuz, Switzerland specializing in AI language translation for videos. Founded by Alexander Konovalov (uk:Олександр Коновалов) and Eugen von Rubinberg in September 2021, the company has especially garnered attention for its use in translating speeches given by President Volodymyr Zelenskyy during the Russian invasion of Ukraine. == History == Vidby AG was founded by Alexander Konovalov and Eugen von Rubinberg. Konovalov is a native of Ukraine and retains Ukrainian citizenship; Rubinberg came to Switzerland from Germany and holds German citizenship. Both are residents of Switzerland. The latter founded his first business, a trading company, at age 16. In 2013, the business partners launched a consumer-oriented video-call translation service called DROTR (Droid Translator) AG, utilizing a Konovalov-created AI-powered language translation technology enabling simultaneous translation of messages, voice and video calls in 104 languages (written), with 44 available in spoken form. This was the world's first video calling app with translation. The technology was pronounced a competitor of Skype and Viber by Forbes and claimed first prize at the "Innovative Breakthrough 2013" Competition. In 2021, with a new business-oriented focus, DROTR became Vidby, with the former Google technology partners Konovalov and Rubinberg remaining at the helm, each with the title Co-CEO. While headquartered in Switzerland, Vidby's development team is, according to the company's founders, based in Ukraine. The technology behind Vidby has an accuracy level variously reported as up to 99 percent or 99 to 100 percent, equalling the highest level of human translation. Additionally, the technology is capable of removing the original language while maintaining ambient sounds. Currently, some 70 languages plus 60 dialects are possible with the algorithm-based technology. == Notable use == In addition to its use with speeches delivered by Pope Francis, the technology has been provided to Ukrainian authorities and embassies during the ongoing military conflict with Russia free of remuneration. By July, 2022, some 70 speeches given by President Zelenskyy totalling 650 minutes had been translated into 30 languages, for a total of over 10,000 minutes of video material. Of its use in translating Zelenskyy's wartime speeches, Konovalov has said, "Like any citizen, I want to help defend my country." Notable corporate clients of Vidby include Samsung, Siemens, Cisco, Kärcher, Generali and McDonald's Corporation; an academic client is Harvard University. Google Cloud Technology Partner status of Vidby was confirmed officially after a six-month audit in December 2022. Denys Krasnikov, a Vidby co-founder, is responsible for cooperation with Google, YouTube, Microsoft, and other key partners. After the launch of multilingual YouTube channels, Vidby started AI translating and dubbing creators' videos for this new type of channel at the end of February 2023. == Accolades == Vidby headed a list of the five best video translation services as named by TechRadar Deutschland in September, 2022. In the same month, Tech Times named Vidby #1 in their list of the five best such services. It similarly topped a list of the five best content translation technologies as judged by European Business Review in October, 2022. Prior to these lead-position rankings (August, 2022), it was featured as Business Insider's special start-up recommendation (German: "Unser Lesetipp auf Gründerszene"). In 2023, YouTube recognized Vidby as its recommended vendor.

    Read more →
  • Nature Manifesto

    Nature Manifesto

    Nature Manifesto is an Immersive sound piece and multimedia installation by Icelandic artist Björk and artist and curator Aleph Molinari, created in collaboration with the French Institute for Research and Coordination in Acoustics/Music (IRCAM). The installation was showcased at the Centre Pompidou in Paris, France from November 20, 2024 to December 9, 2024, as part of the museum's "Biodiversity: Which Culture for Which Future?" forum. It combines natural soundscapes, calls of extinct animals reconstructed through artificial intelligence, and Björk's narration to address damages to biodiversity and the collapse of ecosystems. == Background == Björk's work intricately weaves themes of nature and technology, reflecting her deep engagement with both realms. In 2008, she co-founded the Náttúra campaign to protest the construction of foreign-backed aluminum factories in Iceland, aiming to protect the country's natural landscapes. She released the single "Náttúra" featuring Thom Yorke, with all proceeds supporting this environmental initiative. Her 2011 album Biophilia further exemplifies this synthesis, exploring the relationships between music, nature, and technology through a multimedia project that included interactive apps, custom-made instruments, and educational workshops. Björk's Cornucopia tour (2019-2023) seamlessly integrates themes of nature preservation and environmental activism, and featured a recorded message by Swedish climate activist Greta Thunberg. The tour's fusion of music, technology, and natural imagery reflects Björk's vision of a harmonious coexistence between humanity and nature, advocating for sustainable futures. Björk has previously used artificial intelligence in her works. In 2020, she collaborated with Microsoft to create Kórsafn, a sound installation for the Sister City Hotel lobby in New York City which used an AI-powered model that elaborated choral recordings from her discography through a sensor on the rooftop of the building that would generate music according to data like the weather and the seasons. For her charity single "Oral", featuring Spanish singer Rosalía, she released a music video directed by photographer and visual artist Carlota Guerrero, who used AI-generated deepfake versions of the artists. == Concept == Nature Manifesto is a three-minute and forty-second immersive sound piece. The composition merges Björk's voice, as she articulates a manifesto on biodiversity and the climate crisis, with cries of extinct and endangered animals, harmonizing them with natural soundscapes. The installation was curated by Chloé Siganos and Aleph Molinari, with associate curator Delphine Le Gatt. The primary goal of Nature Manifesto is to foster a deeper understanding of humanity's impact on the natural world. Conceived as a "post-optimistic" manifesto, Aleph Molinari stated that the project's purpose was to "offer a voice to nature". He stated that "the modern concept of nature itself is problematic [...] because it’s a concept born in the Romantic period and, with the rise of the industrial era, became an antithesis to human civilisation and everything urban. Nature came to define what was outside, the savage Other... But nature is everything that we’re part of." The soundscape features recreated calls of extinct and endangered species, developed in collaboration with the French sound research institute IRCAM. Artificial intelligence was employed to simulate the vocalizations of animals that no longer exist in the wild. To save energy and lessen the ecological impact of the use of AI, the research institute developed a "frugal AI" model capable of generating audio in real-time on local servers without a graphics processing unit. The sounds were then produced and edited by Björk in collaboration with Robin Meier Wiratunga and Bergur Þórisson. The installation was located within the Centre Pompidou's escalator, known as the "caterpillar". The installation was further supported by videos created by visual artist Sam Balfus (also known as Balfua) by using artificial intelligence, and edited by Santiago Molinari. == Activism == To sustain and broaden the themes presented in Nature Manifesto, Björk publicly urged French President Emmanuel Macron to prohibit bottom trawling within France's marine protected areas (MPA). She criticized the French government's claim of protecting 30% of its marine territories, highlighting that over 90% of these MPAs exist only on paper, allowing destructive practices like bottom trawling to continue unchecked. She collaborated with non-governmental organizations Sustainable Ocean Alliance, Ungir umhverfissinnar and Bloom, to advocate for genuine ocean conservation. Björk promoted the cause through her social media profiles by sharing petitions. In November 2024, Björk lent her Instagram account to French environmental activists to directly address Macron. The activists used the platform to call for stronger protection of the ocean, urging Macron to impose stricter restrictions on harmful fishing practices, particularly bottom trawling. == Reception == Nature Manifesto received mixed to positive reviews from critics. Some critiques focused on the installation's setting, suggesting that the movement inherent to the escalator space diminished the immersive potential of the soundscape. The choice of using artificial intelligence was also questioned. Björk and Molinari defended this, as both see AI as a tool that can be used creatively and sustainably, with Björk focusing on the importance of human input to give AI a "soul", and Molinari stressing the need for sustainable technological practices in the broader context of digital life. After the exhibition ended, Björk further opinionated: "this is how we will work in the future. [...] if there is no soul in tomorrow's music made by AI it is because [no one] put it there and we have to speak out and guard this as listeners", further stating that there is already "soulless muzak" [sic] on Spotify, "mass manufactured without the attention of creativity".

    Read more →
  • Automation integrator

    Automation integrator

    An automation integrator is a systems integrator company or individual who makes different versions of automation hardware and software work together, generally combining several subsystems to work together as one large system. The title may refer to those who only integrate hardware, although these will often work with software integrators. Software created by automation integrators allows devices to communicate with each other, as well as collecting and reporting data. The magazine Control Engineering publishes an annual “Automation Integrator Guide” which lists over 2,000 automation integrators. They also give an annual system integrator of the year award to three automation integration firms. The Control System Integrators Association (CSIA) maintains a buyers' guide of over 1200 member and nonmember systems integrators known as the Industrial Automation Exchange, or CSIA Exchange for short. == Certification == The Control System Integrators Association (CSIA) certifies automation integrators, through an audit based on 79 critical criteria from the best practices manual. Companies must be associate members of the CSIA to be eligible for certification. Integrators can also receive certification through a program launched in 2012 by the Robotics Industries Association. == Industries == Automation Integrators work in a wide variety of industries which use robotics and automation. Some of the most common include:

    Read more →
  • Deep learning speech synthesis

    Deep learning speech synthesis

    Deep learning speech synthesis refers to the application of deep learning models to generate natural-sounding human speech from written text (text-to-speech) or spectrum (vocoder). Deep neural networks are trained using large amounts of recorded speech and, in the case of a text-to-speech system, the associated labels and/or input text. == Formulation == Given an input text or some sequence of linguistic units Y {\displaystyle Y} , the target speech X {\displaystyle X} can be derived by X = arg ⁡ max P ( X | Y , θ ) {\displaystyle X=\arg \max P(X|Y,\theta )} where θ {\displaystyle \theta } is the set of model parameters. Typically, the input text will first be passed to an acoustic feature generator, then the acoustic features are passed to the neural vocoder. For the acoustic feature generator, the loss function is typically L1 loss (Mean Absolute Error, MAE) or L2 loss (Mean Square Error, MSE). These loss functions impose a constraint that the output acoustic feature distributions must be Gaussian or Laplacian. In practice, since the human voice band ranges from approximately 300 to 4000 Hz, the loss function will be designed to have more penalty on this range: l o s s = α loss human + ( 1 − α ) loss other {\displaystyle loss=\alpha {\text{loss}}_{\text{human}}+(1-\alpha ){\text{loss}}_{\text{other}}} where loss human {\displaystyle {\text{loss}}_{\text{human}}} is the loss from human voice band and α {\displaystyle \alpha } is a scalar, typically around 0.5. The acoustic feature is typically a spectrogram or Mel scale. These features capture the time-frequency relation of the speech signal, and thus are sufficient to generate intelligent outputs. The Mel-frequency cepstrum feature used in the speech recognition task is not suitable for speech synthesis, as it reduces too much information. == History == In September 2016, DeepMind released WaveNet, which demonstrated that deep learning-based models are capable of modeling raw waveforms and generating speech from acoustic features like spectrograms or mel-spectrograms. Although WaveNet was initially considered to be computationally expensive and slow to be used in consumer products at the time, a year after its release, DeepMind unveiled a modified version of WaveNet known as "Parallel WaveNet," a production model 1,000 faster than the original. This was followed by Google AI's Tacotron 2 in 2018, which demonstrated that neural networks could produce highly natural speech synthesis but required substantial training data—typically tens of hours of audio—to achieve acceptable quality. Tacotron 2 used an autoencoder architecture with attention mechanisms to convert input text into mel-spectrograms, which were then converted to waveforms using a separate neural vocoder. When trained on smaller datasets, such as 2 hours of speech, the output quality degraded while still being able to maintain intelligible speech, and with just 24 minutes of training data, Tacotron 2 failed to produce intelligible speech. In 2019, Microsoft Research introduced FastSpeech, which addressed speed limitations in autoregressive models like Tacotron 2. FastSpeech utilized a non-autoregressive architecture that enabled parallel sequence generation, significantly reducing inference time while maintaining audio quality. Its feedforward transformer network with length regulation allowed for one-shot prediction of the full mel-spectrogram sequence, avoiding the sequential dependencies that bottlenecked previous approaches. The same year saw the release of HiFi-GAN, a generative adversarial network (GAN)-based vocoder that improved the efficiency of waveform generation while producing high-fidelity speech. In 2020, the release of Glow-TTS introduced a flow-based approach that allowed for fast inference and voice style transfer capabilities. In March 2020, the free text-to-speech website 15.ai was launched. 15.ai gained widespread international attention in early 2021 for its ability to synthesize emotionally expressive speech of fictional characters from popular media with minimal amount of data. The creator of 15.ai (known pseudonymously as 15) stated that 15 seconds of training data is sufficient to perfectly clone a person's voice (hence its name, "15.ai"), a significant reduction from the previously known data requirement of tens of hours. 15.ai is credited as the first platform to popularize AI voice cloning in memes and content creation. 15.ai used a multi-speaker model that enabled simultaneous training of multiple voices and emotions, implemented sentiment analysis using DeepMoji, and supported precise pronunciation control via ARPABET. The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024. == Semi-supervised learning == Currently, self-supervised learning has gained much attention through better use of unlabelled data. Research has shown that, with the aid of self-supervised loss, the need for paired data decreases. == Zero-shot speaker adaptation == Zero-shot speaker adaptation is promising because a single model can generate speech with various speaker styles and characteristic. In June 2018, Google proposed to use pre-trained speaker verification models as speaker encoders to extract speaker embeddings. The speaker encoders then become part of the neural text-to-speech models, so that it can determine the style and characteristics of the output speech. This procedure has shown the community that it is possible to use only a single model to generate speech with multiple styles. == Neural vocoder == In deep learning-based speech synthesis, neural vocoders play an important role in generating high-quality speech from acoustic features. The WaveNet model proposed in 2016 achieves excellent performance on speech quality. Wavenet factorised the joint probability of a waveform x = { x 1 , . . . , x T } {\displaystyle \mathbf {x} =\{x_{1},...,x_{T}\}} as a product of conditional probabilities as follows p θ ( x ) = ∏ t = 1 T p ( x t | x 1 , . . . , x t − 1 ) {\displaystyle p_{\theta }(\mathbf {x} )=\prod _{t=1}^{T}p(x_{t}|x_{1},...,x_{t-1})} where θ {\displaystyle \theta } is the model parameter including many dilated convolution layers. Thus, each audio sample x t {\displaystyle x_{t}} is conditioned on the samples at all previous timesteps. However, the auto-regressive nature of WaveNet makes the inference process dramatically slow. To solve this problem, Parallel WaveNet was proposed. Parallel WaveNet is an inverse autoregressive flow-based model which is trained by knowledge distillation with a pre-trained teacher WaveNet model. Since such inverse autoregressive flow-based models are non-auto-regressive when performing inference, the inference speed is faster than real-time. Meanwhile, Nvidia proposed a flow-based WaveGlow model, which can also generate speech faster than real-time. However, despite the high inference speed, parallel WaveNet has the limitation of needing a pre-trained WaveNet model, so that WaveGlow takes many weeks to converge with limited computing devices. This issue has been solved by Parallel WaveGAN, which learns to produce speech through multi-resolution spectral loss and GAN learning strategies.

    Read more →
  • Hindsight optimization

    Hindsight optimization

    Hindsight optimisation (HOP) is a computer science technique used in artificial intelligence for analysis of actions which have stochastic results. HOP is used in combination with a deterministic planner. By creating sample results for each of the possible actions from the given state (i.e. determinising the actions), and using the deterministic planner to analyse those sample results, HOP allows an estimate of the actual action.

    Read more →