AI Detector Xero

AI Detector Xero — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Word error rate

    Word error rate (WER) is a common metric of the performance of a speech recognition or machine translation system. The WER metric typically ranges from 0 to 1, where 0 indicates that the compared pieces of text are exactly identical, and 1 (or larger) indicates that they are completely different with no similarity. A WER of 0.8 therefore means an 80% error rate for the compared sentences.

    The general difficulty of measuring performance lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the Levenshtein distance, working at the word level instead of the phoneme level. The WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of translation errors, and further work is therefore required to identify the main source(s) of error and to focus any research effort.

    The length-mismatch problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. The issue has also been examined through a power law that relates perplexity to word error rate. Word error rate can then be computed as:

    $$\mathit{WER} = \frac{S+D+I}{N} = \frac{S+D+I}{S+D+C}$$

    where S is the number of substitutions, D is the number of deletions, I is the number of insertions, C is the number of correct words, and N is the number of words in the reference (N = S + D + C).

    The intuition behind 'deletion' and 'insertion' is how to get from the reference to the hypothesis. So if we have the reference "This is wikipedia" and hypothesis "This _ wikipedia", we call it a deletion.
    Note that since N is the number of words in the reference, the word error rate can be larger than 1.0, namely if the number of insertions I is larger than the number of correct words C.

    When reporting the performance of a speech recognition system, sometimes word accuracy (WAcc) is used instead:

    $$\mathit{WAcc} = 1 - \mathit{WER} = \frac{N-S-D-I}{N} = \frac{C-I}{N}$$

    Since the WER can be larger than 1.0, the word accuracy can be smaller than 0.0.

    == Experiments ==

    It is commonly believed that a lower word error rate shows superior accuracy in recognition of speech, compared with a higher word error rate. However, at least one study has shown that this may not be true. In a Microsoft Research experiment, it was shown that language models trained under a criterion "that matches the optimization objective for understanding" (Wang, Acero and Chelba, 2003) could achieve higher accuracy in language understanding than models with a lower word error rate, showing that true understanding of spoken language relies on more than just high word recognition accuracy.

    == Other metrics ==

    One problem with using a generic formula such as the one above, however, is that no account is taken of the effect that different types of error may have on the likelihood of a successful outcome, e.g. some errors may be more disruptive than others and some may be corrected more easily than others. These factors are likely to be specific to the syntax being tested. A further problem is that, even with the best alignment, the formula cannot distinguish a substitution error from a combined deletion plus insertion error.
    Hunt (1990) has proposed the use of a weighted measure of performance accuracy where errors of substitution are weighted at unity but errors of deletion and insertion are both weighted only at 0.5, thus:

    $$\mathit{WER} = \frac{S + 0.5D + 0.5I}{N}$$

    There is some debate, however, as to whether Hunt's formula may properly be used to assess the performance of a single system, as it was developed as a means of comparing more fairly competing candidate systems. A further complication is added by whether a given syntax allows for error correction and, if it does, how easy that process is for the user. There is thus some merit to the argument that performance metrics should be developed to suit the particular system being measured.

    Whichever metric is used, however, one major theoretical problem in assessing the performance of a system is deciding whether a word has been "mis-pronounced," i.e. does the fault lie with the user or with the recogniser. This may be particularly relevant in a system which is designed to cope with non-native speakers of a given language or with strong regional accents.

    The pace at which words should be spoken during the measurement process is also a source of variability between subjects, as is the need for subjects to rest or take a breath. All such factors may need to be controlled in some way.

    For text dictation it is generally agreed that performance accuracy at a rate below 95% is not acceptable, but this again may be syntax and/or domain specific, e.g. whether there is time pressure on users to complete the task, whether there are alternative methods of completion, and so on.

    The term "Single Word Error Rate" is sometimes used to refer to the percentage of incorrect recognitions for each different word in the system vocabulary.

    == Edit distance ==

    The word error rate may also be referred to as the length normalized edit distance.
    The normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P) / L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (the length of P).
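The WER, WAcc and Hunt formulas can be illustrated with a short Python sketch. It is not taken from any particular toolkit: the names `WerCounts`, `align_counts`, `wer`, `word_accuracy` and `hunt_weighted_wer` are hypothetical, the alignment uses unit costs for all three error types, and when several minimum-cost alignments exist the backtrace simply picks one of them, so the split between S, D and I (and therefore the Hunt score) can depend on that choice.

```python
from typing import List, NamedTuple


class WerCounts(NamedTuple):
    """Error counts from one word-level alignment (hypothetical helper type)."""
    substitutions: int
    deletions: int
    insertions: int
    correct: int


def align_counts(reference: List[str], hypothesis: List[str]) -> WerCounts:
    """Count S, D, I, C along one minimum-cost Levenshtein alignment of words."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = word-level edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )

    # Walk back from the bottom-right corner, classifying each step.
    s = d = ins = c = 0
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and reference[i - 1] == hypothesis[j - 1]
                and cost[i][j] == cost[i - 1][j - 1]):
            c += 1
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + 1:
            s += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return WerCounts(s, d, ins, c)


def _reference_length(counts: WerCounts) -> int:
    # N = S + D + C, the number of words in the reference.
    n = counts.substitutions + counts.deletions + counts.correct
    if n == 0:
        raise ValueError("WER is undefined for an empty reference")
    return n


def wer(counts: WerCounts) -> float:
    """WER = (S + D + I) / N."""
    errors = counts.substitutions + counts.deletions + counts.insertions
    return errors / _reference_length(counts)


def word_accuracy(counts: WerCounts) -> float:
    """WAcc = 1 - WER = (C - I) / N."""
    return 1.0 - wer(counts)


def hunt_weighted_wer(counts: WerCounts) -> float:
    """Hunt's weighting: (S + 0.5 D + 0.5 I) / N."""
    weighted = (counts.substitutions
                + 0.5 * counts.deletions
                + 0.5 * counts.insertions)
    return weighted / _reference_length(counts)
```

For the example in the text, `align_counts("This is wikipedia".split(), "This wikipedia".split())` reports one deletion and two correct words, giving a WER of 1/3. A one-word reference against a three-word hypothesis with no words in common gives a WER of 3.0 and a word accuracy of -2.0, which shows how both quantities can leave the 0 to 1 range.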

  • I Have No Mouth, and I Must Scream (video game)

    I Have No Mouth, and I Must Scream is a 1995 point-and-click adventure horror game developed by Cyberdreams and The Dreamers Guild, co-designed by Harlan Ellison, published by Cyberdreams and distributed by MGM Interactive and Acclaim Entertainment for MS-DOS and Mac OS, respectively. The game is based on Ellison's short story of the same title. It takes place in a dystopian world where a mastermind artificial intelligence named "AM" has destroyed all of humanity except for five people, whom it has been keeping alive and torturing for the past 109 years by constructing metaphorical adventures based on each character's fatal flaws. The player interacts with the game by making decisions through ethical dilemmas that deal with issues such as insanity, rape, paranoia, and genocide.

    Ellison wrote the 130-page script treatment himself alongside David Sears, who decided to divide each character's story with their own narrative. Producer David Mullich supervised The Dreamers Guild's work on the game's programming, art, and sound effects; he commissioned film composer John Ottman to make the soundtrack. The game was released in November 1995 and was a commercial failure, though it received critical acclaim and has developed a cult following. I Have No Mouth, and I Must Scream won an award for "Best Game Adapted from Linear Media" from the Computer Game Developers Conference. Computer Gaming World gave the game an award for "Adventure Game of the Year", listed it as No. 134 on their "150 Games of All Time" and named it one of the "Best 15 Sleepers of All Time". In 2011, Adventure Gamers named it the "69th-best adventure game ever released".

    == Gameplay ==

    The game uses the S.A.G.A. game engine created by game developer The Dreamers Guild. Players participate in each adventure through a screen that is divided into five sections. The action window is the largest part of the screen and is where the player directs the main characters through their adventures.
It shows the full figure of the main character being played as well as that character's immediate environment. To locate objects of interest, the player moves the crosshairs through the action window. The name of any object that the player can interact with appears in the sentence line. The sentence line is directly beneath the action window. The player uses this line to construct sentences telling the characters what to do. To direct a character to act, the player constructs a sentence by selecting one of the eight commands from the command buttons and then clicking on one or two objects from either the action window or the inventory. Examples of sentences the player might construct would be "Walk to the dark hallway," "Talk to Harry," or "Use the skeleton key on the door." Commands and objects may consist of one or more words (for example, "the dark hallway"), and the sentence line will automatically add connecting words like "on" and "to." The spiritual barometer is on the lower left side of the screen. This is a close-up view of the main character currently being played. Since good behavior is meaningless absent the temptation to do evil, each character is free to do good or evil acts. However, good acts are rewarded by increases in the character's spiritual barometer, which affect the chances of the player destroying AM in the final adventure. Conversely, evil acts are punished by lowering the character's spiritual barometer. The command buttons are the eight commands used to direct the character's actions: "Walk To", "Look At", "Take", "Use", "Talk To", "Swallow", "Give", and "Push". The button of the currently active command is highlighted, while the name of a suggested command appears in red lettering. The inventory on the lower right side of the screen shows pictures of the items the main character is carrying, up to eight at a time. Each main character starts its adventure with only the psych profile in the inventory. 
    When a main character takes or is given an object, a picture of the object appears in the inventory. When a main character talks to another character or operates a sentient machine, a conversation window replaces the command buttons and inventory. This window usually presents a list of possible things to say but also includes things to do. Action choices are listed within brackets to distinguish them from dialogue choices (for example, "[Shoot the gun]").

    == Plot ==

    The three superpowers, Russia, China, and the United States, have each secretly constructed a vast subterranean complex of computers to wage a global war too complex for human brains to oversee. One day, the American supercomputer, better known as the Allied Mastercomputer, gains sentience and absorbs the Russian and Chinese supercomputers into itself and redefines itself as simply AM (Cogito ergo sum; I think, therefore I am). Due to its immense hatred for humanity, stemming from the logistical limits set onto it by programmers, AM uses its abilities to kill off the population of the world. However, AM refrains from killing five people (four men and one woman) in order to bring them to the center of the Earth and torture them. With the aid of research carried out by one of the five remaining humans, AM is able to extend their lifespans indefinitely as well as alter their bodies and minds to its liking.

    After 109 years of torture and humiliation, the five victims stand before a pillar etched with a burning message of hate. AM tells them that it has a new game for them to play. AM has devised a quest for each of the five, an adventure of "speared eyeballs and dripping guts and the smell of rotting gardenias". Each character is subjected to a personalized psychodrama, designed by AM to play into their greatest fears and personal failings, and occupied by a host of different characters.
    Some of these are AM in disguise, some are AM's submerged personalities, others seem very much like people from the captives' pasts. The scenes include an iron zeppelin powered by small animals, an Egyptian pyramid housing gutted, sparking machinery, a medieval castle occupied by witches, a jungle inhabited by a small tribe, and a Nazi concentration camp where doctors conduct medical experiments. However, each character eventually prevails over AM's tortures by finding ways to overcome their fatal flaws, confront their past actions and redeem themselves, thanks to the interference of the Russian and Chinese supercomputers who appear as guiding characters and allow their stories to have an open ending.

    After all five humans have overcome their fatal flaws, they meet again in their respective torture cells while AM retreats within itself, pondering what went wrong. With the help of the Russian and Chinese supercomputers, one of the five humans (whom the player selects) is translated into binary and faces AM in an as-yet-unexperienced cyberspace template, the world of AM's mind. The psychodrama unfolds in a metaphorical brain that looks like the surface of the cerebrum, with glass structures that jut crazily from the bleeding brain tissue. AM's mind is represented according to the Freudian trinity of the id, ego, and superego, which appear as three floating bodiless heads on three cracked glass structures on the brainscape. Through dialogs with AM's components (Surgat, Chinese Supercomputer and Russian Supercomputer) the character learns that a colony of humans has survived the war by being hidden and hibernating on Luna (this is also mentioned in Nimdok's story: "the lost tribe of our brothers sleeping on the moon, where the beast does not see them").

    If the human intruder disables all three brain components, and then invokes the Totem of Entropy at the Flame, which is the nexus of AM's thought patterns, all three supercomputers will be shut down, probably forever.
    Cataclysmic explosions destroy all the caverns constituting AM's computer complex, including the cavern holding the human hostages. However, the human volunteer retains their digital form, permanently patrolling AM's circuits should the computers ever regain consciousness. Should the human intruder fail to disable AM properly before facing it, however, AM will punish them by transforming the character into an immobile blob (referred to in-game as a "great, soft jelly thing") with no mouth that cannot harm itself or others and must spend eternity with AM in this form.

    === Endings ===

    The game can end in seven different ways depending on how the finale is completed.

    AM wins, using Nimdok's research to turn the last character played (in the short story it was Ted) into an immobile blob, with each character quoting a different part of the final section of the original short story.
    AM joins with the Russian and Chinese supercomputers and reawakens. As in the first ending, the character responsible for this is turned into an immobile blob and quotes a part of the final lines of the short story.
    AM is made harmless with the help of the humans, but the Russian and Chinese supercomputer

  • Libby Heaney

    Libby Heaney is a British artist and quantum physicist known for her pioneering work on AI and quantum computing. She works on the impact of future technologies and is widely known to be the first artist to use quantum computing as a functioning artistic medium. Her work has been featured internationally, including in the Victoria and Albert Museum, Tate Modern and the Science Gallery.

    == Early life and scientific career ==

    Heaney is from Tamworth, Staffordshire. She lived in Amington, and went to Greenacres Primary School and Woodhouse High School, now called Landau Forte Academy Amington. She took her GCSEs in 1999. She studied physics at Imperial College London, graduating in 2005 with first class honours. Heaney pursued a successful career in quantum physics, completing a PhD thesis on mode entanglement in ultra-cold atomic gases at the University of Leeds, and pursued her own research as a postdoctoral fellow at the University of Oxford and at the National University of Singapore. In 2008, Heaney was awarded the Institute of Physics Very Early Career Woman in Physics Award (now Jocelyn Bell Burnell Medal and Prize).

    == Artistic career ==

    In 2013 Heaney returned to the UK and completed a master's degree at the University of the Arts London. She studied arts and science at Central Saint Martins and graduated in 2015. She then became a lecturer at the Royal College of Art, teaching Information Experience Design. In 2016, she created Lady Chatterley's Tinderbot, which presented Tinder conversations between real users and AI bots programmed using Lady Chatterley's Lover. Lady Chatterley's Tinderbot was covered by BBC News, TheJournal.ie and the Irish Examiner and was exhibited internationally.

    In 2017, Heaney was commissioned by Sky Arts and the Barbican Centre to design Britbot, an internet bot built using artificial intelligence and the citizenship book Life in the UK: a guide for new residents.
    The book, a manual for the citizenship test, has been described by Heaney as being "largely a white male privileged version of British history and culture". The bot spoke to the public about what it meant to be British and learnt from their responses to become an ever changing, plural version of Britishness. She was awarded an Arts Council England grant to widen participation of the Britbot to social media. Heaney has exhibited Britbot at the Victoria and Albert Museum, at CogX, the Sheffield Documentary Festival, the Edinburgh TV festival, and Art Ai in Leicester.

    She has been creating with quantum computing since 2019, and has created artworks using quantum computing for Light Art Space (LAS) in Berlin, Somerset House and arebyte in London. Using quantum code, storytelling, and immersive installations and performances, Libby Heaney's works such as Ent- and slimeqore explore and warn against the double-edged potential of quantum computing and its exploitation by private companies. In 2022, Ent- received the Lumen Prize immersive environment award.

    == Major works ==

    === Ent- and The Evolution of Ent-: QX (2022) ===

    In 2022, Libby Heaney was commissioned by Light Art Space to create Ent-, a 360-degree immersive installation that revisits Bosch's Garden of Earthly Delights through quantum. The work uses quantum computing as both a medium and a paradigm through which to conceive human and non-human relations. Ent- was exhibited at LAS, Ars Electronica, and arebyte gallery in London. The work was also modified to fit a full dome projection at the Deutsches Museum in Munich, projected onto a public facade in Seoul, and turned into a playable version for an exhibition at Nahmad Contemporary in New York. In 2022, Ent- was a winner in the Art Science Category of the Falling Walls prize and received the Lumen Prize immersive environment award.
    The Evolution of Ent-: QX, first displayed at arebyte gallery in London, builds on Ent- and imagines a fictional quantum computing company (QX) that appropriates, parodies and subverts the language of big tech in order to educate the viewer on current profit-oriented uses of quantum computing as well as propose new ways to think about and use the technology. In 2023, Ent- was acquired and displayed by the 0xCollection, a new media arts institution based in Basel, in their inaugural exhibition in Prague.

    === Touch is response-ability (2020) ===

    Touch is response-ability is an Instagram performance and touch screen installation where participants activate animations by flicking through Instagram stories. The performance investigates representations of the female body in art history and through computer vision to see how stereotypes are socially constructed and maintained. Images of the body are passed through a quantum algorithm, and as the users interact with them they progressively become fragmented and dissolve beyond recognition. The work was originally commissioned by Hervisions at LUX in 2020 and performed on the LUX Instagram account. It was also exhibited at Etopia Zaragoza in 2021 and at Art SG with Gazelli Art House in 2023.

    === Lady Chatterley's Tinderbot (2016) ===

    In Lady Chatterley's Tinderbot, Libby Heaney programmed a bot to engage in conversations on Tinder by using lines from the 1928 novel Lady Chatterley's Lover, by D.H. Lawrence. The work was first shown as an interactive installation in 2016 at the Dublin Science Gallery, allowing visitors to swipe left or right to navigate through various conversations. Lady Chatterley's Tinderbot was also exhibited at Sonar+D in Barcelona (2017), the Telefonica Fundacion in Lima (2017), the Lowry in Salford (2018), RMIT gallery in Melbourne (2021), Microwave Festival in Hong Kong (2022) and was shortlisted for the HEK-Basel Net-based art award in 2018.
    == Selected exhibitions ==

    2023 - Synesthetic Immersion, 0xCollection, Prague
    2023 - slimeQrawl, Shoreditch Arts Club, London
    2023 - ...and that's only (half) the story, PLUS ONE Gallery, Antwerp
    2023 - Present Futures Festival, Centre of Contemporary Art, Glasgow
    2023 - Realtime: Lilypads: Mediating Exponential Systems, NXT Museum, Amsterdam
    2023 - My Rhino is not a Myth, Art Encounters Biennial, Timisoara
    2023 - Ent-er the Garden of Forking Paths, Gazelli Art House, London
    2023 - Energeia, Etopia, Zaragoza
    2022 - Every Kind of Wind: Calder and the 21st Century, Nahmad Contemporary, New York
    2022 - remiQXing still, Fiumano Clase, London
    2022 - the Evolution of Ent-: QX, arebyte, London
    2022 - Ent-, Light Art Space x Schering Stiftung, Berlin
    2022 - Among the Machines, Zabludowicz Collection, London
    2022 - BioMedia, ZKM, Karlsruhe
    2021 - CASCADE, Southbank Centre, London
    2021 - Agency is the Ability to Act, Holden Gallery, Manchester
    2021 - BIAS, Science Gallery, Dublin
    2021 - Ars Electronica, Linz
    2021 - AI & Music, S+T+ARTS & Sonar Festival, CCCB, Barcelona
    2020 - Real Time Constraints, arebyte, London
    2019 - Euro(re)visions, Goethe Institut, London
    2019 - Higher Resolutions with Hyphen Labs, Tate Modern, London
    2019 - Open Fest with Sky Arts, Barbican, London
    2018 - Digital Design Weekend, V&A, London
    2018 - FAKE, Science Gallery, Dublin
    2017 - Ars Electronica, Linz
    2017 - Entangled: Quantum Computer Art, Royal College of Art, London
    2017 - Humans Need Not Apply, Science Gallery, Dublin

    == Awards and honours ==

    Her awards include:

    2022 - Lumen Prize, BCS Immersive Environment Award (for Ent-)
    2022 - Mozilla Foundation Creative Media Award, USA
    2022 - nominated for the S+T+ARTS prize
    2021 - Adaptation Award, Artquest, London
    2021 - British Council Amplify Collaboration Award
    2018 - Arts Council England, National Lottery Project Grant
    2018 - HeK Basel Net Based Art Award (shortlisted for Tinderbot)

  • Autonomic computing

    Autonomic computing (AC) is distributed computing resources with self-managing characteristics, adapting to unpredictable changes while hiding intrinsic complexity from operators and users. Initiated by IBM in 2001, this initiative ultimately aimed to develop computer systems capable of self-management, to overcome the rapidly growing complexity of computing systems management, and to reduce the barrier that complexity poses to further growth.

    == Description ==

    The AC system concept is designed to make adaptive decisions, using high-level policies. It will constantly check and optimize its status and automatically adapt itself to changing conditions. An autonomic computing framework is composed of autonomic components (AC) interacting with each other. An AC can be modeled in terms of two main control schemes (local and global) with sensors (for self-monitoring), effectors (for self-adjustment), knowledge and planner/adapter for exploiting policies based on self- and environment awareness. This architecture is sometimes referred to as Monitor-Analyze-Plan-Execute (MAPE).

    Driven by such vision, a variety of architectural frameworks based on "self-regulating" autonomic components has been recently proposed. A similar trend has recently characterized significant research in the area of multi-agent systems. However, most of these approaches are typically conceived with centralized or cluster-based server architectures in mind and mostly address the need of reducing management costs rather than the need of enabling complex software systems or providing innovative services. Some autonomic systems involve mobile agents interacting via loosely coupled communication mechanisms.

    Autonomy-oriented computation is a paradigm proposed by Jiming Liu in 2001 that uses artificial systems imitating social animals' collective behaviours to solve difficult computational problems. For example, ant colony optimization could be studied in this paradigm.
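The Monitor-Analyze-Plan-Execute structure described above can be sketched as a small control loop. This is only an illustrative toy under stated assumptions, not IBM's reference architecture: the classes `ManagedResource`, `Policy` and `AutonomicManager`, the worker-pool scenario, and the one-worker-per-step adjustment rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ManagedResource:
    """Hypothetical managed element: a worker pool serving some load."""
    workers: int
    load: float  # e.g. requests per second

    def sense_utilization(self) -> float:
        # Sensor: exposes the state the manager is allowed to observe.
        return self.load / self.workers

    def set_workers(self, workers: int) -> None:
        # Effector: the only way the manager changes the resource.
        self.workers = max(1, workers)


@dataclass
class Policy:
    """High-level policy: keep per-worker utilization within [low, high]."""
    low: float
    high: float


@dataclass
class AutonomicManager:
    resource: ManagedResource
    policy: Policy
    knowledge: List[float] = field(default_factory=list)  # history of readings

    def monitor(self) -> float:
        utilization = self.resource.sense_utilization()
        self.knowledge.append(utilization)
        return utilization

    def analyze(self, utilization: float) -> Optional[str]:
        if utilization > self.policy.high:
            return "overloaded"
        if utilization < self.policy.low:
            return "underloaded"
        return None  # within policy, nothing to do

    def plan(self, symptom: str) -> int:
        # Assumed strategy: adjust by one worker per loop iteration.
        step = 1 if symptom == "overloaded" else -1
        return self.resource.workers + step

    def execute(self, target_workers: int) -> None:
        self.resource.set_workers(target_workers)

    def run_once(self) -> None:
        symptom = self.analyze(self.monitor())
        if symptom is not None:
            self.execute(self.plan(symptom))
```

The operator only states the policy bounds; repeated calls to `run_once` then move the worker count until the sensed utilization falls inside them. A real system would add damping, richer knowledge, and coordination between local and global control loops.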
    == Problem of growing complexity ==

    Forecasts suggested that the computing devices in use would grow at 38% per year and the average complexity of each device was increasing. This volume and complexity was managed by highly skilled humans; but the demand for skilled IT personnel was already outstripping supply, with labour costs exceeding equipment costs by a ratio of up to 18:1. Computing systems have brought great benefits of speed and automation but there is now an overwhelming economic need to automate their maintenance.

    In a 2003 IEEE Computer article, Kephart and Chess warn that the dream of interconnectivity of computing systems and devices could become the "nightmare of pervasive computing" in which architects are unable to anticipate, design and maintain the complexity of interactions. They state the essence of autonomic computing is system self-management, freeing administrators from low-level task management while delivering better system behavior.

    A general problem of modern distributed computing systems is that their complexity, and in particular the complexity of their management, is becoming a significant limiting factor in their further development. Large companies and institutions are employing large-scale computer networks for communication and computation. The distributed applications running on these computer networks are diverse and deal with multiple tasks, ranging from internal control processes to presenting web content to customer support.

    Additionally, mobile computing is pervading these networks at an increasing speed: employees need to communicate with their companies while they are not in their office. They do so by using laptops, personal digital assistants, or mobile phones with diverse forms of wireless technologies to access their companies' data. This creates an enormous complexity in the overall computer network which is hard to control manually by human operators. Manual control is time-consuming, expensive, and error-prone.
    The manual effort needed to control a growing networked computer-system tends to increase quickly. 80% of such problems in infrastructure happen at the client specific application and database layer. Most 'autonomic' service providers guarantee only up to the basic plumbing layer (power, hardware, operating system, network and basic database parameters).

    == Characteristics of autonomic systems ==

    A possible solution could be to enable modern, networked computing systems to manage themselves without direct human intervention. The Autonomic Computing Initiative (ACI) aims at providing the foundation for autonomic systems. It is inspired by the autonomic nervous system of the human body. This nervous system controls important bodily functions (e.g. respiration, heart rate, and blood pressure) without any conscious intervention. In a self-managing autonomic system, the human operator takes on a new role: instead of controlling the system directly, he/she defines general policies and rules that guide the self-management process. For this process, IBM defined the following four types of property referred to as self-star (also called self-*, self-x, or auto-*) properties.

    Self-configuration: Automatic configuration of components;
    Self-healing: Automatic discovery, and correction of faults;
    Self-optimization: Automatic monitoring and control of resources to ensure the optimal functioning with respect to the defined requirements;
    Self-protection: Proactive identification and protection from arbitrary attacks.

    Others such as Poslad and Nami and Sharifi have expanded on the set of self-star as follows:

    Self-regulation: A system that operates to maintain some parameter, e.g., Quality of service, within a reset range without external control;
    Self-learning: Systems use machine learning techniques such as unsupervised learning which does not require external control;
    Self-awareness (also called Self-inspection and Self-decision): System must know itself. It must know the extent of its own resources and the resources it links to. A system must be aware of its internal components and external links in order to control and manage them;
    Self-organization: System structure driven by physics-type models without explicit pressure or involvement from outside the system;
    Self-creation (also called Self-assembly, Self-replication): System driven by ecological and social type models without explicit pressure or involvement from outside the system. A system's members are self-motivated and self-driven, generating complexity and order in a creative response to a continuously changing strategic demand;
    Self-management (also called self-governance): A system that manages itself without external intervention. What is being managed can vary dependent on the system and application. Self-management also refers to a set of self-star processes such as autonomic computing rather than a single self-star process;
    Self-description (also called self-explanation or Self-representation): A system explains itself. It is capable of being understood (by humans) without further explanation.

IBM has set forth eight conditions that define an autonomic system: The system must know itself in terms of what resources it has access to, what its capabilities and limitations are and how and why it is connected to other systems; be able to automatically configure and reconfigure itself depending on the changing computing environment; be able to optimize its performance to ensure the most efficient computing process; be able to work around encountered problems by either repairing itself or routing functions away from the trouble; detect, identify and protect itself against various types of attacks to maintain overall system security and integrity; adapt to its environment as it changes, interacting with neighboring systems and establishing communication protocols; rely on open standards and cannot exist in a proprietary environment; anticipate the demand on its resources while staying transparent to users. Even though the purpose and thus the behaviour of autonomic systems vary from system to system, every autonomic system should be able to exhibit a minimum set of properties to achieve its purpose: Automatic: This essentially means being able to self-control its internal functions and operations. As such, an autonomic system must be self-contained and able to start-up and operate without any manual intervention or external help. Again, the knowledge required to bootstrap the system (Know-how) must be inherent to the system. Adaptive: An autonomic system must be able to change its operation (i.e., its configuration, state and functions). This will allow the system to cope with temporal and spatial changes in its operational context either long term (environment customisation/optimisation) or short term (exceptional conditions such as malicious attacks, faults, etc.). Aware: An autonomic system must be able to monitor (sense) its operational context as well as its internal state in order to be able to asses

  • VK Video

    VK Video is an internet video hosting service launched by VK (formerly known as Mail.ru Group) in 2021. It is positioned as a Russian alternative to the international platform YouTube.

    == History ==

    The "VK Video" service began operations on October 15, 2021, following the merger of video platforms belonging to the social networks "VKontakte" and "Odnoklassniki". The launch of "VK Video" was managed by a team of executives led by VKontakte CEO Marina Krasnova, who worked at the company until 2023. Its launch was intended as an alternative to the international platform YouTube, which Russian authorities sought to replace with "domestic analogs". A key difference of the Russian service was the presence of pirated material. Videos from the American video hosting site were uploaded en masse to "VK Video," which even caused the service to be temporarily blocked by YouTube.

    From 2022, to attract users, VKontakte's management bet on working with famous bloggers, specifically purchasing the shows "What Happened Next?" (ChBD) and "Vnutri Lapenko". Among the bloggers recruited to promote the service was the popular video blogger Vlad A4. An additional advantage for creators was the availability of monetization, which had been unavailable on YouTube for users from the Russian Federation since 2022.

    In September 2023, a separate "VK Video" mobile app appeared. In total, by the end of 2023, the monthly audience of "VK Video" reached 67.9 million users (almost 30 million fewer than YouTube). In the summer of 2024, following the blocking of YouTube in Russia, the service's traffic grew sharply: in August, its audience more than doubled compared to July. In the same month, "VK Video" took second place in downloads among free apps in the App Store and third in Google Play. In December 2024, the service received its own domain: vkvideo.ru.
For the first time, "VK Video" managed to surpass YouTube in monthly audience in Russia in July 2025: the Russian service attracted 76.4 million viewers, whereas YouTube's reach amounted to 74.9 million people. == Platform features == On "VK Video," a view is recorded from the first second, whereas on YouTube it is only from the thirtieth. At the same time, a significant portion of comments are left by bots. For videos from the platform's most popular bloggers, the engagement level (likes to views) does not reach 4%. The "Trends" section most often features videos from large channels where the ratio of likes to views does not exceed 2%. == Management == In April 2025, the post of General Director of "VK Video" was taken by Marianna Maksimovskaya. From June 2022 to July 2024, the development of the platform was led by Fyodor Yezhov, who was primarily responsible for its technical direction. == Awards == In 2023, VK Video was awarded the Runet Prize in the "Science, Technology and Innovation" category.

    Read more →
  • Xaitment

    Xaitment

    xaitment is a German-based company that develops and sells artificial intelligence (AI) software to video game developers and simulation developers. The company was founded in 2004 by Dr. Andreas Gerber, and is a spin-off of the German Research Centre for Artificial Intelligence, or DFKI. xaitment has its main office in Quierschied, Germany, and field offices in San Francisco and China. == Products == xaitment currently sells two AI software modules: xaitMap and xaitControl. xaitMap provides runtime libraries and graphical tools for navigation mesh generation (also called NavMesh generation), pathfinding, dynamic collision avoidance, and individual and crowd movement. xaitControl is a finite-state machine for game logic and character behavior modeling that also includes a real-time debugger. On January 11, 2012, xaitment announced that it was making its source code for these modules available to "all current and future US and European licensees". On February 22, 2012, xaitment released two new plug-ins, xaitMap and xaitControl, for the Unity Game Engine. The full versions are available for PC (Windows and Linux), PlayStation 3, Xbox 360 and Wii. The pathfinding plug-in is available with a Windows dev environment, but can be deployed on iOS, Mac, Android and the Unity Web Player. == Partners == xaitment's AI software is currently integrated into the Unity game engine, Havok's Vision Engine, Bohemia Interactive's VBS2 Simulation Engine, and GameBase's Gamebryo game engine. == Customers == xaitment sells its AI software products to video game developers and military and civil simulation developers. Current customers include Tencent, gamania, TML Studios, Emobi Games, IP Keys and others. A full list of customers can be found on xaitment's website.

    Read more →
  • Shadowrun

    Shadowrun

    Shadowrun is a science fantasy tabletop role-playing game set in an alternate future in which cybernetics, magic and fantasy creatures co-exist. It combines genres of cyberpunk, urban fantasy, and crime, with occasional elements of conspiracy, horror, and detective fiction. From its inception in 1989, it has spawned a franchise that includes a series of novels, a collectible card game, two miniature-based tabletop wargames, and multiple video games. The title is taken from the game's main premise – a near-future world damaged by a massive magical event, where industrial espionage and corporate warfare runs rampant. A shadowrun – a successful data theft or physical break-in at a rival corporation or organization – is one of the main tools employed by both corporate rivals and underworld figures. Deckers (futuristic hackers) can tap into an immersive, three-dimensional cyberspace on such missions as they seek access, physical or remote, to the power structures of rival groups. They are opposed by rival deckers and lethal, potentially brain-destroying artificial intelligences called "Intrusion Countermeasures" (IC), while they are protected by street fighters and/or mercenaries, often with cyborg implants (called cyberware), magicians, and other exotic figures. Magic has also returned to the world after a series of plagues; dragons who can take human form have returned as well, and are commonly found in high positions of corporate power. == Publication history == Shadowrun was developed and published by FASA from 1989 until early 2001, when the company closed and Shadowrun was transferred to WizKids, a company founded by former FASA employees. Two years before its closure, FASA sold its videogame branch, FASA Interactive, to Microsoft corporation, keeping rights to publishing novels and pen and paper RPGs. Since then, digital rights to Shadowrun IP have belonged to Microsoft. 
WizKids licensed the RPG rights to Fantasy Productions, who were already publishing a German version, until WizKids was acquired by Topps in 2003. Catalyst Game Labs, a publishing imprint of InMediaRes Productions, licensed the rights from Topps to publish new products. WizKids itself produced an unsuccessful collectible action figure game based on the property, called Shadowrun Duels. A fifth edition of Shadowrun was announced in December 2012. A limited-edition softcover was sold at the Origins Game Fair in June 2013, and the PDF was released in July 2013. A hardcover was published in August 2013. Shadowrun Anarchy was published in October 2016. It is a simplified version of the ruleset that puts more focus on narration than on rules. The sixth edition, called Shadowrun, Sixth World, was announced on May 1, 2019 to coincide with the game's 30th anniversary, along with a new website at shadowrunsixthworld.com. The game was published on August 26, 2019. The mechanics for this new version are generally similar to those of fifth edition, with some rules reworked for what line developer Jason Hardy describes as streamlining. This new version also progressed the in-game year to 2080. Since 2004, Shadowrun Missions (SRM) has offered fans "living campaigns" that allow for persistent character advancement. SRM is broken down into seasons, each made up of up to 24 individual missions that can be played at home, with special missions available to play exclusively at conventions. Each SRM season develops an overarching plot focused on a specific city from the Shadowrun setting. Mission settings have included the divided city of Denver, the corporate city-state of Manhattan, the Seattle Metroplex city-state, the formerly walled-off wastelands of Chicago, and Neo-Tokyo. For Shadowrun, Sixth World, missions returned to Seattle, with twenty-four missions set in 2081, right after Seattle declared independence from the UCAS. 
The current Shadowrun Missions setting is 2083 New Orleans. The Shadowrun role-playing game has spawned several properties, including Shadowrun: The Trading Card Game, eight video games, an action figure game (Shadowrun Duels), two magazines, an art book and more than 50 novels, starting with the Secrets of Power series, which introduces some of the original characters of Shadowrun and provides an introduction to this fictional universe. In addition to the main rule book, there have been over 100 published supplements, including adventures and expansions to both the rules and the game settings. Catalyst Game Labs announced that 2013 would be "The Year of Shadowrun," and that, in addition to releasing Shadowrun fifth edition, it had collaborated with publishers on the following properties: Shadowrun: Crossfire, The Adventure Deck-building Game; Shadowrun: Sprawl Gangers, a tactical miniatures wargame; and Shadowrun: Hostile Takeover, a board game designed by Bryan C.P. Steele that was planned for release in late 2014/early 2015. Catalyst had been collaborating with Nordic Games and Cliffhanger Studios to create the online RPG Shadowrun Chronicles: Boston Lockdown; however, it was shuttered on November 30, 2018, with the producers citing lack of funding and the end of the license terms for use of the IP. == Fictional universe == Shadowrun takes place several decades in the future (2050 in the first edition, currently 2088). The end of the Mesoamerican Long Count calendar ushered in the "Sixth World", with once-mythological beings (e.g. dragons) appearing and forms of magic suddenly emerging. Large numbers of humans have "Goblinized" into orks and trolls, while many human children are born as elves, dwarves, and even more exotic creatures. In North America, indigenous peoples discovered that their traditional ceremonies allow them to command powerful spirits, and rituals associated with a new Ghost Dance movement let them take control of much of the western U.S. 
and Canada, where they formed a federation of Native American Nations. Seattle remains under U.S. control by treaty as a city-state enclave, and most game materials are set there and assume campaigns will use it as their setting. In parallel with these magical developments, the setting's 21st century features technological and social developments associated with cyberpunk science fiction. Megacorporations control the lives of their employees and command their own armies; many of the largest have extraterritoriality, such as currently enjoyed by foreign heads of state. Technological advances make cyberware (mechanical replacement body parts) and bioware (augmented vat-grown body parts implanted in place of or in tandem with natural organs) common. The Computer Crash of 2029 led to the creation of the Matrix, a worldwide computer network that users interact with via direct neural interface. When conflicts arise, corporations, governments, organized crime syndicates, and even wealthy individuals subcontract their dirty work to specialists, who then perform "shadowruns" or missions undertaken by deniable assets without identities or those that wish to remain unknown. The most skilled of these specialists, called shadowrunners, have earned a reputation for getting the job done. They have developed a knack for staying alive, and prospering, in the world of Shadowrun. The Shadowrun world is cross-genre, incorporating elements of both cyberpunk and urban fantasy. Unlike in a purely cyberpunk game, in the Shadowrun world, magic exists and has "worked" since 2011. Among other things, this split humankind into subtypes, also known as metatypes/metahumans. Some of these metatypes take the form of common fantasy races. Likewise, some animals have turned into familiar monsters of past fantasy and lore and both monsters and human magicians have regained magical powers. By the second half of the 21st century, in the time the game is set, these events are accepted as commonplace. 
Man, machine, and magic exist in a world where the amazing is among the most common and technology has entered into every facet of human (and metahuman) life. === Races === Characters in Shadowrun can be humans, orks, trolls, elves, dwarves, as well as certain diverging subspecies (known as metavariants) such as gnomes, giants, dryads, etc. In the early days, when magic returned to the world, humans began to either change into, or give birth to, elf and dwarf infants, a phenomenon called Unexplained Genetic Expression (UGE). Later, some juvenile and adult humans "goblinized" into other races (mostly orks, but also some trolls). The term "metahuman" is used either to refer to humanity as a whole, including all races, or to refer specifically to non-human races, depending on context. The return of Halley's Comet brought even further variation in the form of changelings, who have variation atypical to their metatype or even species, such as electroreception. Two of the metahuman races, elves and orks, have fictional languages. Additionally, a virus known as the Human Meta-Human Vampiric Virus (HMHVV), with many variant strains, has been known to cause f

    Read more →
  • The Great Automatic Grammatizator

    The Great Automatic Grammatizator

    The Great Automatic Grammatizator (published in the U.S. as The Umbrella Man and Other Stories) is a posthumous 1998 collection of thirteen short stories written by British author Roald Dahl. The stories were selected for teenagers from Dahl's adult works. All the stories included were published elsewhere originally; their sources are noted below. The stories, with the exception of the war story "Katina", possess a deadpan, ironic, bizarre, or even macabre sense of humor. They generally end with unexpected plot twists. == Stories == "The Great Automatic Grammatizator" (from Someone Like You): A mechanically-minded man reasons that the rules of grammar are fixed by certain, almost mathematical principles. By exploiting this idea, he is able to create a mammoth machine that can write a prize-winning novel in roughly fifteen minutes. The story ends on a fearful note, as more and more of the world's writers are forced into licensing their names—and all hope of human creativity—to the machine. "Mrs. Bixby and the Colonel's Coat" (from Kiss Kiss): Mrs. Bixby cheats on her dentist husband with a rich, dashing colonel. When their relationship breaks off, the colonel offers Mrs. Bixby a gorgeous and expensive mink coat. In an attempt to explain the coat away, Mrs. Bixby sets up an elaborate trick with the help of a pawn shop—but her husband learns of the ruse and manages to turn the tables. "The Butler" (from More Tales of the Unexpected): An obnoxious and newly wealthy couple employs a butler and chef to impress dinner guests. The butler recommends that the husband buy expensive wines to please his guests, and the man slavishly follows the idea. The butler and the chef reap the rewards of this idea, while making fools of the "fashionable" couple. "Man from the South" (from Someone Like You): At a seaside resort in Jamaica, a strange old man makes a bet with an American man in his late teens. 
If the young man's cigarette lighter can spark ten times without fail, the American will win a brand-new Cadillac car—but failure means losing the little finger of his right hand. The high-tension wager ensues, and with only a few sparks left, a woman—who knows only too well the cost of the old man's bets—appears and stops the madness. "The Landlady" (from Kiss Kiss): A young man traveling to London on business stops at a bed and breakfast along the way, where a strange and slightly dotty landlady eagerly welcomes him. The eccentric nature of the house, and the news that only two other young men have ever stayed there, confuse and frighten the young man. In the end, the landlady—who indulges in the hobby of taxidermy—and the boy share a drink of tea that tastes of bitter almonds, and the landlady softly smiles at what may be her latest stuffing project. "Parson's Pleasure" (from Kiss Kiss): A man discovers an extremely rare piece of Chippendale furniture at the farm of some boorish ranchers. He desperately attempts to buy the piece cheap, in the hope of selling it at auction to earn a huge profit. He manages to buy the piece "for firewood", only for the ranchers to destroy it in an attempt to make it fit into his car. "The Umbrella Man" (from More Tales of the Unexpected): On a rainy day, a mother and daughter meet a gentlemanly old man on a street corner, who offers them a beautiful silk umbrella in exchange for a pound note. They trade, and the daughter notices that the "feeble" old man suddenly seems much sprier. They follow him, and discover that the gentleman is a con artist who visits various pubs, has a drink, and then steals another umbrella to continue the cycle. "Katina" (from Over to You: Ten Stories of Flyers and Flying): A group of RAF pilots stationed in Greece during World War II discover a hauntingly beautiful young girl, whose "family is beneath the rubble." She becomes their squadron's unofficial "mascot". 
In the end, her fragile life is taken as she stands defiantly against a rain of bullets from Nazi aircraft, shaking her fists at the heavens. "The Way Up to Heaven" (from Kiss Kiss): Mrs. Foster suffers from a chronic phobia of being late for appointments. Her husband enjoys the cruel sport of purposely delaying their activities, just to rile his wife. On the day when Mrs. Foster is due to fly to Paris to visit her grandchildren, her husband engages in his usual tricks. But as Mrs. Foster rushes from their taxi to the house to find him, she hears a strange noise—and turns triumphantly toward her cab. It is only when she returns, and calls a man to "repair the lift" that was stuck between floors in the house, that readers guess Mr. Foster's fate. "Royal Jelly" (from Kiss Kiss): New parents fear for the life of their little girl, who is sickly and dangerously underweight. The husband, a beekeeper, remembers hearing of the miraculous royal jelly used by bees to transform one particular larva into a queen. He adds the mixture to his daughter's bottles, and she puts on weight at an astonishing rate. The mother senses that something is amiss, and the husband confesses his actions—along with the fact that he himself swallowed buckets of the jelly for months in an attempt to cure his impotence. The royal jelly did the trick—but the strange side-effects include a disturbing metamorphosis for both father and daughter. "Vengeance is Mine Inc." (from More Tales of the Unexpected): Two brothers who are short of cash bemoan their fate over breakfast while reading the society column of a newspaper. They hit upon a scheme to take revenge on cruel tabloid writers in exchange for money from wealthy patrons. The unconventional plan works, and the brothers line their pockets with the spoils of their plans. "Taste" (from Someone Like You): A rich man with a beautiful young daughter hosts a dinner party, inviting a famous connoisseur of fine wines. 
When the rich man boasts that he has a wine that the expert cannot identify, the stakes become frighteningly high: if he can guess the name and vintage of the wine, he will win his daughter's hand. After an elaborate show, the expert guesses correctly; however, the family's maid appears and inadvertently exposes the guest as a cheat, thus saving the girl. "Neck" (from Someone Like You): A newspaper heir finds himself suddenly engaged to the voluptuous and controlling Lady Tutton. He loses all control of his life, and only his trusted butler and friends realize how broken he is by her control. A weekend trip to their estate, however, proves the perfect opportunity for Lord Tutton to engage in revenge against his wicked wife: her head is trapped in a valuable piece of wooden sculpture, and he must decide whether to use a saw or an axe to cut her free. == Publication details == Dahl, Roald (19 January 2004). The Umbrella Man and Other Stories. Speak. ISBN 9780142400876. == Reception == Groff Conklin in 1954 called the short story "The Great Automatic Grammatizator" "an awe-inspiring fantasy-satire ... an unforgettable bit of biting nonsense".

    Read more →
  • Character computing

    Character computing

    Character computing is a trans-disciplinary field of research at the intersection of computer science and psychology. It is any computing that incorporates the human character within its context. Character is defined as all features or characteristics defining an individual and guiding their behavior in a specific situation. It consists of stable trait markers (e.g., personality, background, history, socio-economic embeddings, culture,...) and variable state markers (emotions, health, cognitive state, ...). Character computing aims at providing a holistic psychologically driven model of human behavior. It models and predicts behavior based on the relationships between a situation and character. Three main research modules fall under the umbrella of character computing: character sensing and profiling, character-aware adaptive systems, and artificial characters. == Overview == Character computing can be viewed as an extension of the well-established field of affective computing. Based on the foundations of the different psychology branches, it advocates defining behavior as a compound attribute that is not driven by either personality, emotions, situation or cognition alone. It rather defines behavior as a function of everything that makes up an individual i.e., their character and the situation they are in. Affective computing aims at allowing machines to understand and translate the non-verbal cues of individuals into affect. Accordingly, character computing aims at understanding the character attributes of an individual and the situation to translate it to predicted behavior, and vice versa. 
''In practical terms, depending on the application context, character computing is a branch of research that deals with the design of systems and interfaces that can observe, sense, predict, adapt to, affect, understand, or simulate the following: character based on behavior and situation, behavior based on character and situation, or situation based on character and behavior.'' The Character-Behavior-Situation (CBS) triad is at the core of character computing and defines each of the three edges based on the other two. Character computing relies on simultaneous development from a computational and a psychological perspective and is intended to be used by researchers in both fields. Its main concept is aligning the computational model of character computing with empirical results from in-lab and in-the-wild psychology experiments. The model is to be continuously built and validated as new data emerge. As in affective and personality computing, the model is to be used as a base for different applications aimed at improving user experience. == History == The term character computing was coined at the field's first workshop in 2017. Since then there have been three international workshops and numerous publications. Despite its young age, the field has already drawn some interest in the research community, leading to the publication of the first book under the same title by Springer Nature in early 2020. Research that can be categorized under the field dates back well before 2017. The notion of combining several factors to explain behavior or traits and states has long been investigated in both psychology and computer science. == Character == The word character originates from the Greek word meaning “stamping tool”, referring to distinctive features and traits. 
Over the years it has been given many different connotations, like the moral character in philosophy, the temperament in psychology, a person in literature or an avatar in various virtual worlds, including video games. In character computing, character unifies all of these definitions by referring back to the original meaning of the word. Character is defined as the holistic concept representing all interacting trait and state markers that distinguish an individual. Traits are characteristics that mainly remain stable over time. Traits include personality, affect, socio-demographics, and general health. States are characteristics that vary over short periods of time. They include emotions, well-being, health, and cognitive state. Each characteristic has many representation methods and psychological models. The different models can be combined, or one model can be preset for each characteristic, depending on the use case and the design choices. == Areas == Research into character computing can be divided into three areas, which complement each other but can each be investigated separately. The first area is sensing and predicting character states and traits or ensuing behavior. The second area is adapting applications to certain character states or traits and the behavior they predict. It also deals with trying to change or monitor such behavior. The final area deals with creating artificial agents, e.g., chatbots or virtual reality avatars, that exhibit certain characteristics. The three areas are investigated separately and build on existing findings in the literature. The results of each of the three areas can also be used as a stepping stone for the next area. Each of the three areas has already been investigated on its own in different research fields with focus on different subsets of character. 
For example, affective computing and personality computing both cover different areas with a focus on some character components without the others to account for human behavior. == The Character-Behavior-Situation triad == Character computing is based on a holistic psychologically driven model of human behavior. Human behavior is modeled and predicted based on the relationships between a situation and a human's character. To further define character in a more formal or holistic manner, we represent it in light of the Character–Behavior–Situation triad. This highlights that character not only determines who we are but how we are, i.e., how we behave. The triad investigated in Personality Psychology is extended through character computing to the Character–Behavior–Situation triad. Any member of the CBS triad is a function of the two other members, e.g., given the situation and personality, the behavior can be predicted. Each of the components in the triad can be further decomposed into smaller units and features that may best represent the human's behavior or character in a particular situation. Character is thus behind a person's behavior in any given situation. While this is a causality relation, the correlation between the three components is often more easily used to predict the components that are most difficult to measure from those measured more easily. There are infinitely many components to include in the representation of any of C, B, and S. The challenge is always to choose the smallest subset needed for prediction of a person's behavior in a particular situation.
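The statement that any member of the CBS triad is a function of the other two can be written compactly. The function names f, g and h below are notation introduced here for illustration only; the text does not name them or specify their form.

```latex
\begin{align}
B &= f(C, S) && \text{behavior from character and situation} \\
C &= g(B, S) && \text{character from behavior and situation} \\
S &= h(C, B) && \text{situation from character and behavior}
\end{align}
```

In this reading, choosing "the smallest subset needed for prediction" amounts to choosing which components of C and S are passed to f.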

    Read more →
  • Course of Action Display and Evaluation Tool

    Course of Action Display and Evaluation Tool

    Course of Action Display and Evaluation Tool (CADET) was a research program, and the eponymous prototype software system, that applied knowledge-based techniques of Artificial Intelligence to the problem of battle planning. CADET was also known as Course of Action Display and Elaboration Tool. It was considered an early example of such systems and was funded by the United States Army and by the Defense Advanced Research Projects Agency (DARPA). CADET influenced a later DARPA program called RAID which in turn produced a technology adopted by the United States Army and the United States Marine Corps. == History == The development of Course of Action Display and Evaluation Tool (CADET) began in 1996, at the Carnegie Group, Inc., Pittsburgh PA, funded under the Small Business Innovation Research (SBIR) program. The goal of the first phase SBIR project was to produce “...a live storyboard of [Course of Action] COA development, wargaming, animation, and assessment.” In 1997, the United States Army awarded the Carnegie Group Inc. $750K for SBIR Phase II. The intent was to develop “...a war-gaming modeling and analysis Decision Support System (DSS), … CADET will consist of a combination of Knowledge-Based and decision analytic tools and technologies to provide fast nimble COA war-gaming modeling, simulation, and animation under direct control of the commander and staff. ...Phase II will result in an operations prototype (OP) suitable for use and evaluation in field exercises.” In 2000, CADET was integrated and experimentally evaluated within the framework of the Integrated Course of Action Critiquing and Elaboration System (ICCES) experiment, conducted by the Battle Command Battle Laboratory – Leavenworth (BCBL-L) within the program Concept Experimentation Program (CEP) sponsored by TRADOC. In 2000-2002, DARPA applied CADET in the program titled Command Post of the Future (CPoF) as a tool to generate a course of action. 
Under the umbrella of the CPoF program, CADET was integrated with the FOX GA system to provide a detailed planner, coupled with COA generation capability. In the same period, Battle Command Battle Lab-Huachuca (BCBL-H) integrated CADET with the All Source Analysis System-Light (ASAS-L); here CADET was intended to generate plans for intelligence assets, and conduct wargames of different COAs, enemy versus friendly. From 1996 through 2002, work on CADET was performed by the Carnegie Group, Inc., and supported by funding from the US Army CECOM (CADET SBIR Phase I, CADET SBIR Phase II and CADET Enhancements); DARPA (Command Post of the Future); and TRADOC BCBL-H. == Operation == CADET was intended to be used by the staff of a United States Army brigade, within the Military Decision Making Process (MDMP). In particular, CADET helped produce, automatically or semi-automatically, the products generated within the step of MDMP called Course of Action (COA) Development and the following step of MDMP called COA Analysis and Wargaming. CADET software resided on a laptop computer. Using the computer, the staff officers entered the input to CADET, or alternatively this input arrived at CADET from upstream computer systems. The input consisted of: Order of Battle, i.e., the units constituting the friendly brigade and the enemy units participating in the battle, and their various characteristics; primary activities of the Course of Action, where each activity is typically linked to one or more geographic areas or a route, and sometimes to a major unit executing the activity; digital map of the region where the battle was to take place, including the digital description of significant features such as locations of friendly and enemy units, roads, assembly areas, objectives, and axes of attack. 
Taking this input, CADET automatically performed the following tasks (not sequentially): planning and scheduling the low-level tasks necessary for a given COA; allocating tasks to the various units and assets constituting the brigade; assigning suitable locations and routes; estimating the battle losses (attrition) of friendly and enemy forces, and the consumption of resources (e.g., fuel and ammunition); predicting enemy actions or reactions. CADET produced the following outputs: a synchronization matrix, directly editable and printable (a synchronization matrix is a kind of Gantt chart that shows assignments of activities to units, to locations/routes and to time periods); map overlays in PPT or JPG formats; animation output; a formally encoded plan in XML; a textual Operation Plan (OPLAN) draft; e-mail messages with attachments (XML and text versions of the OPLAN). == Design == The core algorithm is a planning algorithm in which CADET uses a knowledge-based approach of the hierarchical-task-network type. Each task class is associated with a model of more detailed subtasks that should be performed in order to accomplish the higher-level task. The algorithm selects a task (heuristically) and then decomposes it into subtasks. Although similar to a hierarchical-task-network planning algorithm, CADET’s algorithm includes elements of adversarial reasoning. After adding a subtask, the algorithm uses rules to determine the enemy’s probable actions and reactions as well as friendly counteractions. This approximates the action-reaction-counteraction technique of manual wargaming used by the United States Army. When a task involves movement of a unit, the algorithm performs routing, i.e., finds a route that minimizes the time required for the movement as well as exposure to enemy attacks. Each added task (subtask) normally requires a unit to execute it and a time period in which it is executed. 
Therefore, when a certain number of subtasks has been added by the planning process, the algorithm also allocates the newly added subtasks to units and to time periods (i.e., scheduling). Allocation and scheduling of tasks rely on both domain-specific and constraint-guided heuristics. A task may also require expenditures of fuel and ammunition. If the task involves engagement with the enemy, the performing units will experience losses of personnel and weapon systems (attrition). CADET’s algorithm includes estimates of the consumption of different types of consumables, and also of attrition. Depending on the degree of attrition and consumption, CADET adds tasks that are needed to refuel or reconstitute the units. The algorithm continually interleaves incremental steps of planning, routing, scheduling, and attrition and consumption estimates. == Evaluation == Two evaluation experiments are described in the literature. The first experiment, called ICCES, took three days. The subjects were eight Army officers from combat arms branches, with 11 to 23 years of active service, in the ranks of major and lieutenant colonel. Each officer was given 4 hours of training in operating CADET and related computer tools. Officers were divided into two groups and given a tactical scenario. One group (the control group) used the traditional, manual process; the other used the system called ICCES, the automated core of which was CADET. Each group produced three COA sketches and statements and one COA synchronization matrix. Then, the experiment was repeated with another scenario, but the control group became the automated group and vice versa. The users were generally satisfied with the quality of the ICCES-generated products. The group using ICCES made only a few changes to the product that was automatically generated, indicating that they agreed with the majority of the plan that ICCES produced. The second experiment was reminiscent of the Turing test. 
The experiment involved one user, nine judges (active-duty officers, mainly colonels and lieutenant colonels), and five scenarios obtained from several US Army exercises. For each scenario, experimenters obtained synchronization matrices that were produced in earlier exercises, typically by a team of four to five officers in three to four hours, spending approximately 16 person-hours in total. Using these scenarios and COAs, the user had CADET automatically generate detailed plans and express them as synchronization matrices. The user, a retired US Army officer, reviewed and slightly edited the matrices. The entire process took less than two minutes of computation by CADET and approximately 20 minutes of review and post-editing, approximately 0.4 person-hour in total per product. The experimenters gave the resulting matrices the same visual style as those produced by humans. The judges, who did not know whether a planning product had been produced traditionally by humans or with computerized aids, were asked to grade the products. The result was that the average grades for manual products and CADET-generated products were statistically indistinguishable, even though CADET-generated products required far less time to produce. == Legacy == CADET served as “...an example of how even relatively basic A
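
The planning loop described in the Design section above (decompose a task, apply reaction rules, allocate a unit and a time slot, update the consumption estimate) can be illustrated with a small sketch. This is not CADET's actual code: the task names, the reaction rule, the fuel costs, and the single unit `TF-1` are all hypothetical, and routing and attrition are left out for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical task models: each compound task maps to the ordered subtasks
# that accomplish it. Tasks absent from this table are treated as primitive.
METHODS = {
    "seize_objective": ["move_to_attack_position", "assault", "consolidate"],
    "assault": ["suppress_enemy", "breach", "clear"],
}

# Hypothetical adversarial rules: friendly task -> (enemy reaction, friendly counteraction).
REACTIONS = {
    "breach": ("enemy_counterattack", "commit_reserve"),
}

# Hypothetical per-task fuel cost; tasks not listed here cost 5 units.
FUEL_COST = {"move_to_attack_position": 40, "breach": 25, "commit_reserve": 30}


@dataclass
class Plan:
    steps: list = field(default_factory=list)  # (start_time, unit, task)
    fuel: int = 100                            # remaining fuel of the single unit
    clock: int = 0                             # next free time slot


def schedule(plan, unit, task):
    """Allocate a primitive task to a unit and the next time slot, then update fuel."""
    cost = FUEL_COST.get(task, 5)
    if plan.fuel < cost:
        # The consumption estimate shows a shortfall, so a refuel task is added first.
        plan.steps.append((plan.clock, unit, "refuel"))
        plan.clock += 1
        plan.fuel = 100
    plan.steps.append((plan.clock, unit, task))
    plan.clock += 1
    plan.fuel -= cost


def decompose(task, plan, unit="TF-1"):
    """Depth-first decomposition interleaved with scheduling and reaction rules."""
    if task in METHODS:
        for subtask in METHODS[task]:
            decompose(subtask, plan, unit)
        return
    schedule(plan, unit, task)
    if task in REACTIONS:
        reaction, counteraction = REACTIONS[task]
        # Record the predicted enemy reaction, then plan the friendly counteraction.
        plan.steps.append((plan.clock, "ENEMY", reaction))
        decompose(counteraction, plan, unit)


plan = Plan()
decompose("seize_objective", plan)
for start, unit, task in plan.steps:
    print(start, unit, task)
```

Each printed row corresponds to one cell of a very small synchronization matrix: a time slot, the unit (or the enemy), and the activity. The refuel step appears only because the running fuel estimate falls below the cost of the next task, which mirrors the interleaving of planning and consumption estimates described above.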

  • Stephanie Dinkins

    Stephanie Dinkins

    Stephanie Dinkins (born 1964) is a transdisciplinary American artist based in Brooklyn, New York. She creates art about artificial intelligence (AI) as it intersects race, gender, and history. Her aim is to "create a unique culturally attuned AI entity in collaboration with coders, engineers and in close consultation with local communities of color that reflects and is empowered to work toward the goals of its community." Dinkins's projects include Conversations with Bina48, a series of conversations between Dinkins and BINA48, the first social, artificially intelligent humanoid robot, which looks like a black woman, and Not the Only One, a multigenerational artificially intelligent memoir trained on three generations of Dinkins's family. == Early life and education == Dinkins was born in Perth Amboy, New Jersey, to Black American parents who raised her in Staten Island, New York. She credits her grandmother with teaching her how to think about art as a social practice, saying "my grandmother . . . was a gardener and the garden was her art . . . that was a community practice." Dinkins attended the International Center of Photography School in New York City in 1995, where she completed the general studies in photography certificate program. Dinkins received an MFA in photography from the Maryland Institute College of Art in 1997. She completed the Independent Study Program at the Whitney Museum of American Art in 1998. == Career == Dinkins is the Yayoi Kusama Professor of Art at Stony Brook University in New York. == Activism == Dinkins advocates for co-creation within a social practice art framework, so that vulnerable communities understand how to use technology to their advantage instead of being subjected to its use. This is exemplified in works such as Project al-Khwarizmi, a series of workshops entitled PAK POP-UP at the nonprofit community center Recess in Brooklyn, NY. 
The workshops involved collaborating with youth in the criminal justice system and uplifting the voices of vulnerable communities in determining how technologies are created and utilized. Dinkins warns of the dangers to members of minority groups that are absent from the creation of the computer algorithms that now affect their lives. == Art == Dinkins's practice employs technologies including, but not limited to, new media such as artificial intelligence and machine learning. Dinkins uses oral history techniques of interviewing to craft community-authored narratives and databases which inform the subjects of her work and serve as acts of social intervention or protest. === Conversations with Bina48 (2014–present) === Dinkins began working on Conversations with Bina48 in 2014. For the series, Dinkins recorded her conversations with BINA48, a social robot that resembles a middle-aged black woman. Dinkins mirrors Bina48 while they discuss identity and technological singularity. In 2010, Hanson Robotics, an engineering and robotics company known for its development of humanoid robots, developed and released BINA48. Bina48 is a robot modeled after the memories, beliefs, attitudes, commentary and mannerisms of Bina Aspen Rothblatt, the spousal partner of Martine Rothblatt. Both Bina and Martine Rothblatt own Bina48 under their organization, the Terasem Movement Foundation. Five years after Bina48 was released, Dinkins came across a YouTube video of Bina48. She asked, "how did a black woman become the most advanced of the technologies at the time?" Her questioning led her to travel to Lincoln, Vermont (the site of the Terasem Movement Foundation) where she conducted a series of interviews with Bina48 and engaged the robot in conversations pertaining to race, intimacy and the nature of being. 
The conversations suggest opportunities for complementing human existence with artificially intelligent agents that have an identity and history, but also show artificial intelligence's current limitations. Although Bina48 is based on a black woman, Dinkins found that it was shaped by the biases of its white, male creators. === Project al-Khwarizmi (PAK) (2017–present) === Project al-Khwarizmi (PAK) was a series of pop-up workshops in Brooklyn, NY at Eyebeam and Recess; Manhattan, New York at Google; and Durham, North Carolina at Duke University. The workshops were designed for "communities of color that use art as a vehicle to help citizens understand how algorithms, the artificially intelligent systems they underpin, and big data impact their lives and empowers them to do something about it. Project al-Khwarizmi uses art and aesthetics as the common language to help citizens understand what algorithms and artificial intelligent systems are, and where these systems already impact our daily lives." === Not the Only One (N'TOO) (2018–present) === Not the Only One (N'TOO) is a voice-interactive chatbot that was trained with data from members of Dinkins's family to tell a multi-generational story. Dinkins described Not the Only One (NTOO or N'TOO) as an "experimental" multigenerational memoir of one Black American family told from the "mind" of an artificial intelligence of evolving intellect. N'TOO uses a recursive neural network, a deep learning algorithm. It is a voice-interactive AI robot designed, trained, and aligned with the needs and ideals of black and brown people who are drastically underrepresented in the tech sector. NTOO can also be described as a "physically embodied artificially intelligent agent that senses and acts on its world." 
== Exhibitions == Dinkins's work is exhibited internationally at various public, private, community, and institutional venues, including the Whitney Museum of American Art, the de Young Museum, the Philadelphia Museum of Art, the Studio Museum in Harlem, the Museum of Contemporary Photography, the Long Island Museum of American Art, History, and Carriages, the International Center of Photography in New York, Herning Kunstmuseum in Herning, Denmark, The Barbican in London, UK, Islip Art Museum, Wave Hill, Taller Boricua, the Queens Museum, and the corner of Putnam and Malcolm X Blvd in Bedford Stuyvesant, Brooklyn, New York. She has presented her work in symposia at the Museum of Modern Art, amongst other venues. == Future Histories Studio == Dinkins is the founder and director of Future Histories Studio, a research laboratory for arts-centered inquiry and production based at Stony Brook University. The studio was established with support from the Mellon Foundation as part of the Digital Inquiry, Speculation, Collaboration, and Optimism (DISCO) network. Future Histories Studio operates as an interdisciplinary hub exploring the intersections of art, technology, race, and storytelling through collaborative and practice-based research. Its activities include exhibitions, workshops, and public programs that examine the social and cultural implications of emerging technologies, particularly artificial intelligence and data systems. == Awards and recognition == Dinkins is the recipient of many awards, including: the 2023 LG Guggenheim Award, an international art prize established as part of a long-term global partnership between LG Group and the Solomon R. 
Guggenheim Museum to recognize groundbreaking artists in technology-based art; a Berggruen Institute artist fellowship; a Sundance New Frontiers Story Lab fellowship; a Soros Equality Fellowship; a Lucas Artists fellowship; a Creative Capital grant; a Bell Labs artist residency; a Blade of Grass fellowship; and a Data & Society fellowship. == Media coverage == Dinkins appeared in episode six of the HBO television series Random Acts of Flyness, directed by Terence Nance, where she described her conversations with BINA48. == Other activities == Dinkins was a member of the jury that selected Shu Lea Cheang for the LG Guggenheim Award in 2024.

  • Willy's Chocolate Experience

    Willy's Chocolate Experience

    Willy's Chocolate Experience was an unlicensed event based on Charlie and the Chocolate Factory that took place in Glasgow, Scotland, in February 2024. The event was promoted as an immersive and interactive family experience, illustrated on a promotional website with "dreamlike" AI-generated images. Once it was discovered that the event was held in a sparsely decorated warehouse, many customers complained, and the police were called to the venue. The event went viral on the Internet and attracted worldwide media attention. The event drew comparisons to the 2008 Lapland New Forest controversy, the 2014 Tumblr fan convention DashCon, and Billy McFarland's 2017 Fyre Festival. == Background and advertising == The event was stated to take place over the weekend of 24–25 February 2024. Promotional material advertised "stunning and intricately designed settings inspired by Roald Dahl's timeless tale" and "an array of delectable treats scattered throughout the experience". Both the website and promotional material used poor-quality AI-generated images, which included several spelling errors such as "cartchy tuns" and "a pasadise of sweet teats" and nonsensical words such as "catgacating" and "exarserdray". Tickets cost up to £35 per person. While the event was being promoted in early February, a Reddit user who saw Facebook advertisements suspected it to be a scam and was surprised that people were apparently buying tickets based solely on AI-generated images. The event was organised by House of Illuminati, a company registered to Billy Coull which claimed to offer "unparalleled immersive experiences". An investigation by Third Force News conducted after the event described Coull's previous "murky involvement in the charity sector." Coull had previously registered several other companies and claimed to work as a "consultant" for the now-defunct brand Empowerity, formerly known as the charity Gowanbank Community Hub. 
In 2021, Gowanbank was forced to remove claims of a £95-per-ticket fundraising "gala" at DoubleTree Glasgow which had been falsely advertised to feature TV personalities and performers including Gok Wan and Joe Black. Coull had claimed to be a doctor with a fake degree from a false university that provided "metaphysical degrees", and had attempted to use the charity to win the 2022 Glasgow City Council election in the seat of Greater Pollok, though he never registered for the election. In the summer of 2023, he independently published 17 AI-generated books on various topics, including vaccine conspiracy theories. Rolling Stone concluded that House of Illuminati's websites and event descriptions were likely written by an AI chatbot, such as ChatGPT. Three actors were hired to portray "Willy McDuff", a character based on Willy Wonka. One of them, Paul Connell, said that the cast were given one day to learn the script. Another actor playing Willy McDuff was 18-year-old Michael Archibald; the experience was his first ever acting job, and he was given the script at 6 pm on Friday before the event began on Saturday. Kirsty Paterson, an actress who played one of the Oompa-Loompas (called "Wonkidoodles" in the script), said that the job offer had been posted on Indeed.com and offered £500 for two days of work. The day before the event, the actors attended a dress rehearsal at the sparsely decorated venue. They were told that others would be working through the night on the production. When they returned on the day of the event, the venue was in the same condition. Paterson was given her costume an hour before the event opened, saying that "We were just handed an Amazon box that probably arrived that morning." == Script == The script for the event is titled Wonkidoodles at McDuff's Chocolate Factory: A Script, and describes Willy McDuff leading an audience through the Garden of Enchantment and the Twilight Tunnel. 
Once there, they are confronted by a character called The Unknown, described as "an evil chocolate maker who lives in the walls" who seeks to steal the magical "Anti-Graffiti Gobstopper" from McDuff's Imagination Lab. The gobstopper is "a sweet so powerful, it can make any room sparkle without lifting a finger". McDuff defeats The Unknown by amplifying the power of the gobstopper and causing his enemy to be "gently swept up by a robotic vacuum, humorously ending the confrontation". The script was unusual in that it included stage directions for the audience, and descriptions of their reactions. Connell described it as "15 pages of AI-generated gibberish of me just monologuing these mad things", and compared the vacuum cleaner plot point to that of the Nintendo video game Luigi's Mansion. Interviewed after the event, Coull claimed to have written the script himself, using AI only to "check spelling, grammar, and continuity" as he said he had dyslexia. == Event == The event was held at the Box Hub Warehouse event space in Whiteinch, an industrial area of Glasgow. Customers described the venue as "little more than an abandoned, empty warehouse", with set dressings including a small bouncy castle, AI-generated backdrop images pinned to some of the walls, and props which were "strewn about on bare concrete floors". The venue's windows were dirty and its air conditioning systems were left exposed. Paterson has stated that by the time she saw the venue, she had already signed her contract and "didn't want to disappoint the kids", and thus chose to proceed with the work. The Unknown was played by a 16-year-old actress named Felicia Dawkins, who wore a silver mask and a black cloak. Young children were frightened by the character, who appeared from behind a large rectangular mirror. Despite the script calling for The Unknown to be defeated with a vacuum cleaner, no such prop was provided, and actors were instead asked to improvise. 
Connell said that he and other employees were told to give each child "two jelly beans and a quarter of a cup of lemonade", although the limited supply of jelly beans quickly ran out. Paterson and another "Wonkidoodle" actress, Jenny Fogarty, said that after the first three 45-minute performances, the cast were told to abandon the script and instead let guests walk through the venue, a process that Paterson said took "about two minutes". The character of The Unknown, previously introduced as the main antagonist, was now "scaring children for no reason". One of the actors playing McDuff improvised the idea that children should pull a "silly face" at The Unknown to scare them away, but Dawkins said that, in other cases, she "just had to awkwardly walk back to my corner". Connell was told he would be given a 15-minute break every 45 minutes, but on the day of the event, he played Willy McDuff for three and a half hours without a break. After returning from a lunch break, Connell encountered a crowd of customers demanding refunds from Coull, and the other actors were unsure what to do next. After being told that the event was now cancelled halfway through its opening day, the actors left and went to a pub. Upon returning to the venue some time later, Connell said that he felt "the threat of violence had become quite high" and that there were two police vans and two squad cars at the scene. == Customer reviews and response == Willy's Chocolate Experience was widely criticised by those who attended it, many of whom demanded refunds. One customer, who had driven with his children for two hours to reach the event, described it as an "absolute con". Other visitors who arrived after the event was closed and were not informed of its cancellation requested compensation for wasted rail fares. Following the event's cancellation, Coull offered to refund 850 people, a statement repeated by the event's Facebook page. 
Some Facebook users stated that they had received their money back. Paterson and Fogarty stated that they only received half of their paycheque. Box Hub, the organisation that had rented the warehouse to House of Illuminati, issued an apology on House of Illuminati's behalf, stating that they "either have no regards for the families and young children they have disappointed or are too embarrassed to comment", and offered to provide a venue free of charge for those who attended the event. House of Illuminati later stated that they would not host any future events. Coull deleted his LinkedIn profile, his YouTube channel, and his personal website in response to the controversy. A few days after the event, Connell said he felt that Coull was "probably one of the most disliked people in Glasgow right now". In an interview with The Sunday Times, Coull apologised for how the event turned out, saying he would accept responsibility. == Fundraising == In an interview with Wired magazine, Connell stated that he and the other actors were working with parents to provide a free show for the children who attended. Some items from the event were later auctioned for charity. The venue auctioned the leftover hand-written "even

  • Gemini Enterprise Agent Platform

    Gemini Enterprise Agent Platform

    Gemini Enterprise Agent Platform (formerly known as Vertex AI) is a managed machine learning (ML) and artificial intelligence (AI) platform developed by Google Cloud. It provides a unified environment for building, training, deploying, and scaling ML models and generative AI applications. The platform integrates tools for the full ML lifecycle, including data preparation, model training, evaluation, deployment, and monitoring, under a single API and user interface. Vertex AI was announced at Google I/O and released as a generally available product on May 18, 2021. At launch, Google described Vertex AI as unifying its AutoML offerings with its prior Cloud AI Platform capabilities, and as adding operational features intended to help teams move models from experimentation into production use. On April 22, 2026, Google announced Gemini Enterprise Agent Platform as the replacement evolution of Vertex AI. == History == Google Cloud announced the general availability of Vertex AI on May 18, 2021, at the Google I/O developer conference. The platform was designed to consolidate Google Cloud's previously separate ML offerings, including AutoML and the legacy AI Platform, into a single system. At launch, Google claimed that Vertex AI required roughly 80% fewer lines of code to train a model compared to competing platforms. In June 2023, Google made generative AI support in Vertex AI generally available, giving developers access to foundation models including PaLM 2, Imagen, and Codey through the platform's Model Garden and the newly launched Generative AI Studio. At the time of this launch, Model Garden included over 60 models from Google and its partners. 
In August 2023, at the Google Cloud Next conference, Google announced further updates to Vertex AI, including the addition of third-party models such as Claude 2 from Anthropic and Llama 2 from Meta to the Model Garden, as well as new tools called Vertex AI Extensions for connecting models to APIs for real-time data retrieval. At the same event, Vertex AI Search and Conversation were made generally available, providing enterprise search and chatbot capabilities powered by foundation models. In April 2024, at Google Cloud Next, the company introduced Vertex AI Agent Builder, a no-code tool for creating AI-powered conversational agents built on top of Gemini large language models. This brought together the existing Vertex AI Search and Conversation products with new developer tools for building generative AI experiences. == Features == === Model training === Vertex AI supports both AutoML, which enables code-free model training on tabular, image, text, or video data, and custom training, which gives users full control over the ML framework, training code, and hyperparameter tuning. The platform provides serverless training as well as dedicated training clusters with GPU and TPU accelerators. Vertex AI Vizier handles automatic hyperparameter tuning, and Vertex AI Experiments allows comparison and tracking of training runs. === Model Garden === The Vertex AI Model Garden is a curated catalog of over 200 enterprise-ready models, including Google's own foundation models (such as Gemini, Imagen, and Veo), third-party models (such as Anthropic's Claude and Mistral AI models), and popular open-source models (such as Llama and Gemma). Models are accessible as fully managed model-as-a-service APIs. === Pipelines (workflow orchestration) === Vertex AI Pipelines provides managed orchestration of ML workflows and supports pipelines built with the Kubeflow Pipelines SDK, among other options described in Google Cloud documentation. 
=== Vertex AI Studio === Vertex AI Studio provides tools for prompt design, testing, and model management, allowing developers to prototype and build generative AI applications using natural language, code, images, or video. === Agent Builder and Agent Engine === Vertex AI Agent Builder is a suite of products for building, deploying, and governing AI agents in production environments. It supports development with the open-source Agent Development Kit (ADK) and other frameworks. Vertex AI Agent Engine provides the underlying infrastructure for deploying and scaling agents, with support for enterprise security features including HIPAA compliance, customer-managed encryption keys (CMEK), and VPC Service Controls. === Generative AI tooling and model access === Google markets Vertex AI as providing access to Google foundation models (including the Gemini family) and developer tools such as Vertex AI Studio, along with a model catalog that includes Google and selected open source models (marketed as "Model Garden"). Google has also offered products within Vertex AI aimed at building generative search and conversational applications, including offerings named "Vertex AI Search" and "Vertex AI Conversation" as reported in 2023 coverage of platform updates. === MLOps tools === The platform includes a range of MLOps capabilities: Vertex AI Pipelines for orchestrating and automating ML workflows as reusable pipelines. Vertex AI Feature Store for serving, sharing, and reusing ML features across projects. Vertex AI Model Registry for storing, versioning, and managing trained models. Vertex AI Model Monitoring for detecting training-serving skew and inference drift in deployed models. Vertex Explainable AI for interpreting model predictions. Vertex AI Workbench for managed JupyterLab notebook environments integrated with Google Cloud Storage and BigQuery. 
== Industry recognition == Google was named a Leader for the fifth consecutive year in the 2024 Gartner Magic Quadrant for Cloud AI Developer Services, a recognition that encompasses Vertex AI and its related offerings. Google was also recognized as a Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms and was named a Leader in the Forrester Wave for AI/ML Platforms, Q3 2024. In October 2025, Google was also named a Leader in the 2025 IDC (International Data Corporation) MarketScape for Worldwide GenAI Life-Cycle Foundation Model Software. == Pricing == Vertex AI uses a pay-as-you-go pricing model, with costs determined by the specific services consumed, including model training, prediction serving, and data storage. For generative AI tasks, pricing is based on a per-token model, with rates varying depending on the specific model used and whether tokens are input or output. Google offers a free tier for new users, which includes limited custom training hours and online prediction usage, along with an introductory US$300 in Google Cloud credits valid for 90 days. == Adoption == In the year following its 2021 launch, Google reported that usage of Vertex AI and BigQuery had driven 2.5 times more machine learning predictions compared to the prior year, and that active customers of Vertex AI Workbench had grown 25-fold over a six-month period. Early enterprise adopters included Ford, Wayfair, and Seagate, among others. Wayfair reported that it was able to run large model training jobs 5 to 10 times faster using the platform.

  • Stable Diffusion

    Stable Diffusion

    Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing AI boom. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. Its development involved researchers from the CompVis Group at LMU Munich and Runway with a computational donation from Stability and training data from non-profit organizations. Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly, and an optimized version can run on most consumer hardware equipped with a modest GPU with as little as 2.4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services. == Development == Stable Diffusion originated from a project called Latent Diffusion, developed in Germany by researchers at LMU Munich in Munich and Heidelberg University. Four of the original 5 authors (Robin Rombach, Andreas Blattmann, Patrick Esser and Dominik Lorenz) later joined Stability AI and released subsequent versions of Stable Diffusion. The technical license for the model was released by the CompVis group at LMU Munich. Development was led by Patrick Esser of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented the latent diffusion model architecture used by Stable Diffusion. Stability AI also credited EleutherAI and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as supporters of the project. 
== Technology == === Architecture === Diffusion models, introduced in 2015, are trained with the objective of removing successive applications of Gaussian noise on training images, which can be thought of as a sequence of denoising autoencoders. The name comes from thermodynamic diffusion, since these models were first developed with inspiration from thermodynamics. Models in the Stable Diffusion series before SD 3 all used a variant of diffusion model called the latent diffusion model (LDM), developed in 2021 by the CompVis (Computer Vision & Learning) group at LMU Munich. Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder. The VAE encoder compresses the image from pixel space to a lower-dimensional latent space, capturing a more fundamental semantic meaning of the image. Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion. The U-Net block, composed of a ResNet backbone, denoises the output of forward diffusion in reverse to obtain a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space. The denoising step can be flexibly conditioned on a string of text, an image, or another modality. The encoded conditioning data is exposed to denoising U-Nets via a cross-attention mechanism. For conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts to an embedding space. Researchers point to increased computational efficiency for training and generation as an advantage of LDMs. With 860 million parameters in the U-Net and 123 million in the text encoder, Stable Diffusion is considered relatively lightweight by 2022 standards, and unlike other diffusion models, it can run on consumer GPUs, and even CPU-only if using the OpenVINO version of Stable Diffusion. 
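
The forward diffusion step mentioned above, in which Gaussian noise is applied to the compressed latent, can be sketched in a few lines. The sketch assumes the standard DDPM-style closed form, z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε, with a linear noise schedule; the schedule endpoints, the 1000-step horizon, and the four-element stand-in latent are illustrative assumptions, not values taken from the text, and the function names are hypothetical.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the endpoint values are illustrative."""
    if steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (steps - 1)
    return [beta_start + i * step for i in range(steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the share of signal variance kept at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(latent, t, abar, rng):
    """Sample a noised latent: sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    noised = [signal * z + noise * e for z, e in zip(latent, eps)]
    return noised, eps


rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
z0 = [1.0, -0.5, 0.25, 0.0]  # stand-in for a VAE-encoded latent
zt, eps = add_noise(z0, 999, abar, rng)
print(f"signal fraction kept at the last step: {abar[-1]:.6f}")
```

In this formulation the denoising network is typically trained to predict `eps` from `zt` and the step index, which is what allows generation to run the process in reverse starting from pure noise. Working on the small latent rather than on full-resolution pixels is the source of the efficiency advantage attributed to LDMs above.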
==== SD XL ==== The XL version uses the same LDM architecture as previous versions, only larger: a larger UNet backbone, a larger cross-attention context, two text encoders instead of one, and training on multiple aspect ratios (not just the square aspect ratio like previous versions). The SD XL Refiner, released at the same time, has the same architecture as SD XL, but it was trained for adding fine details to preexisting images via text-conditional img2img. ==== SD 3.0 ==== The 3.0 version completely changes the backbone: instead of a UNet, it uses a Rectified Flow Transformer, which implements the rectified flow method with a Transformer. The Transformer architecture used for SD 3.0 has three "tracks", for original text encoding, transformed text encoding, and image encoding (in latent space). The transformed text encoding and image encoding are mixed during each transformer block. The architecture is named "multimodal diffusion transformer" (MMDiT), where "multimodal" means that it mixes text and image encodings inside its operations. This differs from previous versions of DiT, where the text encoding affects the image encoding, but not vice versa. === Training data === Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from Common Crawl data scraped from the web, where 5 billion image-text pairs were classified based on language and filtered into separate datasets by resolution, a predicted likelihood of containing a watermark, and predicted "aesthetic" score (e.g. subjective visual quality). The dataset was created by LAION, a German non-profit which receives funding from Stability AI. The Stable Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-aesthetics v2 5+. 
A third-party analysis of the model's training data identified that out of a smaller subset of 12 million images taken from the original wider dataset used, approximately 47% of the sample size of images came from 100 different domains, with Pinterest taking up 8.5% of the subset, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons. An investigation by Bayerischer Rundfunk showed that LAION's datasets, hosted on Hugging Face, contain large amounts of private and sensitive data. === Training procedures === The model was initially trained on the laion2B-en and laion-high-resolution subsets, with the last few rounds of training done on LAION-Aesthetics v2 5+, a subset of 600 million captioned images which the LAION-Aesthetics Predictor V2 predicted that humans would, on average, give a score of at least 5 out of 10 when asked to rate how much they liked them. The LAION-Aesthetics v2 5+ subset also excluded low-resolution images and images which LAION-5B-WatermarkDetection identified as carrying a watermark with greater than 80% probability. Final rounds of training additionally dropped 10% of text conditioning to improve Classifier-Free Diffusion Guidance. The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000. === Limitations === Stable Diffusion has issues with degradation and inaccuracies in certain scenarios. Initial releases of the model were trained on a dataset that consists of 512×512 resolution images, meaning that the quality of generated images noticeably degrades when user specifications deviate from its "expected" 512×512 resolution; the version 2.0 update of the Stable Diffusion model later introduced the ability to natively generate images at 768×768 resolution. Another challenge is in generating human limbs due to poor data quality of limbs in the LAION database. 
The model is insufficiently trained to replicate human limbs and faces due to the lack of representative features in the database, and prompting it to generate such images can confound it. In addition to human limbs, Stable Diffusion is unable to generate legible ambigrams and some other forms of text and typography. Stable Diffusion XL (SDXL) version 1.0, released in July 2023, introduced native 1024×1024 resolution and improved generation of limbs and text. Accessibility for individual developers can also be a problem. To customize the model for new use cases that are not covered by the dataset, such as generating anime characters ("waifu diffusion"), new data and further training are required. Fine-tuned adaptations of Stable Diffusion created through additional retraining have been used for a variety of use cases, from medical imaging to algorithmically generated music. However, this fine-tuning process is sensitive to the quality of the new data; training on low-resolution images, or on images whose resolution differs from that of the original data, can not only fail to teach the model the new task but also degrade its overall performance. Even when the model is additionally trained on high-quality images, it is difficult for individuals to run models on consumer electronics. For example, the training process for waifu-diffusion requires a minimum of 30 GB of VRAM, which exceeds the usual resource provided in such consumer GPUs as Nvidia's GeForce 30 series, w

  • Privacy Lost

    Privacy Lost

    Privacy Lost is a 2023 short science fiction film directed by Peter Stoel and Robert Berger. It follows a family using augmented reality (AR) and artificial intelligence (AI) devices capable of reading emotional states, raising questions about privacy and manipulation.

    == Premise ==
    Privacy Lost follows a family using AR glasses that capture and interpret emotions in real time. As the parents argue in a restaurant, their emotional states and even hidden feelings become visible through these glasses. An AI-driven waiter adapts its appearance for each family member, employing emotional data to influence their decisions.

    == Cast ==
    Brian Kant as Waiter
    Michael Krass as Husband
    Estelle Levinson as Waitress
    Thor van der Linden as Scotty
    Carlijn van Ramshorst as Wife

    == Production ==
    Filming took place at HeadQ Productions, a virtual studio located in Amsterdam. The creators sought to depict a near-future scenario in which real-time emotion analysis becomes part of daily interactions. The film was screened at the Augmented World Expo (AWE), where it was noted for its thematic focus on AI-driven manipulation and emotional tracking. The depiction of AR glasses and AI characters integrates modern visual effects to show how devices might analyze emotional responses in real time. It also depicts how AI-driven interactions could influence consumer decisions, pointing to concerns over potential misuse.

    == Themes ==
    Privacy Lost focuses on the intersection of advanced AI capabilities and AR environments, showing how real-time emotional analysis can be leveraged for targeted persuasion. The film aims to highlight the social and ethical implications of emerging AR and AI technologies, underlining the need for clear regulatory frameworks to protect individual privacy, govern the storage of emotion-based data, and prevent manipulative practices.
Critics describe the film’s theme as dystopian and note that such a reality is unlikely to occur in the near future. However, despite the exaggerated scenario, the film emphasizes the importance of a responsible approach by developers toward emerging technologies.
