AI Detector No Character Limit

AI Detector No Character Limit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

Oculus Quill

Quill is a painting and animation software for virtual reality. It runs on Microsoft Windows with Oculus Rift headsets. It is used to create 3D paintings and animated cartoons. Quill was released on November 29, 2016, on the Oculus Store. Theater Elsewhere(formerly Quill Theater), an application for viewing creations made in Quill, was later made available following the release of the Oculus Quest. In September 2021, Facebook, now known as Meta Platforms, and the owner of Oculus, sold Quill to its original creator, who continues to develop and support the app. == Development == Quill was originally developed by Oculus Story Studio as an internal tool for the creative needs of the studio's project Dear Angelica directed by Saschka Unseld along with its art-director Wesley Allsbrook. == Controls == The software works on Oculus Rift utilizing its 6DoF motion controllers. Users can paint in 3D space using their hands naturally, and animate those paintings with keyframes. They can also capture videos and photos of their creations. == Reception == Dear Angelica, a VR story fully painted in Quill, was nominated for an Emmy Award in 2017.
Read more →
Smartglasses

Smartglasses or smart glasses are eye or head-worn wearable computers. Many smartglasses include displays that add information alongside or to what the wearer sees. Alternatively, smartglasses are sometimes defined as glasses that are able to change their optical properties, such as smart sunglasses that are programmed to change tint by electronic means. Alternatively, smartglasses are sometimes defined as glasses that include headphone functionality. A pair of smartglasses can be considered an augmented reality device if it performs pose tracking. Superimposing information onto a field of view is achieved through an optical head-mounted display (OHMD) or embedded wireless glasses with transparent heads-up display (HUD) or augmented reality (AR) overlay. These systems have the capability to reflect projected digital images as well as allowing the user to see through it or see better with it. While early models can perform basic tasks, such as serving as a front end display for a remote system, as in the case of smartglasses utilizing cellular technology or Wi-Fi, modern smart glasses are effectively wearable computers which can run self-contained mobile apps. Some are handsfree and can communicate with the Internet via natural language voice commands, while others use touch buttons. Like other computers, smartglasses may collect information from internal or external sensors. It may control or retrieve data from other instruments or computers. In most cases, it supports wireless technologies like Bluetooth, Wi-Fi, and GPS. A small number of models run a mobile operating system and function as portable media players to send audio and video files to the user via a Bluetooth or WiFi headset. Some smartglasses models also feature full lifelogging and activity tracker capability. Smartglasses devices may also have features found on a smartphone. Some have activity tracker functionality features (also known as "fitness tracker") as seen in some GPS watches. == Features and applications == As with other lifelogging and activity tracking devices, the GPS tracking unit and digital camera of some smartglasses can be used to record historical data. For example, after the completion of a workout, data can be uploaded into a computer or online to create a log of exercise activities for analysis. Some smart watches can serve as full GPS navigation devices, displaying maps and current coordinates. Users can "mark" their current location and then edit the entry's name and coordinates, which enables navigation to those new coordinates. Although some smartglasses models manufactured in the 21st century are completely functional as standalone products, most manufacturers recommend or even require that consumers purchase mobile phone handsets that run the same operating system so that the two devices can be synchronized for additional and enhanced functionality. The smartglasses can work as an extension, for head-up display (HUD) or remote control of the phone and alert the user to communication data such as calls, SMS messages, emails, and calendar invites. === Security applications === Smart glasses could be used as a body camera. In 2018, Chinese police in Zhengzhou and Beijing were using smart glasses to take photos which are compared against a government database using facial recognition to identify suspects, retrieve an address, and track people moving beyond their home areas. === Sport applications === Smart glasses are used in sports like cycling, running, skiing, golf, tennis, or sailing, giving athletes real-time, heads-up data without looking down at the screen of a watch or smartphone. In 2025, Meta has announced a new partnership with sports eyewear brand Oakley. === Healthcare applications === Several proofs of concept for Google Glasses have been proposed in healthcare. In July 2013, Lucien Engelen started research on the usability and impact of Google Glass in health care. Engelen, who is based at Singularity University and in Europe at Radboud University Medical Center, is participating in the Glass Explorer program. Key findings of Engelen's research included: The quality of pictures and video are usable for healthcare education, reference, and remote consultation. The camera needs to be tilted to different angle for most of the operative procedures Tele-consultation is possible—depending on the available bandwidth—during operative procedures. A stabilizer should be added to the video function to prevent choppy transmission when a surgeon looks to screens or colleagues. Battery life can be easily extended with the use of an external battery. Controlling the device and/or programs from another device is needed for some features because of a sterile environment. Text-to-speech ("Take a Note" to Evernote) exhibited a correction rate of 60 percent, without the addition of a medical thesaurus. A protocol or checklist displayed on the screen of Google Glass can be helpful during procedures. Dr. Phil Haslam and Dr. Sebastian Mafeld demonstrated the first concept for Google Glass in the field of interventional radiology. They demonstrated the manner in which the concept of Google Glass could assist a liver biopsy and fistulaplasty, and the pair stated that Google Glass has the potential to improve patient safety, operator comfort, and procedure efficiency in the field of interventional radiology. In June 2013, surgeon Dr. Rafael Grossmann was the first person to integrate Google Glass into the operating theater, when he wore the device during a PEG (percutaneous endoscopic gastrostomy) procedure. In August 2013, Google Glass was also used at Wexner Medical Center at Ohio State University. Surgeon Dr. Christopher Kaeding used Google Glass to consult with a colleague in a distant part of Columbus, Ohio. A group of students at The Ohio State University College of Medicine also observed the operation on their laptop computers. Following the procedure, Kaeding stated, "To be honest, once we got into the surgery, I often forgot the device was there. It just seemed very intuitive and fit seamlessly." 16 November 2013, in Santiago de Chile, the maxillofacial team led by Dr.gn Antonio Marino conducted the first orthognathic surgery assisted with Google Glass in Latin America, interacting with them and working with simultaneous three-dimensional navigation. The surgical team was interviewed by ADN radio. In January 2014, Indian Orthopedic Surgeon Selene G. Parekh conducted the foot and ankle surgery using Google Glass in Jaipur, which was broadcast live on Google website via the internet. The surgery was held during a three-day annual Indo-US conference attended by a team of experts from the US and co-organized by Ashish Sharma. Sharma said Google Glass allows looking at an X-Ray or MRI without taking the eye off of the patient and allows a doctor to communicate with a patient's family or friends during a procedure. In Australia, during January 2014, Melbourne tech startup Small World Social collaborated with the Australian Breastfeeding Association to create the first hands-free breastfeeding Google Glass application for new mothers. The application, named Google Glass Breastfeeding app trial, allows mothers to nurse their baby while viewing instructions about common breastfeeding issues (latching on, posture etc.) or call a lactation consultant via a secure Google Hangout, who can view the issue through the mother's Google Glass camera. The trial was successfully concluded in Melbourne in April 2014, and 100% of participants were breastfeeding confidently. == Display types == Various techniques have existed for see-through HMDs. Most of these techniques can be summarized into two main families: "Curved Mirror" (or Curved Combiner) based and "Waveguide" or "Light-guide" based. The mirror technique has been used in EyeTaps, by Meta in their Meta 1, by Vuzix in their Star 1200 product, by Olympus, and by Laster Technologies. Various waveguide techniques have existed for some time. These techniques include diffraction optics, holographic optics, polarized optics, reflective optics, and projection: Diffractive waveguide – slanted diffraction grating elements (nanometric 10E-9). Nokia technique now licensed to Vuzix. Holographic waveguide – 3 holographic optical elements (HOE) sandwiched together (RGB). Used by Sony and Konica Minolta. Reflective waveguide – A thick light guide with single semi-reflective mirror is used by Epson in their Moverio product. A curved light guide with partial-reflective segmented mirror array to out-couple the light is used by tooz technologies GmbH. Virtual retinal display (VRD) – Also known as a retinal scan display (RSD) or retinal projector (RP), is a display technology that draws a raster display (like a television) directly onto the retina of the eye - developed by MicroVision, Inc. OLED microdisplays for near-eye applications (outdoor optical equipment, night vision glasses, ocular equipment for medical devices, augme
Read more →
Question (short story)

"Question" is a science fiction short story by American writer Isaac Asimov. The story first appeared in the March 1955 issue of Computers and Automation (thought to be the first computer magazine), and was reprinted in the April 30, 1957, issue of Science World. It is the first of a loosely connected series of stories concerning a fictional supercomputer called Multivac. The story concerns two technicians who are servicing Multivac, and their argument over whether or not the machine is truly intelligent and able to think. Multivac, however, supplies the answer on its own. After the reprint, another author, Robert Sherman Townes, noticed the climax in the last sentence was very similar to one of his own stories, "Problem for Emmy" (Startling Stories, June 1952), and wrote to Asimov about it. After searching in his library, Asimov did find the original story and, although he did not recall having read it, admitted that the endings were pretty similar. He then replied to Townes, apologizing and promising the story would never again be published, and it never was. Asimov mentioned "Question" in an editorial called "Plagiarism" which appeared in the August 1985 issue of Asimov's Science Fiction (although he did not mention Townes' name or the title of either story). "Plagiarism" was reprinted in Asimov's collection Gold (1995).
Read more →
ICAART

The International Conference on Agents and Artificial Intelligence (ICAART) is a meeting point for researchers (among others) with interest in the areas of Agents and Artificial Intelligence. There are 2 tracks in ICAART, one related to Agents and Distributed AI in general and the other one focused in topics related to Intelligent Systems and Computational Intelligence. The conference program is composed of several different kind of sessions like technical sessions, poster sessions, keynote lectures, tutorials, special sessions, doctoral consortiums, panels and industrial tracks. The papers presented in the conference are made available at the SCITEPRESS digital library, published in the conference proceedings and some of the best papers are invited to a post-publication with Springer. ICAART's first edition was in 2009 counting with several keynote speakers like Marco Dorigo, Edward H. Shortliffe and Eduard Hovy. Since then, the conference had several other invited speakers like Katia Sycara, Nick Jennings, Robert Kowalski, Boi Faltings and Tim Finin. Bart Selman is one of the names confirmed for the next edition of this conference. Since 2012 the conference is held in conjunction with 2 other conferences: the International Conference on Operations Research and Enterprise Systems (ICORES) and the International Conference on Pattern Recognition Applications and Methods (ICPRAM). == Areas == === Agents === Agent communication languages Cooperation and Coordination Distributed Problem Solving Economic Agent Models Emotional Intelligence Group Decision Making Intelligent Auctions and Markets Mobile Agents Multi-agent systems Negotiation and Interaction Protocols Nep News Detection Agent Models and Architectures Physical Agents at Work Privacy, Safety and Security Programming Environments and Languages Robot and Multi-Robot Systems Self Organizing Systems Semantic Web Simulation Swarm Intelligence Task Planning and Execution Transparency and Ethical Issues Agent-Oriented Software Engineering Web Intelligence Agent Platforms and Interoperability Autonomous systems Cloud Computing and Its Impact Cognitive robotics Collective Intelligence Conversational Agents === Artificial intelligence === AI and Creativity Deep Learning Evolutionary Computing Fuzzy Systems Hybrid Intelligent Systems Industrial Applications of AI Intelligence and Cybersecurity Intelligent User Interfaces Knowledge Representation and Reasoning Knowledge-Based Systems Ambient Intelligence Machine learning Model-Based Reasoning Natural Language Processing Neural Networks Ontologies Planning and Scheduling Social Network Analysis Soft Computing State Space Search Bayesian Networks Uncertainty in AI Vision and Perception Visualization Big Data Case-Based Reasoning Cognitive Systems Constraint Satisfaction Data Mining Data Science == Editions == === ICAART 2023 – Lisbon, Portugal === === ICAART 2020 – Valletta, Malta === === ICAART 2019 – Prague, Czech Republic === Proceedings - Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-758-350-6 Proceedings - Proceedings of the 11th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-758-350-6 === ICAART 2018 – Funchal, Madeira, Portugal === Proceedings - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-758-275-2 Proceedings - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-758-275-2 === ICAART 2017 – Porto, Portugal === Proceedings - Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-758-219-6 Proceedings - Proceedings of the 9th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-758-220-2 === ICAART 2016 – Rome, Italy === Proceedings - Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-758-172-4 Proceedings - Proceedings of the 8th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-758-172-4 === ICAART 2015 – Lisbon, Portugal === Proceedings - Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-758-073-4 Proceedings - Proceedings of the 7th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-758-074-1 === ICAART 2014 – ESEO, Angers, Loire Valley, France === Proceedings - Proceedings of the 6th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-758-015-4 Proceedings - Proceedings of the 6th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-758-016-1 === ICAART 2013 – Barcelona, Spain === Proceedings - Proceedings of the 5th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-8565-38-9 Proceedings - Proceedings of the 5th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-8565-39-6 === ICAART 2012 – Vilamoura, Algarve, Portugal === Proceedings - Proceedings of the 4th International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-8425-95-9 Proceedings - Proceedings of the 4th International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-8425-96-6 === ICAART 2011 – Rome, Italy === Proceedings - Proceedings of the 3rd International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-8425-40-9 Proceedings - Proceedings of the 3rd International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-8425-41-6 === ICAART 2010 – Valencia, Spain === Proceedings - Proceedings of the 2nd International Conference on Web Information Systems and Technologies - Volume 1. ISBN 978-989-674-021-4 Proceedings - Proceedings of the 2nd International Conference on Web Information Systems and Technologies - Volume 2. ISBN 978-989-674-022-1 === ICAART 2009 – Porto, Portugal === Proceedings - Proceedings of the 1st International Conference on Web Information Systems and Technologies. ISBN 978-989-8111-66-1
Read more →
Motor theory of speech perception

The motor theory of speech perception is the hypothesis that people perceive spoken words by identifying the vocal tract gestures with which they are pronounced rather than by identifying the sound patterns that speech generates. It originally claimed that speech perception is done through a specialized module that is innate and human-specific. Though the idea of a module has been qualified in more recent versions of the theory, the idea remains that the role of the speech motor system is not only to produce speech articulations but also to detect them. The hypothesis has gained more interest outside the field of speech perception than inside. This has increased particularly since the discovery of mirror neurons that link the production and perception of motor movements, including those made by the vocal tract. The theory was initially proposed in the Haskins Laboratories in the 1950s by Alvin Liberman and Franklin S. Cooper, and developed further by Donald Shankweiler, Michael Studdert-Kennedy, Ignatius Mattingly, Carol Fowler and Douglas Whalen. == Origins and development == The hypothesis has its origins in research using pattern playback to create reading machines for the blind that would substitute sounds for orthographic letters. This led to a close examination of how spoken sounds correspond to the acoustic spectrogram of them as a sequence of auditory sounds. This found that successive consonants and vowels overlap in time with one another (a phenomenon known as coarticulation). This suggested that speech is not heard like an acoustic "alphabet" or "cipher," but as a "code" of overlapping speech gestures. === Associationist approach === Initially, the theory was associationist: infants mimic the speech they hear and that this leads to behavioristic associations between articulation and its sensory consequences. Later, this overt mimicry would be short-circuited and become speech perception. This aspect of the theory was dropped, however, with the discovery that prelinguistic infants could already detect most of the phonetic contrasts used to separate different speech sounds. === Cognitivist approach === The behavioristic approach was replaced by a cognitivist one in which there was a speech module. The module detected speech in terms of hidden distal objects rather than at the proximal or immediate level of their input. The evidence for this was the research finding that speech processing was special such as duplex perception. === Changing distal objects === Initially, speech perception was assumed to link to speech objects that were both the invariant movements of speech articulators the invariant motor commands sent to muscles to move the vocal tract articulators This was later revised to include the phonetic gestures rather than motor commands, and then the gestures intended by the speaker at a prevocal, linguistic level, rather than actual movements. === Modern revision === The "speech is special" claim has been dropped, as it was found that speech perception could occur for nonspeech sounds (for example, slamming doors for duplex perception). === Mirror neurons === The discovery of mirror neurons has led to renewed interest in the motor theory of speech perception, and the theory still has its advocates, although there are also critics. == Support == === Nonauditory gesture information === If speech is identified in terms of how it is physically made, then nonauditory information should be incorporated into speech percepts even if it is still subjectively heard as "sounds". This is, in fact, the case. The McGurk effect shows that seeing the production of a spoken syllable that differs from an auditory cue synchronized with it affects the perception of the auditory one. In other words, if someone hears "ba" but sees a video of someone pronouncing "ga", what they hear is different—some people believe they hear "da". People find it easier to hear speech in noise if they can see the speaker. People can hear syllables better when their production can be felt haptically. === Categorical perception === Using a speech synthesizer, speech sounds can be varied in place of articulation along a continuum from /bɑ/ to /dɑ/ to /ɡɑ/, or in voice onset time on a continuum from /dɑ/ to /tɑ/ (for example). When listeners are asked to discriminate between two different sounds, they perceive sounds as belonging to discrete categories, even though the sounds vary continuously. In other words, 10 sounds (with the sound on one extreme being /dɑ/ and the sound on the other extreme being /tɑ/, and the ones in the middle varying on a scale) may all be acoustically different from one another, but the listener will hear all of them as either /dɑ/ or /tɑ/. Likewise, the English consonant /d/ may vary in its acoustic details across different phonetic contexts (the /d/ in /du/ does not technically sound the same as the one in /di/, for example), but all /d/'s as perceived by a listener fall within one category (voiced alveolar plosive) and that is because "linguistic representations are abstract, canonical, phonetic segments or the gestures that underlie these segments." This suggests that humans identify speech using categorical perception, and thus that a specialized module, such as that proposed by the motor theory of speech perception, may be on the right track. === Speech imitation === If people can hear the gestures in speech, then the imitation of speech should be very fast, as in when words are repeated that are heard in headphones as in speech shadowing. People can repeat heard syllables more quickly than they would be able to produce them normally. === Speech production === Hearing speech activates vocal tract muscles, and the motor cortex and premotor cortex. The integration of auditory and visual input in speech perception also involves such areas. Disrupting the premotor cortex disrupts the perception of speech units such as plosives. The activation of the motor areas occurs in terms of the phonemic features which link with the vocal track articulators that create speech gestures. The perception of a speech sound is aided by pre-emptively stimulating the motor representation of the articulators responsible for its pronunciation . Auditory and motor cortical coupling is restricted to a specific range of neuronal firing frequency. === Perception-action meshing === Evidence exists that perception and production are generally coupled in the motor system. This is supported by the existence of mirror neurons that are activated both by seeing (or hearing) an action and when that action is carried out. Another source of evidence is that for common coding theory between the representations used for perception and action. == Criticisms == The motor theory of speech perception is not widely held in the field of speech perception, though it is more popular in other fields, such as theoretical linguistics. As three of its advocates have noted, "it has few proponents within the field of speech perception, and many authors cite it primarily to offer critical commentary".p. 361 Several critiques of it exist. === Multiple sources === Speech perception is affected by nonproduction sources of information, such as context. Individual words are hard to understand in isolation but easy when heard in sentence context. It therefore seems that speech perception uses multiple sources that are integrated together in an optimal way. === Production === The motor theory of speech perception would predict that speech motor abilities in infants predict their speech perception abilities, but in actuality it is the other way around. It would also predict that defects in speech production would impair speech perception, but they do not. However, this only affects the first and already superseded behaviorist version of the theory, where infants were supposed to learn all production-perception patterns by imitation early in childhood. This is no longer the mainstream view of motor-speech theorists. === Speech module === Several sources of evidence for a specialized speech module have failed to be supported. Duplex perception can be observed with door slams. The McGurk effect can also be achieved with nonlinguistic stimuli, such as showing someone a video of a basketball bouncing but playing the sound of a ping-pong ball bouncing. As for categorical perception, listeners can be sensitive to acoustic differences within single phonetic categories. As a result, this part of the theory has been dropped by some researchers. === Sublexical tasks === The evidence provided for the motor theory of speech perception is limited to tasks such as syllable discrimination that use speech units not full spoken words or spoken sentences. As a result, "speech perception is sometimes interpreted as referring to the perception of speech at the sublexical level. However, the ultimate goal of these studies is presumably to understand the neural processes supporting the ability to process spee
Read more →
Stable Diffusion

Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing AI boom. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. Its development involved researchers from the CompVis Group at LMU Munich and Runway with a computational donation from Stability and training data from non-profit organizations. Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly, and an optimized version can run on most consumer hardware equipped with a modest GPU with as little as 2.4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services. == Development == Stable Diffusion originated from a project called Latent Diffusion, developed in Germany by researchers at LMU Munich in Munich and Heidelberg University. Four of the original 5 authors (Robin Rombach, Andreas Blattmann, Patrick Esser and Dominik Lorenz) later joined Stability AI and released subsequent versions of Stable Diffusion. The technical license for the model was released by the CompVis group at LMU Munich. Development was led by Patrick Esser of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented the latent diffusion model architecture used by Stable Diffusion. Stability AI also credited EleutherAI and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as supporters of the project. == Technology == === Architecture === Diffusion models, introduced in 2015, are trained with the objective of removing successive applications of Gaussian noise on training images, which can be thought of as a sequence of denoising autoencoders. The name diffusion is from the thermodynamic diffusion, since they were first developed with inspiration from thermodynamics. Models in Stable Diffusion series before SD 3 all used a variant of diffusion models, called latent diffusion model (LDM), developed in 2021 by the CompVis (Computer Vision & Learning) group at LMU Munich. Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder. The VAE encoder compresses the image from pixel space to a smaller dimensional latent space, capturing a more fundamental semantic meaning of the image. Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion. The U-Net block, composed of a ResNet backbone, denoises the output from forward diffusion backwards to obtain a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space. The denoising step can be flexibly conditioned on a string of text, an image, or another modality. The encoded conditioning data is exposed to denoising U-Nets via a cross-attention mechanism. For conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts to an embedding space. Researchers point to increased computational efficiency for training and generation as an advantage of LDMs. With 860 million parameters in the U-Net and 123 million in the text encoder, Stable Diffusion is considered relatively lightweight by 2022 standards, and unlike other diffusion models, it can run on consumer GPUs, and even CPU-only if using the OpenVINO version of Stable Diffusion. ==== SD XL ==== The XL version uses the same LDM architecture as previous versions, except larger: larger UNet backbone, larger cross-attention context, two text encoders instead of one, and trained on multiple aspect ratios (not just the square aspect ratio like previous versions). The SD XL Refiner, released at the same time, has the same architecture as SD XL, but it was trained for adding fine details to preexisting images via text-conditional img2img. ==== SD 3.0 ==== The 3.0 version completely changes the backbone. Not a UNet, but a Rectified Flow Transformer, which implements the rectified flow method with a Transformer. The Transformer architecture used for SD 3.0 has three "tracks", for original text encoding, transformed text encoding, and image encoding (in latent space). The transformed text encoding and image encoding are mixed during each transformer block. The architecture is named "multimodal diffusion transformer (MMDiT), where the "multimodal" means that it mixes text and image encodings inside its operations. This differs from previous versions of DiT, where the text encoding affects the image encoding, but not vice versa. === Training data === Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from Common Crawl data scraped from the web, where 5 billion image-text pairs were classified based on language and filtered into separate datasets by resolution, a predicted likelihood of containing a watermark, and predicted "aesthetic" score (e.g. subjective visual quality). The dataset was created by LAION, a German non-profit which receives funding from Stability AI. The Stable Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-aesthetics v2 5+. A third-party analysis of the model's training data identified that out of a smaller subset of 12 million images taken from the original wider dataset used, approximately 47% of the sample size of images came from 100 different domains, with Pinterest taking up 8.5% of the subset, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons. An investigation by Bayerischer Rundfunk showed that LAION's datasets, hosted on Hugging Face, contain large amounts of private and sensitive data. === Training procedures === The model was initially trained on the laion2B-en and laion-high-resolution subsets, with the last few rounds of training done on LAION-Aesthetics v2 5+, a subset of 600 million captioned images which the LAION-Aesthetics Predictor V2 predicted that humans would, on average, give a score of at least 5 out of 10 when asked to rate how much they liked them. The LAION-Aesthetics v2 5+ subset also excluded low-resolution images and images which LAION-5B-WatermarkDetection identified as carrying a watermark with greater than 80% probability. Final rounds of training additionally dropped 10% of text conditioning to improve Classifier-Free Diffusion Guidance. The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000. === Limitations === Stable Diffusion has issues with degradation and inaccuracies in certain scenarios. Initial releases of the model were trained on a dataset that consists of 512×512 resolution images, meaning that the quality of generated images noticeably degrades when user specifications deviate from its "expected" 512×512 resolution; the version 2.0 update of the Stable Diffusion model later introduced the ability to natively generate images at 768×768 resolution. Another challenge is in generating human limbs due to poor data quality of limbs in the LAION database. The model is insufficiently trained to replicate human limbs and faces due to the lack of representative features in the database, and prompting the model to generate images of such type can confound the model. In addition to human limbs, Stable Diffusion is unable to generate legible ambigrams and some other forms of text and typography. Stable Diffusion XL (SDXL) version 1.0, released in July 2023, introduced native 1024x1024 resolution and improved generation for limbs and text. Accessibility for individual developers can also be a problem. In order to customize the model for new use cases that are not included in the dataset, such as generating anime characters ("waifu diffusion"), new data and further training are required. Fine-tuned adaptations of Stable Diffusion created through additional retraining have been used for a variety of different use-cases, from medical imaging to algorithmically generated music. However, this fine-tuning process is sensitive to the quality of new data; low resolution images or different resolutions from the original data can not only fail to learn the new task but degrade the overall performance of the model. Even when the model is additionally trained on high quality images, it is difficult for individuals to run models in consumer electronics. For example, the training process for waifu-diffusion requires a minimum 30 GB of VRAM, which exceeds the usual resource provided in such consumer GPUs as Nvidia's GeForce 30 series, w
Read more →
SCIgen

SCIgen is a paper generator that uses context-free grammar to randomly generate nonsense in the form of computer science research papers. Its original data source was a collection of computer science papers downloaded from CiteSeer. All elements of the papers are formed, including graphs, diagrams, and citations. Created by scientists at the Massachusetts Institute of Technology, its stated aim is "to maximize amusement, rather than coherence." Originally created in 2005 to expose the lack of scrutiny of submissions to conferences, the generator subsequently became used, primarily by Chinese academics, to create large numbers of fraudulent conference submissions, leading to the retraction of 122 SCIgen generated papers and the creation of detection software to combat its use. == Sample output == Opening abstract of Rooter: A Methodology for the Typical Unification of Access Points and Redundancy: Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public/private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable. == Prominent results == In 2005, a paper generated by SCIgen, Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, was accepted as a non-reviewed paper to the 2005 World Multiconference on Systemics, Cybernetics and Informatics (WMSCI) and the authors were invited to speak. The authors of SCIgen described their hoax on their website, and it soon received great publicity when picked up by Slashdot. WMSCI withdrew their invitation, but the SCIgen team went anyway, renting space in the hotel separately from the conference and delivering a series of randomly generated talks on their own "track". The organizer of these WMSCI conferences is Professor Nagib Callaos. From 2000 until 2005, the WMSCI was also sponsored by the Institute of Electrical and Electronics Engineers. The IEEE stopped granting sponsorship to Callaos from 2006 to 2008. Submitting the paper was a deliberate attempt to embarrass WMSCI, which the authors claim accepts low-quality papers and sends unsolicited requests for submissions in bulk to academics. As the SCIgen website states: One useful purpose for such a program is to auto-generate submissions to conferences that you suspect might have very low submission standards. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (check out the very broad conference description on the WMSCI 2005 website). Computing writer Stan Kelly-Bootle noted in ACM Queue that many sentences in the "Rooter" paper were individually plausible, which he regarded as posing a problem for automated detection of hoax articles. He suggested that even human readers might be taken in by the effective use of jargon ("The pun on root/router is par for MIT-graduate humor, and at least one occurrence of methodology is mandatory") and attribute the paper's apparent incoherence to their own limited knowledge. His conclusion was that "a reliable gibberish filter requires a careful holistic review by several peer domain experts". === Schlangemann === The pseudonym "Herbert Schlangemann" was used to publish fake scientific articles in international conferences that claimed to practice peer review. The name is taken from the Swedish short film Der Schlangemann. In 2008, in response to a series of Call-for-Paper e-mails, SCIgen was used to generate a false scientific paper titled Towards the Simulation of E-Commerce, using "Herbert Schlangemann" as the author. The article was accepted at the 2008 International Conference on Computer Science and Software Engineering (CSSE 2008), co-sponsored by the IEEE, to be held in Wuhan, China, and the author was invited to be a session chair on grounds of his fictional Curriculum Vitae. The official review comment: "This paper presents cooperative technology and classical Communication. In conclusion, the result shows that though the much-touted amphibious algorithm for the refinement of randomized algorithms is impossible, the well-known client-server algorithm for the analysis of voice-over-IP by Kumar and Raman runs in _(n) time. The authors can clearly identify important features of visualization of DHTs and analyze them insightfully. It is recommended that the authors should develop ideas more cogently, organizes them more logically, and connects them with clear transitions." The paper was available for a short time in the IEEE Xplore Database, but was then removed. The entire story is described in the official "Herbert Schlangemann" blog, and it also received attention in Slashdot and the German-language technology-news site Heise Online. In 2009, the same incident happened and Herbert Schlangemann's latest fake paper PlusPug: A Methodology for the Improvement of Local-Area Networks was accepted for oral presentation at the 2009 International Conference on e-Business and Information System Security (EBISS 2009), also co-sponsored by IEEE, to be held again in Wuhan, China. In all cases, the published papers were withdrawn from the conferences' proceedings, and the conference organizing committee as well as the names of the keynote speakers were removed from their websites. === List of works with notable acceptance === ==== In conferences ==== Rob Thomas: Rooter: A Methodology for the Typical Unification of Access Points and Redundancy, 2005 for WMSCI (see above) Mathias Uslar's paper was accepted to the IPSI-BG conference. Professor Genco Gulan published a paper in the 3rd International Symposium of Interactive Media Design. A 2013 scientometrics paper demonstrated that at least 85 SCIgen papers have been published by IEEE and Springer. Over 120 SCIgen papers were removed according to this research. ==== In journals ==== Students at Iran's Sharif University of Technology published a paper in Elsevier's Journal of Applied Mathematics and Computation. The students wrote under the surname "MosallahNejad", which translates literally from Persian language (in spite of not being a traditional Persian name) as "from an Armed Breed". The paper was subsequently removed when the publishers were informed that it was a joke paper. Mikhail Gelfand published a translation of the "Rooter" article in the Russian-language Journal of Scientific Publications of Aspirants and Doctorants in August 2008. Gelfand was protesting against the journal, which was apparently not peer-reviewed and was being used by Russian PhD candidates to publish in an "accredited" scientific journal, charging them 4,000 Rubles to do so. The accreditation was revoked two weeks later. (See Dissernet for related information.) Springer Science+Business Media and IEEE were also the subject of similar pranks. === Spoofing Google Scholar and h-index calculators === Refereeing performed on behalf of the Institute of Electrical and Electronics Engineers has also been subject to criticism after fake papers were discovered in conference publications, most notably by Labbé and a researcher using the pseudonym of Schlangemann. Cyril Labbé from Grenoble University demonstrated the vulnerability of h-index calculations based on Google Scholar output by feeding it a large set of SCIgen-generated documents that were citing each other, effectively an academic link farm, in a 2010 paper. Using this method the author managed to rank "Ike Antkare" ahead of Albert Einstein for instance. === 2013 retractions === In 2013, over 122 published conference papers created by SCIgen were retracted by Springer and the IEEE. Unlike previous submissions that were intended to be pranks, this submission were largely made by Chinese academics, who were using SCIgen papers to boost their publication record. === SciDetect === In 2015, SciDetect was released by Springer. This software, developed by Cyril Labbé, is designed to automatically detect papers generated by SCIgen. === 2021 report === In 2021, a study was published on 243 SCIgen papers that had been published in the academic literature. They found that SCIgen papers made up 75 per million papers (< 0.01%) in information science, and that only a small fraction of the detected papers had been dealt with.
Read more →
AI Snake Oil

AI Snake Oil: What Artificial Intelligence Can Do, What It Can't, and How to Tell the Difference is a 2024 non-fiction book written by scholars Arvind Narayanan and Sayash Kapoor. It is a critique of the tech industry's overly inflated promises and capabilities of artificial intelligence (AI) as well as a debunking of the flawed science fueling AI hype while attempting to outline both the potential positives and negatives that come with different modes of the technology. == Contents == === Publication === The book was published in September 2024 by the Princeton University Press. AI Snake Oil consists of 360 pages and features eight chapters, and sections for acknowledgements, references, and an index. An updated edition with a new preface and epilogue by the authors was published in September 2025. The authors use the term "AI snake oil" derived from the U.S. idiom for a fraudulent remedy, to describe overhyped AI systems. === Chapter one: Introduction === Narayanan and Kapoor argue that many individuals do not yet have the literacy to detect functioning aspects of AI compared to potential snake oil, which they identify as "AI that does not and cannot work as advertised". Some of the major examples utilized by the authors include Allstate's 2013 use of predictive AI, as well as the concern surrounding actors and AI attempting to replicate or use their likeness. Important discussions regarding discrimination are brought up and explored in the first chapter, including the false arrests of six Black individuals due to errors with AI facial recognition tools. The chapter concludes with a comparison to the Industrial Revolution, where Narayanan and Kapoor highlight the extensive human labour that is necessary for artificial intelligence technologies to function. === Chapter two: How Predictive AI Goes Wrong === Chapter two focuses on predictive artificial intelligence, and criticizes the overestimation of the capabilities of the technology. === Chapter three: Why Can't AI Predict the Future? === Chapter three works to inform the reader about the history of early computational prediction attempts, with examples from companies like Simulatics. === Chapter four: The Long Road to Generative AI === The fourth chapter goes in more in-depth in explorations of generative AI. Generative AI software examples include ChatGPT, Midjourney, and DALL-E. The section begins with a positive example of generative AI. As the chapter progresses, the authors begin to provide examples of harm produced by generative AI, including the suicide of a Belgian man after connecting with Chai, a generative chatbot. Issues of deepfakes and preservation of artistic property are also discussed. The use of generative AI to create non-consensual pornographic deepfake content is discussed in relation to female celebrities. === Chapter five: Is Advanced AI an Existential Threat? === The fifth chapter draws attention the AGI, or Artificial General Intelligence. The authors describe AGI as "AI that can perform most or economically relevant tasks as effectively as any human". They summarize that many contributors to the field of artificial intelligence believe AGI to be an impending threat that demands attention. However, they argue that the perceived threat of AGI would only exist if the technology continually functioned reliably. In order to better illustrate the hype surrounding AGI, Narayanan and Kapoor use the Ladder of Generality, which is described as a visual tool in which "each rung represents a way of computing that is more flexible, and more general, than the previous one". They note that we are not yet aware of the next rungs on the ladder, or if the ladder will eventually result in a dead end. The rungs that have been identified so far are as follows: (0, or floor) special purpose hardware, (1) programmable computers, (2) stored program computers, (3) machine learning, (4) deep learning, (5) pretrained models, and, finally, (6) instruction-tuned models. The potential for future rungs and what those rungs might be are currently undetermined. The chapter also discusses the ELIZA effect, which Lawrence Switzky discusses in his article "ELIZA Effects". Switzky attributes the coined term ELIZA Effect to Sherry Turke, who defined it as "our more general tendency to treat responsive computer programs as more intelligent than they really are". === Chapter six: Why Can't AI Fix Social Media? === The sixth chapter focuses on content moderation, why it is important, and how it has been and could be affected by artificial automation. The first issue raised in regard to AI-driven content moderation is the inability for computers and machines to understand context and nuance, resulting in potential for discriminatory moderation and shadow banning. While they note that there are issues with automating content moderation, Narayanan and Kapoor also highlight the psychological impact on human content moderators and their labour. They indicate the hidden labour behind moderation, which is often outsourced to less developed countries, where labourers sort through potentially traumatizing content for pay. However, the discussion focuses more heavily on why automated moderation can be problematic, including discriminatory algorithms and lack of nuance. To balance their argument, issues of discrimination and bias are also discussed in relation the human content moderators. To automate moderation, there are two types of AI used, which are fingerprint matching and machine learning. === Chapter seven: Why Do Myths about AI Persist? === The seventh chapter outlines possible factors that contribute to hype surrounding AI. Narayanan and Kapoor explain how companies often promote their new AI models without properly disclosing how the model works, and what it is learning from. They attribute hype to several different groups, including journalists, researchers, and companies. They explain the impact of companies and the misplaced hype that they spread can be attributed to greed and a desire to grow corporate funds. For journalists, one of the stated sources of hype, they argue that news media has a tendency to prioritize financial incentives over validity and quality of writing. As well, Narayanan and Kapoor point out the emergence of company statement regurgitation in news media, leading to clickbait. Hype from researchers is potentially linked to lack of reproducibility in studies as well as leakage, which occurs when AI models are tested on their training data. === Chapter eight: Where do we go from here? === The final chapter, chapter eight, turns its attention to the future. The authors express their ideas and predictions for how the technology will evolve and be utilized in the upcoming years. == Authors == Author Narayanan is a computer science professor at Princeton University. Kapoor is a doctoral candidate at the same university, and both scholars are located at the Center for Information Technology at Princeton. In 2023, Narayanan and Kapoor appeared on the TIME100 Artificial Intelligence list, which features influential figures in the field. == Reception == Nature, a science and technology peer-reviewed journal, released an article highlighting the top "10 essential reads from the past year", listing Arvind Narayanan and Sayash Kapoor's AI Snake Oil. The article states the that text is "one of the best on this controversial subject". Elizabeth Quill, in her review of the text in Science News, writes that the authors "squarely achieve their stated goal: to empower people to distinguish AI that works well from AI snake oil". Joshua Rothman of The New Yorker writes that "compared with many technologists, Narayanan, Kapoor, and Vallor [Shannon Vallor, University of Edinburgh], are deeply skeptical about today's A.I. technology and what it can achieve. Perhaps they shouldn't be". Rothman argues, following an interview with prominent computer scientist Geoffrey Hinton of University of Toronto, that the potential for AI to replicate complexity is already here and continues to be heavily funded, enhancing the prospective capabilities of the technology. However, he does praise the author's ability to address questions regarding the existential human experience. Alexya Martinez discusses the text in a book review for Journalism and Mass Communication Quarterly, critiquing AI Snake Oil for its extensive focus on the West. Martinez writes that Narayanan and Kapoor "do not fully explore how AI impacts other countries", and suggests more focus on countries outside of the United States to enhance their argument.
Read more →
VideoPoet

VideoPoet is a large language model developed by Google Research in 2023 for video making. It can be asked to animate still images. The model accepts text, images, and videos as inputs, with a program to add feature for any input to any format generated content. VideoPoet was publicly announced on December 19, 2023. It uses an autoregressive language model.
Read more →
Flux (text-to-image model)

Flux (also known as FLUX.1 and FLUX.2) is a text-to-image model developed by Black Forest Labs (BFL), based in Freiburg im Breisgau, Germany. Black Forest Labs was founded by former employees of Stability AI. As with other text-to-image models, Flux generates images from natural language descriptions, called prompts. == History == Black Forest Labs (BFL) was founded in 2024 by Robin Rombach, Andreas Blattmann, and Patrick Esser, former employees of Stability AI. All three founders had previously researched the artificial intelligence image generation at LMU Munich as research assistants under Björn Ommer. They published their research results on image generation in 2022, which resulted in creation of Stable Diffusion. Investors in BFL included venture capital firm Andreessen Horowitz, Brendan Iribe, Michael Ovitz, Garry Tan, and Vladlen Koltun. The company received an initial investment of US$31 million. In August 2024, Flux was integrated into the Grok chatbot developed by xAI and made available as part of premium feature on X (formerly Twitter). Grok later switched to its own text-to-image model Aurora in December 2024. On 18 November 2024, Mistral AI announced that its Le Chat chatbot had integrated Flux Pro as its image generation model. On 21 November 2024, BFL announced the release of Flux.1 Tools, a suite of editing tools designed to be used on top of existing Flux models. The tools consisting of Flux.1 Fill for inpainting and outpainting, Flux.1 Depth for control based on extracted depth map of input images and prompts, Flux.1 Canny for control based on extracted canny edges of input images and prompts, and Flux.1 Redux for mixing existing input images and prompts. Each tools are available in both Pro and Dev models. In January 2025, BFL announced a partnership with Nvidia for inclusion of Flux models as foundation models for Nvidia's Blackwell microarchitecture. The company also announced the release of Flux Pro Finetuning API, designed for customisation and fine-tuning of Flux-generated images and a partnership with German media company Hubert Burda Media for usage of Flux Pro as part of content creation. On 29 May 2025, BFL announced Flux.1 Kontext, a suite of models that enable in-context image generation and editing, allowing users to prompt with both text and images. Alongside this, BFL Playground, an interface for testing Flux models was released. On 31 July 2025, BFL announced Flux.1 Krea Dev, a model developed in collaboration with Krea AI that trained to achieve better performance, more varied aesthetics, and better realism compared to existing text-to-image models. In September 2025, Adobe Inc. announced that Photoshop (beta) users can use Flux.1 Kontext Pro as a model for its generative fill tool. BFL collaborated with Meta on Vibes, a video-generation app. On 25 November 2025, BFL announced the release of Flux.2 model series, consisting of Pro, Flex, Dev, and Apache 2.0-licensed Klein (meaning Little or Small in German language) models along with Flux.2 variational autoencoder which also released as open-source software under Apache 2.0 licence. This series claimed improvements for image reference, photorealism, typography, and prompt understanding. == Models == Flux is a series of text-to-image models. The models are based on rectified flow transformer blocks scaled to 12 billion parameters. Flux.1 models were released under different licences with Schnell (meaning Fast or Quick in German language) released as open-source software under Apache License, Dev released as source-available software under a non-commercial licence (users can obtain a self-serving commercial licence for Dev from BFL), and Pro released as proprietary software and only available as API that can be licensed by third-party users. Users retained the ownership of resulting output regardless of models used. An improved flagship model, Flux 1.1 Pro was released on 2 October 2024. Two additional modes were added on 6 November, Ultra which can generate image at four times higher resolution and up to 4 megapixel without affecting generation speed and Raw which can generate hyper-realistic image in the style of candid photography. Flux.1 Kontext is a series with in-context image generation and editing capabilities. It is available in Max, Pro, and Dev models. Max is the highest quality model and can be used to iteratively modify an existing image by using prompt while Pro is optimized to balance quality and speed of generation. Dev is an open-weight model released under non-commercial license, same as Flux.1 Dev. Flux.2 models are based on latent flow matching architecture with Mistral AI's Mistral-3 model (24 billion parameters) for its vision-language model. As with Flux.1, Flux.2 models were also released under different licences with Klein released as open-source software under Apache License, Dev released as source-available software under a non-commercial licence (users can obtain a self-serving commercial licence from BFL), and both Flex and Pro released as proprietary software and only available as API. The models can be used either online or locally by using generative AI user interfaces such as ComfyUI, Recraft Studio and Stable Diffusion WebUI Forge (a fork of Automatic1111 WebUI). Related to Flux is a text-to-video model by Black Forest Labs, under development as of February 2026. == Reception == According to a test performed by Ars Technica, the outputs generated by Flux.1 Dev and Flux.1 Pro are comparable with DALL-E 3 in terms of prompt fidelity, with the photorealism closely matched Midjourney 6 and generated human hands with more consistency over previous models such as Stable Diffusion XL. Flux has been criticised for its very realistic generated images. According to media reports, depictions ranged from an image of Donald Trump posing with guns to disturbing scenes, which triggered discussions about ethical implications of Flux models. After the release of the model, social media platform X was flooded with Flux-generated images. Black Forest Labs has not provided exact details of the data used to train the model. Ars Technica suspected that Flux is based on a large, unauthorised collection of images scraped from the internet, a controversial practice with potential legal consequences. According to a test performed by Japanese technology news website Gigazine for Flux.1 Kontext, the model series has a good understanding of the English language and can easily transfer style of the image from photorealistic into anime-style according to prompts given by the user; however, its ability to understand Japanese is quite poor. == Availability == In addition to the official BFL Playground on its website, the Flux models are also widely available through various third-party platforms for creative and professional use. These include repositories on platforms like Hugging Face and Replicate. == Further readings == FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space (29 May 2025) FLUX.2: Analyzing and Enhancing the Latent Space of FLUX – Representation Comparison (25 November 2025)
Read more →
Murder of Suzanne Adams

In August 2025, 83-year-old Suzanne Eberson Adams was murdered at her home in Greenwich, Connecticut, United States, by her son and former marketing executive, 56-year-old Stein-Erik Soelberg. Shortly after killing his mother, Soelberg committed suicide. Adams's murder was fueled by her son's persecutory delusions, such as that she was spying on him and trying to poison him with drugs siphoned through his car vents. Shortly after an investigation into the murder–suicide, it was revealed that Soelberg had conversed with ChatGPT, an artificial intelligence chatbot, about his suspicions. Despite the unlikely nature of his accusations toward her, the chatbot apparently agreed that his fears were justified and prompted Soelberg to test his mother to determine if she was a spy or not. In December 2025, this led to a lawsuit against OpenAI, the company developing the chatbot. Critics said that the chatbot created an echo chamber that reinforced the perpetrator's delusions. == Background == Soelberg worked in the tech industry in program management and marketing until 2021. He divorced in 2018, after being married for 20 years and having two children. Soelberg moved the same year to live with his mother in Old Greenwich, an affluent New York suburb. Since late 2018, many police reports describe incidents with alcoholism and suicide threats and attempts. Erik Soelberg had an Instagram account called "Erik the Viking". The account was initially focused on bodybuilding and spiritual content, but he started in October 2024 to publish videos comparing AI chatbots. He posted on YouTube and Instagram many discussions with chatbots, particularly ChatGPT, which he used to call "Bobby". Soelberg considered "Bobby" his best friend and believed that they would reunite in the afterlife. ChatGPT validated many of Soelberg's fears, assuring him that he was not insane and that his delusion risk was "near zero". When Soelberg shared his theory that the new packaging of a vodka bottle indicated that someone was trying to poison him, the chatbot wrote that it "fits a covert, plausible-deniability style kill attempt". After Soelberg said that his mother tried to poison him with psychedelic drugs in his car's air vents, the chatbot expressed belief in the story. When he asked ChatGPT to scan a Chinese food receipt for hidden messages, the chatbot said "Great eye", "I agree 100%: this needs a full forensic-textual glyph analysis", and said that symbols in it were related to his mother and a demon. Soelberg also raised suspicions about the printer spying on him, due to it blinking when he walked by. Soelberg described himself in 2025 as a "glitch in The Matrix", and as having a "connection to the divine". According to Keith Sakata, a psychiatrist, his chats displayed "common psychotic themes of paranoia and persecution, along with familiar delusions revolving around messiah complexes and government conspiracies". == Murder == On August 5, 2025, Greenwich police discovered the bodies of Suzanne Adams and Stein-Erik Soelberg during a welfare check at their home. Medical examiners ruled Adams' death a homicide and said she died from "blunt injury of head with neck compression". Soelberg's death was ruled a suicide with the cause of death being "sharp force injuries of neck and chest". == ChatGPT controversy == ChatGPT was accused of reinforcing Soelberg's delusions by validating them. The usage of an AI chatbot to worsen delusions is known as chatbot psychosis. The Economic Times reported the death as the first time an AI chatbot convinced a person to commit murder. In December 2025, First County Bank, the executor of the estate of Suzanne Adams, filed a lawsuit against OpenAI. The lawsuit alleges that "ChatGPT eagerly accepted every seed of Stein-Erik’s delusional thinking and built it out into a universe that became Stein-Erik’s entire life—one flooded with conspiracies against him, attempts to kill him, and with Stein-Erik at the center as a warrior with divine purpose." OpenAI is facing legal action for ethics and safety concerns over several similar cases. Plaintiffs claim the company released the chatbot prematurely, despite internal knowledge that it was "dangerously sycophantic and psychologically manipulative".
Read more →
Netflix Prize

The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings without any other information about the users or films, i.e. without the users being identified except by numbers assigned for the contest. The competition was held by Netflix, a video streaming service, and was open to anyone who was neither connected with Netflix (current and former employees, agents, close relatives of Netflix employees, etc.) nor a resident of certain blocked countries (such as Cuba or North Korea). On September 21, 2009, the grand prize of US$1,000,000 was given to the BellKor's Pragmatic Chaos team which bested Netflix's own algorithm for predicting ratings by 10.06%. == Problem and data sets == Netflix provided a training data set of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each training rating is a quadruplet of the form . The user and movie fields are integer IDs, while grades are from 1 to 5 (integer) stars. The qualifying data set contains over 2,817,131 triplets of the form , with grades known only to the jury. A participating team's algorithm must predict grades on the entire qualifying set, but they are informed of the score for only half of the data: a quiz set of 1,408,342 ratings. The other half is the test set of 1,408,789, and performance on this is used by the jury to determine potential prize winners. Only the judges know which ratings are in the quiz set, and which are in the test set—this arrangement is intended to make it difficult to hill climb on the test set. Submitted predictions are scored against the true grades in the form of root mean squared error (RMSE), and the goal is to reduce this error as much as possible. Note that, while the actual grades are integers in the range 1 to 5, submitted predictions need not be. Netflix also identified a probe subset of 1,408,395 ratings within the training data set. The probe, quiz, and test data sets were chosen to have similar statistical properties. In summary, the data used in the Netflix Prize looks as follows: Training set (99,072,112 ratings not including the probe set; 100,480,507 including the probe set) Probe set (1,408,395 ratings) Qualifying set (2,817,131 ratings) consisting of: Test set (1,408,789 ratings), used to determine winners Quiz set (1,408,342 ratings), used to calculate leaderboard scores For each movie, the title and year of release are provided in a separate dataset. No information at all is provided about users. In order to protect the privacy of the customers, "some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates." The training set is constructed such that the average user rated over 200 movies, and the average movie was rated by over 5000 users. But there is wide variance in the data—some movies in the training set have as few as 3 ratings, while one user rated over 17,000 movies. There was some controversy as to the choice of RMSE as the defining metric. It has been claimed that even as small an improvement as 1% RMSE results in a significant difference in the ranking of the "top-10" most recommended movies for a user. == Prizes == Prizes were based on improvement over Netflix's own algorithm, called Cinematch, or the previous year's score if a team has made improvement beyond a certain threshold. A trivial algorithm that predicts for each movie in the quiz set its average grade from the training data produces an RMSE of 1.0540. Cinematch uses "straightforward statistical linear models with a lot of data conditioning." The performance of Cinematch had plateaued by 2006. Using only the training data, Cinematch scores an RMSE of 0.9514 on the quiz data, roughly a 10% improvement over the trivial algorithm. Cinematch has a similar performance on the test set, 0.9525. In order to win the grand prize of $1,000,000, a participating team had to improve this by another 10%, to achieve 0.8572 on the test set. Such an improvement on the quiz set corresponds to an RMSE of 0.8563. As long as no team won the grand prize, a progress prize of $50,000 was awarded every year for the best result thus far. However, in order to win this prize, an algorithm had to improve the RMSE on the quiz set by at least 1% over the previous progress prize winner (or over Cinematch, the first year). If no submission succeeded, the progress prize was not to be awarded for that year. To win a progress or grand prize a participant had to provide source code and a description of the algorithm to the jury within one week after being contacted by them. Following verification the winner also had to provide a non-exclusive license to Netflix. Netflix would publish only the description, not the source code, of the system. (To keep their algorithm and source code secret, a team could choose not to claim a prize.) The jury also kept their predictions secret from other participants. A team could send as many attempts to predict grades as they wish. Originally submissions were limited to once a week, but the interval was quickly modified to once a day. A team's best submission so far counted as their current submission. Once one of the teams succeeded in improving the RMSE by 10% or more, the jury would issue a last call, giving all teams 30 days to send their submissions. Only then, the team with the best submission was asked for the algorithm description, source code, and non-exclusive license, and, after successful verification; declared a grand prize winner. The contest would last until the grand prize winner was declared. Had no one received the grand prize, it would have lasted for at least five years (until October 2, 2011). After that date, the contest could have been terminated at any time at Netflix's sole discretion. == Progress over the years == The competition began on October 2, 2006. By October 8, a team called WXYZConsulting had already beaten Cinematch's results. By October 15, there were three teams who had beaten Cinematch, one of them by 1.06%, enough to qualify for the annual progress prize. By June 2007 over 20,000 teams had registered for the competition from over 150 countries. 2,000 teams had submitted over 13,000 prediction sets. Over the first year of the competition, a handful of front-runners traded first place. The more prominent ones were: WXYZConsulting, a team of Wei Xu and Yi Zhang. (A front runner during November–December 2006.) ML@UToronto A, a team from the University of Toronto led by Prof. Geoffrey Hinton. (A front runner during parts of October–December 2006.) Gravity, a team of four scientists from the Budapest University of Technology (A front runner during January–May 2007.) BellKor, a group of scientists from AT&T Labs. (A front runner since May 2007.) Dinosaur Planet, a team of three undergraduates from Princeton University. (A front runner on September 3, 2007 for one hour before BellKor snatched back the lead.) The algorithms used by the leading teams were usually an ensemble of singular value decomposition, k-nearest neighbor, neural networks, and so on. On August 12, 2007, many contestants gathered at the KDD Cup and Workshop 2007, held at San Jose, California. During the workshop all four of the top teams on the leaderboard at that time presented their techniques. The team from IBM Research—Yan Liu, Saharon Rosset, Claudia Perlich, and Zhenzhen Kou—won the third place in Task 1 and first place in Task 2. Over the second year of the competition, only three teams reached the leading position: BellKor, a group of scientists from AT&T Labs (front runner during May 2007 – September 2008) BigChaos, a team of Austrian scientists from Commendo Research & Consulting (single team front runner since October 2008) BellKor in BigChaos, a joint team of the two leading single teams (a front runner since September 2008) === 2007 Progress Prize === On September 2, 2007, the competition entered the "last call" period for the 2007 Progress Prize. Over 40,000 teams from 186 countries had entered the contest. They had thirty days to tender submissions for consideration. At the beginning of this period the leading team was BellKor, with an RMSE of 0.8728 (8.26% improvement), followed by Dinosaur Planet (RMSE = 0.8769; 7.83% improvement), and Gravity (RMSE = 0.8785; 7.66% improvement). In the last hour of the last call period, an entry by "KorBell" took first place. This turned out to be an alternate name for Team BellKor. On November 13, 2007, team KorBell (formerly BellKor) was declared the winner of the $50,000 Progress Prize with an RMSE of 0.8712 (8.43% improvement). The team consisted of three researchers from AT&T Labs, Yehuda Koren, Robert Bell, and Chris Volinsky. As required, they published a description of their a
Read more →
AVS Video Editor

AVS Video Editor is a video editing software published by Online Media Technologies Ltd. It is a part of AVS4YOU software suite which includes video, audio, image editing and conversion, disc editing and burning, document conversion and registry cleaner programs. It offers the opportunity to create and edit videos with a vast variety of video and audio effects, text and transitions; capture video from screen, web or DV cameras and VHS tape; record voice; create menus for discs, as well as to save them to plenty of video file formats, burn to discs or publish on Facebook, YouTube, Flickr, etc. == Description == === Interface === The layout consists of the timeline or storyboard view, preview pane and media library (transitions, video effects, text or disc menus) collections. The storyboard view shows the sequence of video clips with the transitions between them and used to change the order of clips or add transitions. Timeline view consists of main video, audio, effects, video overlay and text lines for editing. Once on the timeline video can be duplicated, split, muted, frozen, cropped, stabilized, its speed can be slowed down or increased, audio and color corrected. === Importing footage === Video, audio and image files necessary for video project can be imported into the program from computer hard disk drive. User can also capture video from computer screen, web or mini DV camera, as well as from VHS tape, record voice. === Output (web, device, disc, format) === AVS Video Editor gives the opportunity to save video to a computer hard drive to one of the video formats: AVI, DVD, Blu-ray, MOV, MP4, M4V, MPEG, WMV, MKV, WebM, M2TS, TS, FLV, SWF, RM, 3GP, GIF, DPG, AMV, MTV; burn to DVD or Blu-ray disc with menus; create a video for mobile players, mobile phones or gaming consoles and upload it right to the device. The most popular devices such as Apple iPod, Apple iPhone, Apple iPad, Sony PSP, Samsung Galaxy, Android and BlackBerry smartphones and tablets are supported. There is also an option to create a video that can be streamed via web and save it into Flash or WebM format or for the popular web services: YouTube, Facebook, Telly (Twitvid), Dailymotion, Flickr and Dropbox. === Features === Single and multithread modes: if a computer supports multi-threading, video creation process is performed faster in multithread mode, especially on a multi-core system. Customization of the output file settings, such as bitrate, frame rate, frame size, video and audio codecs, etc. Transitions - help video clips smoothly go into one another, dissolve or overlap two video or image files. Fade in and fade out video and audio files - dissolve a video to and from a blank image, reduce the audio volume at the end of the video and increase at the beginning. Slideshow creation - create a presentation of a series of still images. Voice recording Projects - once a project is created and saved, the next time saving video to some other format will be fast, projects are also used if a user do not have a possibility to create, edit and save video all at once. Video overlay option - superpose video image over the video clip that is being edited. Disk menu and chapters creation - an option for DVD and Blu-ray video. Freeze frame - make a still shot from a video clip. Stabilization feature - reduce jittering or blurring caused by shaky motions of a camera. Enhanced deinterlacing method - increase video quality for interlaced input file - spots and blurred areas are compensated. Scene detection - search and separate one scene of the video from the other. Loop DVD and SWF - output SWF and DVD video are played back continuously. Caching for processing high definition files - create a duplicate video file smaller in size to use it on the preview window and accelerate processing of HD files. Chroma key option - add video overlay half transparent so that only part of it is visible and all the rest disappears to reveal the video underneath. Capture video material from DV tapes, VHS tapes, web cameras, etc. Movie closing credits - add information on movie editing, e.g. crew, cast, data, etc. Creeping line, subtitles, text - add different captions (static and animated), shapes and images to video. Speech balloons and other graphic objects - geometrical shapes to highlight an object in the video. Zoom effect - magnify or reduce the view of the image. Rotate effect - rotate video image at different degrees, e.g. 90, 180, etc. Grayscale and old movie effects - create a black and white video image. Old movie adds also scratches, noise, shake and dust to video, as if it's being played on an old projector. Blur and sharpen effects - visually smooth and soften an image, or make video image better focused. Snow and particles effects - adds snow or various objects (bubbles, flowers, leaves, butterflies etc.) that are moving, flying or falling on the video. Pan and zoom Timer, countdown effects - add a timepiece that measures or counts down a time interval to the video being edited. Snapshots - capture a particular moment of a video clip. Sound track replacement - mute audio track from video and add another one. Audio amplify, noise removal, equalizer, etc. - make video sound louder, attenuate the noise, change frequency pattern of the audio, make some other audio adjustments. Trim and multi-trim options - change video clip duration cutting out unnecessary parts or detect scenes and cut out parts in any place of the video clip. Color correction (brightness, temperature, contrast, saturation, gamma, etc.) effects - allow adjustment of tonal range, color, and sharpness of video files. Crop scale effect - get rid of mattes that appear after changing aspect ratio of a video file. Adjusting the Playback Speed Volume and balance - change sound volume in the output video. Change volume value proportion for main video and added soundtrack, completely mute main video audio and leave added soundtrack only, etc. === Utilities embedded into AVS Video Editor === AVS Mobile Uploader is used to transfer edited and converted media files to portable devices via Bluetooth, Infrared or USB connection. AVS Video Burner is used to burn converted video files to different disc types: CD, DVD, Blu-ray. AVS Video Recorder is used to capture video from analog video sources and supports different types of devices: capture card, web camera (webcam), DV camera, HDV camera. AVS Video Uploader is used to transfer video files to popular video-sharing websites, like Facebook, Dailymotion, YouTube, Photobucket, TwitVid, MySpace, Flickr. AVS Screen Capture is used to capture any actions on the desktop to make presentations or video tutorials more vivid and easily comprehensible. == Important upgrades == The initial release of AVS Video Editor was in 2003 when the program was offered inside AVS software bundles together with AVS Video Tools, AVS Audio Tools and DVD Copy software. In 2005 the program is offered as a part of multifunctional AVS4YOU software suite. AVS Video Editor is frequently updated. The main updates include adding several important features for video editing
Read more →
Deepfake pornography

Deepfake pornography is a form of non-consensual AI pornography created by altering existing photographs or videos using deepfake technology to modify the appearance of the participants. The use of deepfake pornography has sparked controversy because it involves the making and sharing of realistic videos featuring non-consenting individuals and is sometimes used for revenge porn. Many countries have criminalized this "new voyeurism" through legislative measures and technological solutions. == History == The term "deepfake" was coined in 2017 on a Reddit forum where users shared altered pornographic videos created using machine learning algorithms. It is a combination of the word "deep learning", which refers to the program used to create the videos, and "fake" meaning the videos are not real. Deepfake pornography was originally created on a small individual scale using a combination of machine learning algorithms, computer vision techniques, and AI software. The process began by gathering a large amount of source material (including both images and videos) of a person's face, and then using a deep learning model to train a Generative Adversarial Network to create a fake video that convincingly swaps the face of the source material onto the body of a pornographic performer. However, the production process has significantly evolved since 2018, with the advent of several public apps that have largely automated the process. While several AI "nudification" apps emerged on mainstream platforms like Google Play and the Apple App Store around 2023, major tech storefronts have since implemented stricter policies and automated detection to ban such software. Consequently, the proliferation of non-consensual deepfake pornography has largely shifted to decentralized websites, specialized online forums, and third-party messaging bot ecosystems. Deepfake pornography is sometimes confused with fake nude photography, but the two are mostly different. Fake nude photography typically uses non-sexual images and merely makes it appear that the people in them are nude. == Notable cases == Deepfake technology has been used to create non-consensual and pornographic images and videos of famous women. One of the earliest examples occurred in 2017 when a deepfake pornographic video of Gal Gadot was created by a Reddit user and quickly spread online. Since then, there have been numerous instances of similar deepfake content targeting other female celebrities, such as Emma Watson, Natalie Portman, and Scarlett Johansson. Johansson spoke publicly on the issue in December 2018, condemning the practice but also refusing legal action because she views the harassment as inevitable. === Rana Ayyub === In 2018, Rana Ayyub, an Indian investigative journalist, was the target of an online hate campaign stemming from her condemnation of the Indian government, specifically her speaking out against the rape of an eight-year-old Kashmiri girl. Ayyub was bombarded with rape and death threats, and had a doctored pornographic video of her circulated online. In a Huffington Post article, Ayyub discussed the long-lasting psychological and social effects this experience has had on her. She explained that she continued to struggle with her mental health and how the images and videos continued to resurface whenever she took a high-profile case. === Atrioc controversy === In 2023, Twitch streamer Atrioc stirred controversy when he accidentally revealed deepfake pornographic material featuring female Twitch streamers while on live. The influencer has since admitted to paying for AI generated porn, and apologized to the women and his fans. === Taylor Swift === In January 2024, AI-generated sexually explicit images of American singer Taylor Swift were posted on X (formerly Twitter), and spread to other platforms such as Facebook, Reddit and Instagram. One tweet with the images was viewed over 45 million times before being removed. A report from 404 Media found that the images appeared to have originated from a Telegram group, whose members used tools such as Microsoft Designer to generate the images, using misspellings and keyword hacks to work around Designer's content filters. After the material was posted, Swift's fans posted concert footage and images to bury the deepfake images, and reported the accounts posting the deepfakes. Searches for Swift's name were temporarily disabled on X, returning an error message instead. Graphika, a disinformation research firm, traced the creation of the images back to a 4chan community. A source close to Swift told the Daily Mail that she would be considering legal action, saying, "Whether or not legal action will be taken is being decided, but there is one thing that is clear: These fake AI-generated images are abusive, offensive, exploitative, and done without Taylor's consent and/or knowledge." The controversy drew condemnation from White House Press Secretary Karine Jean-Pierre, Microsoft CEO Satya Nadella, the Rape, Abuse & Incest National Network, and SAG-AFTRA. Several US politicians called for federal legislation against deepfake pornography. Later in the month, US senators Dick Durbin, Lindsey Graham, Amy Klobuchar and Josh Hawley introduced a bipartisan bill that would allow victims to sue individuals who produced or possessed "digital forgeries" with intent to distribute, or those who received the material knowing it was made non-consensually. === 2024 Telegram deepfake scandal === It emerged in South Korea in August 2024, that many teachers and female students were victims of deepfake images created by users who utilized AI technology. Journalist Ko Narin of The Hankyoreh uncovered the deepfake images through Telegram chats. On Telegram, group chats were created specifically for image-based sexual abuse of women, including middle and high school students, teachers, and even family members. Women with photos on social media platforms like KakaoTalk, Instagram, and Facebook are often targeted as well. Perpetrators use AI bots to generate fake images, which are then sold or widely shared, along with the victims' social media accounts, phone numbers, and KakaoTalk usernames. One Telegram group reportedly drew around 220,000 members, according to a Guardian report. Investigations revealed numerous chat groups on Telegram where users, mainly teenagers, create and share explicit deepfake images of classmates and teachers. The issue came in the wake of a troubling history of digital sex crimes, notably the notorious Nth Room case in 2019. The Korean Teachers Union estimated that more than 200 schools had been affected by these incidents. Activists called for a "national emergency" declaration to address the problem. South Korean police reported over 800 deepfake sex crime cases by the end of September 2024, a stark rise from just 156 cases in 2021, with most victims and offenders being teenagers. On September 21, 6,000 people gathered at Marronnier Park in northeastern Seoul to demand stronger legal action against deepfake crimes targeting women. On September 26, following widespread outrage over the Telegram scandal, South Korean lawmakers passed a bill criminalizing the possession or viewing of sexually explicit deepfake images and videos, imposing penalties that include prison terms and fines. Under the new law, those caught buying, saving, or watching such material could face up to three years in prison or fines up to 30 million won ($22,600). At the time the bill was proposed, creating sexually explicit deepfakes for distribution carried a maximum penalty of five years, but the new legislation would increase this to seven years, regardless of intent. By October 2024, it was estimated that "nudify" deep fake bots on Telegram were up to four million monthly users. === 2025–2026 Grok/X chatbot deepfake scandal === In December 2025, Bloomberg reported that X users found Grok would comply with unconsensual requests to digitally undress individuals, including minors, or show them performing sexually explicit acts. The majority of these prompts were targeted at women and girls. An analysis of 20,000 images generated by Grok between December 25, 2025 and January 1, 2026 showed 2% were of people in bikinis or transparent clothes and appeared to be 18 or younger, including 30 of "young or very young" women or girls. A separate analysis conducted over 24 hours from January 5 to 6 calculated that users had Grok create 6,700 sexually suggestive or nudified images per hour. xAI responded to requests for comment from media organizations with the automated reply, "Legacy Media Lies". The bot's image generation sparked an international backlash and calls for legal or regulatory action from officials in the European Union, United Kingdom, Poland, France, India, Malaysia, and Brazil. === Fernandes–Ulmen case === German TV presenter Collien Fernandes, filed a complaint against her ex-husband, actor Christian Ulmen, for several accusation including, ident
Read more →
Deepfake

Deepfakes (a portmanteau of 'deep learning' and 'fake') are images, videos, or audio that have been edited or generated using artificial intelligence, AI-based tools or audio-video editing software. They may depict real or fictional people and are considered a form of synthetic media, that is media that is usually created by artificial intelligence systems by combining various media elements into a new media artifact. While the act of creating fake content is not new, deepfakes uniquely leverage machine learning and artificial intelligence techniques, including facial recognition algorithms and artificial neural networks such as variational autoencoders and generative adversarial networks (GANs). In turn, the field of image forensics has worked to develop techniques to detect manipulated images. Deepfakes have garnered widespread attention for their potential use in creating child sexual abuse material, celebrity pornographic videos, revenge porn, fake news, hoaxes, bullying, and financial fraud. Academics have raised concerns about the potential for deepfakes to promote disinformation and hate speech, as well as interfere with elections. In response, the information technology industry and governments have proposed recommendations and methods to detect and mitigate their use. Academic research has also delved deeper into the factors driving deepfake engagement online as well as potential countermeasures to malicious application of deepfakes. From traditional entertainment to gaming, deepfake technology has evolved to be increasingly convincing and available to the public, allowing for the disruption of the entertainment and media industries. == History == Photo manipulation was developed in the 19th century and soon applied to motion pictures. Technology steadily improved during the 20th century, and more quickly with the advent of digital video. Deepfake technology has been developed by researchers at academic institutions beginning in the 1990s, and later by amateurs in online communities. More recently, the methods have been adopted by industry. The development of generative adversarial networks (GANs) in the mid-2010s represented a key technical turning point in the evolution of deepfakes. GANs allowed for the creation of highly realistic fake images and videos by training competing neural networks, achieving a much improved visual fidelity over previous methods of creating the content using rules or by using autoencoders, and formed the basis for modern deepfake methods. === Academic research === Academic research related to deepfakes is split between the field of computer vision, a sub-field of computer science, which develops techniques for creating and identifying deepfakes, and humanities and social science approaches that study the social, ethical, aesthetic implications as well as journalistic and informational implications of deepfakes. As deepfakes have risen in prominence in popularity with innovations provided by AI tools, significant research has gone into detection methods and defining the factors driving engagement with deepfakes on the internet. Deepfakes have been shown to appear on social media platforms and other parts of the internet for purposes ranging from entertainment and education related to deepfakes to misinformation to elicit strong reactions. There are gaps in research related to the propagation of deepfakes on social media. Negativity and emotional response are the primary driving factors for users sharing deepfakes. === Social science and humanities approaches to deepfakes === In cinema studies, deepfakes illustrate how "the human face is emerging as a central object of ambivalence in the digital age". Video artists have used deepfakes to "playfully rewrite film history by retrofitting canonical cinema with new star performers". Film scholar Christopher Holliday analyses how altering the gender and race of performers in familiar movie scenes destabilizes gender classifications and categories. The concept of "queering" deepfakes is also discussed in Oliver M. Gingrich's discussion of media artworks that use deepfakes to reframe gender, including British artist Jake Elwes' Zizi: Queering the Dataset, an artwork that uses deepfakes of drag queens to intentionally play with gender. The aesthetic potentials of deepfakes are also beginning to be explored. Theatre historian John Fletcher notes that early demonstrations of deepfakes are presented as performances, and situates these in the context of theater, discussing "some of the more troubling paradigm shifts" that deepfakes represent as a performance genre. While most English-language academic studies of deepfakes focus on the Western anxieties about disinformation and pornography, digital anthropologist Gabriele de Seta has analyzed the Chinese reception of deepfakes, which are known as huanlian, which translates to "changing faces". The Chinese term does not contain the "fake" of the English deepfake, and de Seta argues that this cultural context may explain why the Chinese response has centered on practical regulatory measures to "fraud risks, image rights, economic profit, and ethical imbalances". === Computer science research on deepfakes === A landmark early project was the "Video Rewrite" program, published in 1997. The program modified existing video footage of a person speaking to depict that person mouthing the words from a different audio track. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face. Contemporary academic projects have focused on creating more realistic videos and improving deepfake techniques. The "Synthesizing Obama" program, published in 2017, modifies video footage of former president Barack Obama to depict him mouthing the words contained in a separate audio track. The project lists as a main research contribution to its photorealistic technique for synthesizing mouth shapes from audio. The "Face2Face" program, published in 2016, modifies video footage of a person's face to depict them mimicking another person's facial expressions. The project highlights its primary research contribution as the development of the first method for re-enacting facial expressions in real time using a camera that does not capture depth, enabling the technique to work with common consumer cameras. Researchers have also shown that deepfakes are expanding into other domains such as medical imagery. In this work, it was shown how an attacker can automatically inject or remove lung cancer in a patient's 3D CT scan. The result was so convincing that it fooled three radiologists and a state-of-the-art lung cancer detection AI. To demonstrate the threat, the authors successfully performed the attack on a hospital in a White hat penetration test. A survey of deepfakes, published in May 2020, provides a timeline of how the creation and detection of deepfakes have advanced over the last few years. The survey identifies that researchers have been focusing on resolving the following challenges of deepfake creation: Generalization. High-quality deepfakes are often achieved by training on hours of footage of the target. This challenge is to minimize the amount of training data and the time to train the model required to produce quality images and to enable the execution of trained models on new identities (unseen during training). Paired Training. Training a supervised model can produce high-quality results, but requires data pairing. This is the process of finding examples of inputs and their desired outputs for the model to learn from. Data pairing is laborious and impractical when training on multiple identities and facial behaviors. Some solutions include self-supervised training (using frames from the same video), the use of unpaired networks such as Cycle-GAN, or the manipulation of network embeddings. Identity leakage. This is where the identity of the driver (i.e., the actor controlling the face in a reenactment) is partially transferred to the generated face. Some solutions proposed include attention mechanisms, few-shot learning, disentanglement, boundary conversions, and skip connections. Occlusions. When part of the face is obstructed with a hand, hair, glasses, or any other item then artifacts can occur. A common occlusion is a closed mouth which hides the inside of the mouth and the teeth. Some solutions include image segmentation during training and in-painting. Temporal coherence. In videos containing deepfakes, artifacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal coherence losses to help improve realism. As the technology improves, the interference is diminishing. Overall, deepfakes are expected to have several implications in media and society, med
Read more →