AI App Zen Riddle

AI App Zen Riddle — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Robotic process automation

    Robotic process automation

    Robotic process automation (RPA) is a form of business process automation that is based on software robots (bots) or artificial intelligence (AI) agents. RPA should not be confused with artificial intelligence as it is based on automation technology following a predefined workflow. It is sometimes referred to as software robotics (not to be confused with robot software). In traditional workflow automation tools, a software developer produces a list of actions to automate a task and interface to the back end system using internal application programming interfaces (APIs) or dedicated scripting language. In contrast, RPA systems develop the action list by watching the user perform that task in the application's graphical user interface (GUI) and then perform the automation by repeating those tasks directly in the GUI. This can lower the barrier to the use of automation in products that might not otherwise feature APIs for this purpose. RPA tools have strong technical similarities to graphical user interface testing tools. These tools also automate interactions with the GUI, and often do so by repeating a set of demonstration actions performed by a user. RPA tools differ from such systems in that they allow data to be handled in and between multiple applications, for instance, receiving email containing an invoice, extracting the data, and then typing that into a bookkeeping system. == Historic evolution == As a form of automation, the concept has been around for a long time in the form of screen scraping, so long that to early PC users the reminder of it often blurs with the idea of malware infection. Yet compared to screen scraping, RPA is much more extensible, consisting of API integration into other enterprise applications, connectors into ITSM systems, terminal services and even some types of AI (e.g. machine learning) services such as image recognition. It is considered to be a significant technological evolution in the sense that new software platforms are emerging which are sufficiently mature, resilient, scalable and reliable to make this approach viable for use in large enterprises (who would otherwise be reluctant due to perceived risks to quality and reputation). == Use == The hosting of RPA services also aligns with the metaphor of a software robot, with each robotic instance having its own virtual workstation, much like a human worker. The robot uses keyboard and mouse controls to take actions and execute automations. Normally, all of these actions take place in a virtual environment and not on screen; the robot does not need a physical screen to operate, rather it interprets the screen display electronically. The scalability of modern solutions based on architectures such as these owes much to the advent of virtualization technology, without which the scalability of large deployments would be limited by the available capacity to manage physical hardware and by the associated costs. The implementation of RPA in business enterprises has shown dramatic cost savings when compared to traditional non-RPA solutions. === RPA actual use === Banking and finance process automation Mortgage and lending processes Customer care automation eCommerce merchandising operations Social media marketing Optical character recognition applications Data extraction process Fixed automation process Manual and repetitive tasks automation Voice recognition and digital dictation software linked to join up business processes for straight through processing without manual intervention Specialised remote infrastructure management software featuring automated investigation and resolution of problems, using robots for the first line IT support Chatbots used by internet retailers and service providers to service customer requests for information. Also used by companies to service employee requests for information from internal databases Presentation layer automation software, increasingly used by business process outsourcers to displace human labour Interactive voice response (IVR) systems incorporating intelligent interaction with callers == Impact on employment == According to Harvard Business Review, most operations groups adopting RPA have promised their employees that automation would not result in layoffs. Instead, workers have been redeployed to do more interesting work. One academic study highlighted that knowledge workers did not feel threatened by automation: they embraced it and viewed the robots as team-mates. The same study highlighted that, rather than resulting in a lower "headcount", the technology was deployed in such a way as to achieve more work and greater productivity with the same number of people. Conversely, however, some analysts proffer that RPA represents a threat to the business process outsourcing (BPO) industry. The thesis behind this notion is that RPA will enable enterprises to "repatriate" processes from offshore locations into local data centers, with the benefit of this new technology. The effect, if true, will be to create high-value jobs for skilled process designers in onshore locations (and within the associated supply chain of IT hardware, data center management, etc.) but to decrease the available opportunity to low-skilled workers offshore. On the other hand, this discussion appears to be healthy ground for debate as another academic study was at pains to counter the so-called "myth" that RPA will bring back many jobs from offshore. === Impact on society === Academic studies project that RPA, among other technological trends, is expected to drive a new wave of productivity and efficiency gains in the global labour market. Although not directly attributable to RPA alone, Oxford University conjectures that up to 35% of all jobs might be automated by 2035. There are geographic implications to the trend in robotic automation. In the example above where an offshored process is "repatriated" under the control of the client organization (or even displaced by a business process outsourcer) from an offshore location to a data centre, the impact will be a deficit in economic activity to the offshore location and an economic benefit to the originating economy. On this basis, developed economies – with skills and technological infrastructure to develop and support a robotic automation capability – can be expected to achieve a net benefit from the trend. In a TEDx talk hosted by University College London (UCL), entrepreneur David Moss explains that digital labour in the form of RPA is likely to revolutionize the cost model of the services industry by driving the price of products and services down, while simultaneously improving the quality of outcomes and creating increased opportunity for the personalization of services. In a separate TEDx in 2019 talk, Japanese business executive, and former CIO of Barclays bank, Koichi Hasegawa noted that digital robots can be a positive effect on society if we start using a robot with empathy to help every person. He provides a case study of the Japanese insurance companies – Sompo Japan and Aioi – both of whom introduced bots to speed up the process of insurance pay-outs in past massive disaster incidents. Meanwhile, Professor Willcocks, author of the LSE paper cited above, speaks of increased job satisfaction and intellectual stimulation, characterising the technology as having the ability to "take the robot out of the human", a reference to the notion that robots will take over the mundane and repetitive portions of people's daily workload, leaving them to be used in more interpersonal roles or to concentrate on the remaining, more meaningful, portions of their day. It was also found in a 2021 study observing the effects of robotization in Europe that, the gender pay gap increased at a rate of .18% for every 1% increase in robotization of a given industry. == Unassisted RPA == Unassisted RPA, or RPAAI, is the next generation of RPA related technologies. Technological advancements around artificial intelligence allow a process to be run on a computer without needing input from a user. == Hyperautomation == Hyperautomation is the application of advanced technologies like RPA, artificial intelligence, machine learning (ML) and process mining to augment workers and automate processes in ways that are significantly more impactful than traditional automation capabilities. Hyperautomation is the combination of technologies that allow faster application authorship (like low-code and no-code) with automation technologies that coordinate different worker types (i.e. human and artificial) for intelligent and strategic workflow optimization. Gartner's report notes that this trend was kicked off with robotic process automation (RPA). The report notes that, "RPA alone is not hyperautomation. Hyperautomation requires a combination of tools to help support replicating pieces of where the human is involved in a task." == Outsourcing == Back office clerical processes outsourced by large organisations

    Read more →
  • AI takeover

    AI takeover

    An AI takeover is a theorized future event, often depicted in fiction, in which autonomous artificial intelligence systems acquire the capability to supersede human decisions. This could occur through economic manipulation, infrastructure control, or direct intervention, leading to de facto governance. Scenarios range from gradual economic dominance, as automation supplants the human workforce, up to a sudden or aggressive global takeover by a robot uprising or other forms of rogue AI. Stories of AI takeovers have been popular throughout science fiction. Commentators argue that recent advancements in the field have heightened concern about such scenarios. In public debate, prominent figures such as Stephen Hawking have advocated research into precautionary measures to ensure future superintelligent machines remain under human control. == Types == === Automation of the economy === The traditional consensus among economists has been that technological progress does not cause long-term unemployment. However, recent innovation in the fields of robotics and artificial intelligence has raised worries that human labor will become obsolete, leaving workers in some sectors without employment. Many small and medium-sized firms may also be forced to close if they cannot afford or license the latest robotic and AI technology, and may need to focus on areas or services that cannot easily be replaced for continued viability in the face of such technology. ==== Technologies that may displace workers ==== While these technologies have replaced some traditional workers, they also create new opportunities. Industries that are most susceptible to AI-driven automation include transportation, retail, and the military. AI military technologies, for example, can reduce risk by enabling remote operation. A study in 2024 highlights AI's ability to perform routine and repetitive tasks poses significant risks of job displacement, especially in sectors like manufacturing and administrative support. Author Dave Bond argues that as AI technologies continue to develop and expand, the relationship between humans and robots will change; they will become closely integrated in several aspects of life. AI will likely displace some workers while creating opportunities for new jobs in other sectors, especially in fields where tasks are repeatable. Researchers from Stanford's Digital Economy Lab reported in 2025 that since the widespread adoption of generative AI in late 2022, early-career workers (ages 22–25) in the most AI-exposed occupations have experienced a 13 percent relative decline in employment—even after controlling for firm-level shocks—while overall employment has continued to grow robustly. The study further finds that job losses are concentrated in roles where AI automates routine tasks, whereas occupations that leverage AI to augment human work have seen stable or increasing employment. ==== Computer-integrated manufacturing ==== Computer-integrated manufacturing uses computers to control the production process. This allows individual processes to exchange information with each other and initiate actions. Although manufacturing can be faster and less error-prone through the integration of computers, the main advantage is the ability to create automated manufacturing processes. Computer-integrated manufacturing is used in automotive, aviation, space, and shipbuilding industries. ==== White-collar machines ==== The 21st century has seen a variety of skilled tasks partially taken over by machines, including translation, legal research, and journalism. Care work, entertainment, and other tasks requiring empathy, previously thought safe from automation, are increasingly performed by robots and AI systems. ==== Autonomous cars ==== An autonomous car is a vehicle that is capable of sensing its environment and navigating without human input. Many such vehicles are operational and others are being developed, with legislation rapidly expanding to allow their use. Obstacles to widespread adoption of autonomous vehicles have included concerns about the resulting loss of driving-related jobs in the road transport industry, and safety concerns. On March 18, 2018, a pedestrian was struck and killed in Tempe, Arizona by an Uber self-driving car. ==== AI-generated content ==== In the 2020s, automated content became more relevant due to technological advancements in AI models, such as ChatGPT, DALL-E, and Stable Diffusion. In most cases, AI-generated content such as imagery, literature, and music are produced through text prompts. These AI models are sometimes integrated into creative programs. AI-generated art may sample and conglomerate existing creative works, producing results that appear similar to human-made content. Low-quality AI-generated visual artwork can be informally referred to as AI slop. Some artists use a tool called Nightshade that alters images to make them detrimental to the training of text-to-image models if scraped without permission, while still looking normal to humans. AI-generated images are a potential tool for scammers and those looking to gain followers on social media, either to impersonate a famous individual or group or to monetize their audience. The New York Times has sued OpenAI, alleging copyright infringement related to the training and outputs of its AI models. === Eradication === Scientists such as Stephen Hawking are confident that superhuman artificial intelligence is physically possible, stating "there is no physical law precluding particles from being organised in ways that perform even more advanced computations than the arrangements of particles in human brains". According to Nick Bostrom, a superintelligent machine would not necessarily be motivated by the same emotional desire to collect power that often drives human beings but might rather treat power as a means toward attaining its ultimate goals; taking over the world would both increase its access to resources and help to prevent other agents from stopping the machine's plans. As a simplified example, a paperclip maximizer designed solely to create as many paperclips as possible would want to take over the world so that it can use all of the world's resources to create as many paperclips as possible, and, additionally, prevent humans from shutting it down or using those resources on things other than paperclips. There are debates on how realistic AI takeover scenarios are. According to a 2026 research paper, many of the arguments about existential risks are based on speculative assumptions about how intelligent AI systems could become, how they would behave and what goals they might develop over time. A 2023 Reuters/Ipsos survey showed that 61% of American adults feared AI could pose a threat to civilization. Philosopher Niels Wilde refutes the common thread that artificial intelligence inherently presents a looming threat to humanity, stating that these fears stem from perceived intelligence and lack of transparency in AI systems that more closely reflects the human aspects of it rather than those of a machine. AI alignment research studies how to design AI systems so that they follow intended objectives. == Debate == Physicist Stephen Hawking, Microsoft founder Bill Gates, and SpaceX founder Elon Musk have expressed concerns about the possibility that AI could develop to the point that humans could not control it, with Hawking theorizing that this could "spell the end of the human race". Stephen Hawking said in 2014 that "Success in creating AI would be the biggest event in human history. Unfortunately, it might also be the last, unless we learn how to avoid the risks." Hawking believed that in the coming decades, AI could offer "incalculable benefits and risks" such as "technology outsmarting financial markets, out-inventing human researchers, out-manipulating human leaders, and developing weapons we cannot even understand." In January 2015, Nick Bostrom joined Stephen Hawking, Max Tegmark, Elon Musk, Lord Martin Rees, Jaan Tallinn, and numerous AI researchers in signing the Future of Life Institute's open letter speaking to the potential risks and benefits associated with artificial intelligence. The signatories "believe that research on how to make AI systems robust and beneficial is both important and timely, and that there are concrete research directions that can be pursued today." Some focus has been placed on the development of trustworthy AI. Three statements have been posed as to why AI is not inherently trustworthy: 1. An entity X is trustworthy only if X has the right motivations, goodwill and/or adheres to moral obligations towards the trustor; 2. AI systems lack motivations, goodwill, and moral obligations; 3. Therefore, AI systems cannot be trustworthy. There are additional considerations within this framework of trustworthy AI that go further into the fields of explainable artificial intelligence and respect for human privacy. Zanotti and colleagues

    Read more →
  • Shy Girl

    Shy Girl

    Shy Girl is a horror novel initially self-published in February 2025 by Mia Ballard. Publishing rights for the book were acquired by Hachette Book Group, which released the book in the United Kingdom in November 2025 and planned to publish it in the United States in 2026. Its US release was cancelled and its UK release was discontinued after it faced accusations of being created with generative AI. Ballard denied having personally used AI in the book's writing, claiming that a freelance editor had introduced AI-generated changes. She also stated that she would take legal action against the editor. == Premise == The novel follows Gia, a depressed woman with obsessive–compulsive disorder, who encounters a mysterious man named Nathan while looking for a sugar daddy to ease her financial troubles. Nathan offers to erase all of Gia's debts in exchange for her agreeing to live as his pet. Living like an animal convinces her that she is becoming an animal, making her behave like one. == Publication and cancellation == Shy Girl was first self-published online by Mia Ballard in February 2025. Marketing material described the book as a "buzzy BookTok sensation" and "bloody and unforgiving". The self-published edition of the book was highly successful and had over 4,900 ratings on Goodreads and an average score of 3.52 stars. In an interview, Ballard described her writing style as lyrical, feverish, and introspective, and stated she was more interested in "what it feels like to live inside a body" than in plot-driven storylines. Publishing rights were acquired by Hachette Book Group and it was published by its Wildfire imprint in the United Kingdom in November 2025. By March 2026, the book had sold 1,800 copies in the United Kingdom. A US release was planned for 2026 by the imprint Orbit Books. After the British publication, critics and readers began to make claims that the book appeared to have been written by generative AI. A January 2026 post on Reddit claimed that the book had many of the hallmarks of having been written with a large language model, and stated that it was "repulsive" that the book was accepted by Hachette. A two-and-a-half-hour video essay covering the book, titled "i'm pretty sure this book is ai slop", received 1.2 million views on YouTube by March 2026. In response, Hachette Book Group announced in March 2026 that it would cancel the book's US publication and discontinue its UK publication. It told The Wall Street Journal that it had made "a lengthy investigation" before deciding to cancel the book. Ballard told The New York Times that she had not used AI when writing the book, but that AI-generated elements were added by a freelance editor without her knowledge. She also stated that she could not elaborate on her claim because she was pursuing legal action against the editor. Writer Andrea Bartz opined that the situation "raises many concerns about trust, authenticity and publishing's readiness for a new, A.I.-assisted world", but that "readers made it abundantly clear they want books by humans, not machines".

    Read more →
  • Argument Interchange Format

    Argument Interchange Format

    The Argument Interchange Format (AIF) is an international effort to develop a representational mechanism for exchanging argument resources between research groups, tools, and domains using a semantically rich language. AIF traces its history back to a 2005 colloquium in Budapest. The result of the work in Budapest was first published as a draft description in 2006. Building on this foundation, further work then used the AIF to build foundations for the Argument Web. AIF-RDF is the extended ontology represented in the Resource Description Framework Schema (RDFS) semantic language. The Argument Interchange Format introduces a small set of ontological concepts that aim to capture a common understanding of argument -- one that works in multiple domains (both domains of argumentation and also domains of academic research), so that data can be shared and re-used across different projects in different areas. These ontological concepts are: Information (I-nodes) Applications of Rules of Inference (RA-nodes) Applications of Rules of Conflict (CA-nodes) Applications of Rules of Preference (PA-nodes) extended by: Schematic Forms (F-nodes) that are instantiated by RA, CA and PA nodes The AIF has reifications in a variety of development environments and implementation languages including MySQL database schema RDF Prolog JSON as well as translations to visual languages such as DOT and SVG. AIF data can be accessed online at AIFdb.

    Read more →
  • Glow (app)

    Glow (app)

    Glow is a fertility awareness and period-tracking app. It is part of a suite of mobile apps focused on women's reproductive health and childcare, which includes Eve by Glow (a dedicated period tracker), Glow Nurture (a pregnancy tracker), and Glow Baby (a baby development tracker). The Glow company also operates an online shop that sells several fertility-related products, including ovulation test strips, pregnancy tests, and wearable breast pumps. In 2024, Glow was reported to have approximately 25 million users across its various apps and community message boards. == History == Glow debuted in August 2013 as an iOS app. It was founded by Michael Huang and Max Levchin and launched with $6 million in Series A funding from venture capital firms Founders Fund and Andreesen Horowitz. In 2014, Glow raised an additional $17 million in Series B funding, with Formation 8 joining existing investors. In 2015, Glow launched Ruby, an app dedicated to sexual health. That year, Wired reported that the company had added features to their apps allowing men to monitor their fertility. Glow subsequently released an additional set of apps focused on pregnancy tracking and infant development. In 2016, Glow reported that it had a total of approximately 3 million users; by 2018, this had grown to 15 million. Vox described it as one of the “big two” period and fertility tracking apps and the one that had started the “boom” in the femtech space. == Application and features == Glow was initially described as a fertility application that applied data-driven methods to menstrual and ovulation tracking. Core features include cycle logging, ovulation prediction, and symptom tracking. The app also provides educational content related to reproductive health and childcare, as well as a set of online message boards that allow individuals to share experiences and seek peer support. == Privacy and legal issues == Glow has received significant media attention for its privacy and security practices. In 2016, Consumer Reports identified potential exploits in the Glow app that they claimed could have exposed private user data to hackers. Glow subsequently reported that it had fixed the vulnerabilities and told The Washington Post they had no evidence that user data had been compromised. In September 2020, the California Attorney General announced a settlement with Glow related to Consumer Reports’ findings, which included a $250,000 civil penalty. Following the US Supreme Court's 2022 Dobbs v. Jackson ruling, which legalized state-level bans on abortion, Glow (and other fertility trackers, such as Clue and Flo) came under additional scrutiny over concerns that user data on abortions could be reported to law enforcement. After this surge of media interest, a research team affiliated with the University of New South Wales conducted an investigation into the privacy practices of several popular fertility apps, including Glow. Their review of Glow was mixed, noting that they provided several privacy settings and de-identified sensitive data, but that user information could still be disclosed in the future if the app was sold. Glow rejected that claim, telling the Australian Associated Press that it "did not share" personal data. The company also cited several internal security measures it had implemented and its apps' offline data protection setting, which allows users to permanently delete their health-related data. == Reception == In 2014, Fast Company reported that 20,000 women had used Glow to conceive. Later that year, The Guardian included Glow Nurture on its list of the best iPhone apps of 2014. Media coverage often praised Glow's array of menstrual tracking options, although some reviews also noted that fertility apps are not birth control tools and cautioned against relying on them for that purpose. In 2019, Cosmopolitan singled Glow's community of users as one of its standout features.

    Read more →
  • Science Fiction Thinking Machines

    Science Fiction Thinking Machines

    Science Fiction Thinking Machines: Robots, Androids, Computers is an anthology of science fiction short stories edited by American anthologist Groff Conklin. It was first published in hardcover by Vanguard Press in May 1954. An abridged paperback edition titled, Selections from Science Fiction Thinking Machines was later published by Bantam Books in August 1955 and was reprinted in September 1964. The book consists of twenty-two novelettes and short stories by various science fiction authors, together with an introduction and bibliography by the editor. The stories were previously published from 1899-1954, in various science fiction and other magazines. == Contents == Note: stories also appearing in the abridged edition annotated A. "Introduction" (Groff Conklin) "Automata: I" (S. Fowler Wright) "Moxon's Master" (Ambrose Bierce) "Robbie" (Isaac Asimov) A "The Scarab" (Raymond Z. Gallun) "The Mechanical Bride" (Fritz Leiber) "Virtuoso" (Herbert Goldstone) A "Automata: II" (S. Fowler Wright) "Boomerang" (Eric Frank Russell) A "The Jester" (William Tenn) A "R. U. R." (Karel Čapek) "Skirmish" (Clifford D. Simak) A "Soldier Boy" (Michael Shaara) "Automata: III" (S. Fowler Wright) "Men Are Different" (Alan Bloch) A "Letter to Ellen" (Chan Davis) A "Sculptors of Life" (Wallace West) "The Golden Egg" (Theodore Sturgeon) A "Dead End" (Wallace Macfarlane) A "Answer" (Hal Clement) "Sam Hall" (Poul Anderson) A "Dumb Waiter" (Walter M. Miller Jr.) A "Problem for Emmy" (Robert Sherman Townes) A "Selected List of Tales About Robots, Androids, and Computers" (Groff Conklin)

    Read more →
  • Generative adversarial network

    Generative adversarial network

    A generative adversarial network (GAN) is a class of machine learning frameworks and a prominent framework for approaching generative artificial intelligence. The concept was initially developed by Ian Goodfellow and his colleagues in June 2014. In a GAN, two neural networks compete with each other in the form of a zero-sum game, where one agent's gain is another agent's loss. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics. Though originally proposed as a form of generative model for unsupervised learning, GANs have also proved useful for semi-supervised learning, fully supervised learning, and reinforcement learning. The core idea of a GAN is based on the "indirect" training through the discriminator, another neural network that can tell how "realistic" the input seems, which itself is also being updated dynamically. This means that the generator is not trained to minimize the distance to a specific image, but rather to fool the discriminator. This enables the model to learn in an unsupervised manner. GANs are similar to mimicry in evolutionary biology, with an evolutionary arms race between both networks. == Definition == === Mathematical === The original GAN is defined as the following game: Each probability space ( Ω , μ ref ) {\displaystyle (\Omega ,\mu _{\text{ref}})} defines a GAN game. There are 2 players: generator and discriminator. The generator's strategy set is P ( Ω ) {\displaystyle {\mathcal {P}}(\Omega )} , the set of all probability measures μ G {\displaystyle \mu _{G}} on Ω {\displaystyle \Omega } . The discriminator's strategy set is the set of Markov kernels μ D : Ω → P [ 0 , 1 ] {\displaystyle \mu _{D}:\Omega \to {\mathcal {P}}[0,1]} , where P [ 0 , 1 ] {\displaystyle {\mathcal {P}}[0,1]} is the set of probability measures on [ 0 , 1 ] {\displaystyle [0,1]} . The GAN game is a zero-sum game, with objective function L ( μ G , μ D ) := E x ∼ μ ref , y ∼ μ D ( x ) ⁡ [ ln ⁡ y ] + E x ∼ μ G , y ∼ μ D ( x ) ⁡ [ ln ⁡ ( 1 − y ) ] . {\displaystyle L(\mu _{G},\mu _{D}):=\operatorname {E} _{x\sim \mu _{\text{ref}},y\sim \mu _{D}(x)}[\ln y]+\operatorname {E} _{x\sim \mu _{G},y\sim \mu _{D}(x)}[\ln(1-y)].} The generator aims to minimize the objective, and the discriminator aims to maximize the objective. The generator's task is to approach μ G ≈ μ ref {\displaystyle \mu _{G}\approx \mu _{\text{ref}}} , that is, to match its own output distribution as closely as possible to the reference distribution. The discriminator's task is to output a value close to 1 when the input appears to be from the reference distribution, and to output a value close to 0 when the input looks like it came from the generator distribution. === In practice === The generative network generates candidates while the discriminative network evaluates them. This creates a contest based on data distributions, where the generator learns to map from a latent space to the true data distribution, aiming to produce candidates that the discriminator cannot distinguish from real data. The discriminator's goal is to correctly identify these candidates, but as the generator improves, its task becomes more challenging, increasing the discriminator's error rate. A known dataset serves as the initial training data for the discriminator. Training involves presenting it with samples from the training dataset until it achieves acceptable accuracy. The generator is trained based on whether it succeeds in fooling the discriminator. Typically, the generator is seeded with randomized input that is sampled from a predefined latent space (e.g. a multivariate normal distribution). Thereafter, candidates synthesized by the generator are evaluated by the discriminator. Independent backpropagation procedures are applied to both networks so that the generator produces better samples, while the discriminator becomes more skilled at flagging synthetic samples. When used for image generation, the generator is typically a deconvolutional neural network, and the discriminator is a convolutional neural network. === Relation to other statistical machine learning methods === GANs are implicit generative models, which means that they do not explicitly model the likelihood function nor provide a means for finding the latent variable corresponding to a given sample, unlike alternatives such as flow-based generative model. Compared to fully visible belief networks such as WaveNet and PixelRNN and autoregressive models in general, GANs can generate one complete sample in one pass, rather than multiple passes through the network. Compared to Boltzmann machines and linear ICA, there is no restriction on the type of function used by the network. Since neural networks are universal approximators, GANs are asymptotically consistent. Variational autoencoders might be universal approximators, but it is not proven as of 2017. == Mathematical properties == === Measure-theoretic considerations === This section provides some of the mathematical theory behind these methods. In modern probability theory based on measure theory, a probability space also needs to be equipped with a σ-algebra. As a result, a more rigorous definition of the GAN game would make the following changes:Each probability space ( Ω , B , μ ref ) {\displaystyle (\Omega ,{\mathcal {B}},\mu _{\text{ref}})} defines a GAN game. The generator's strategy set is P ( Ω , B ) {\displaystyle {\mathcal {P}}(\Omega ,{\mathcal {B}})} , the set of all probability measures μ G {\displaystyle \mu _{G}} on the measure-space ( Ω , B ) {\displaystyle (\Omega ,{\mathcal {B}})} . The discriminator's strategy set is the set of Markov kernels μ D : ( Ω , B ) → P ( [ 0 , 1 ] , B ( [ 0 , 1 ] ) ) {\displaystyle \mu _{D}:(\Omega ,{\mathcal {B}})\to {\mathcal {P}}([0,1],{\mathcal {B}}([0,1]))} , where B ( [ 0 , 1 ] ) {\displaystyle {\mathcal {B}}([0,1])} is the Borel σ-algebra on [ 0 , 1 ] {\displaystyle [0,1]} .Since issues of measurability never arise in practice, these will not concern us further. === Choice of the strategy set === In the most generic version of the GAN game described above, the strategy set for the discriminator contains all Markov kernels μ D : Ω → P [ 0 , 1 ] {\displaystyle \mu _{D}:\Omega \to {\mathcal {P}}[0,1]} , and the strategy set for the generator contains arbitrary probability distributions μ G {\displaystyle \mu _{G}} on Ω {\displaystyle \Omega } . However, as shown below, the optimal discriminator strategy against any μ G {\displaystyle \mu _{G}} is deterministic, so there is no loss of generality in restricting the discriminator's strategies to deterministic functions D : Ω → [ 0 , 1 ] {\displaystyle D:\Omega \to [0,1]} . In most applications, D {\displaystyle D} is a deep neural network function. As for the generator, while μ G {\displaystyle \mu _{G}} could theoretically be any computable probability distribution, in practice, it is usually implemented as a pushforward: μ G = μ Z ∘ G − 1 {\displaystyle \mu _{G}=\mu _{Z}\circ G^{-1}} . That is, start with a random variable z ∼ μ Z {\displaystyle z\sim \mu _{Z}} , where μ Z {\displaystyle \mu _{Z}} is a probability distribution that is easy to compute (such as the uniform distribution, or the Gaussian distribution), then define a function G : Ω Z → Ω {\displaystyle G:\Omega _{Z}\to \Omega } . Then the distribution μ G {\displaystyle \mu _{G}} is the distribution of G ( z ) {\displaystyle G(z)} . Consequently, the generator's strategy is usually defined as just G {\displaystyle G} , leaving z ∼ μ Z {\displaystyle z\sim \mu _{Z}} implicit. In this formalism, the GAN game objective is L ( G , D ) := E x ∼ μ ref ⁡ [ ln ⁡ D ( x ) ] + E z ∼ μ Z ⁡ [ ln ⁡ ( 1 − D ( G ( z ) ) ) ] . {\displaystyle L(G,D):=\operatorname {E} _{x\sim \mu _{\text{ref}}}[\ln D(x)]+\operatorname {E} _{z\sim \mu _{Z}}[\ln(1-D(G(z)))].} === Generative reparametrization === The GAN architecture has two main components. One is casting optimization into a game, of form min G max D L ( G , D ) {\displaystyle \min _{G}\max _{D}L(G,D)} , which is different from the usual kind of optimization, of form min θ L ( θ ) {\displaystyle \min _{\theta }L(\theta )} . The other is the decomposition of μ G {\displaystyle \mu _{G}} into μ Z ∘ G − 1 {\displaystyle \mu _{Z}\circ G^{-1}} , which can be understood as a reparametrization trick. To see its significance, one must compare GAN with previous methods for learning generative models, which were plagued with "intractable probabilistic computations that arise in maximum likelihood estimation and related strategies". At the same time, Kingma and Welling and Rezende et al. developed the same idea of reparametrization into a general stochastic backpropagation method. Among its first applications was the variational autoencoder. === Move order and st

    Read more →
  • Stable Diffusion

    Stable Diffusion

    Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing AI boom. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. Its development involved researchers from the CompVis Group at LMU Munich and Runway with a computational donation from Stability and training data from non-profit organizations. Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly, and an optimized version can run on most consumer hardware equipped with a modest GPU with as little as 2.4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services. == Development == Stable Diffusion originated from a project called Latent Diffusion, developed in Germany by researchers at LMU Munich in Munich and Heidelberg University. Four of the original 5 authors (Robin Rombach, Andreas Blattmann, Patrick Esser and Dominik Lorenz) later joined Stability AI and released subsequent versions of Stable Diffusion. The technical license for the model was released by the CompVis group at LMU Munich. Development was led by Patrick Esser of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented the latent diffusion model architecture used by Stable Diffusion. Stability AI also credited EleutherAI and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as supporters of the project. == Technology == === Architecture === Diffusion models, introduced in 2015, are trained with the objective of removing successive applications of Gaussian noise on training images, which can be thought of as a sequence of denoising autoencoders. The name diffusion is from the thermodynamic diffusion, since they were first developed with inspiration from thermodynamics. Models in Stable Diffusion series before SD 3 all used a variant of diffusion models, called latent diffusion model (LDM), developed in 2021 by the CompVis (Computer Vision & Learning) group at LMU Munich. Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder. The VAE encoder compresses the image from pixel space to a smaller dimensional latent space, capturing a more fundamental semantic meaning of the image. Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion. The U-Net block, composed of a ResNet backbone, denoises the output from forward diffusion backwards to obtain a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space. The denoising step can be flexibly conditioned on a string of text, an image, or another modality. The encoded conditioning data is exposed to denoising U-Nets via a cross-attention mechanism. For conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts to an embedding space. Researchers point to increased computational efficiency for training and generation as an advantage of LDMs. With 860 million parameters in the U-Net and 123 million in the text encoder, Stable Diffusion is considered relatively lightweight by 2022 standards, and unlike other diffusion models, it can run on consumer GPUs, and even CPU-only if using the OpenVINO version of Stable Diffusion. ==== SD XL ==== The XL version uses the same LDM architecture as previous versions, except larger: larger UNet backbone, larger cross-attention context, two text encoders instead of one, and trained on multiple aspect ratios (not just the square aspect ratio like previous versions). The SD XL Refiner, released at the same time, has the same architecture as SD XL, but it was trained for adding fine details to preexisting images via text-conditional img2img. ==== SD 3.0 ==== The 3.0 version completely changes the backbone. Not a UNet, but a Rectified Flow Transformer, which implements the rectified flow method with a Transformer. The Transformer architecture used for SD 3.0 has three "tracks", for original text encoding, transformed text encoding, and image encoding (in latent space). The transformed text encoding and image encoding are mixed during each transformer block. The architecture is named "multimodal diffusion transformer (MMDiT), where the "multimodal" means that it mixes text and image encodings inside its operations. This differs from previous versions of DiT, where the text encoding affects the image encoding, but not vice versa. === Training data === Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from Common Crawl data scraped from the web, where 5 billion image-text pairs were classified based on language and filtered into separate datasets by resolution, a predicted likelihood of containing a watermark, and predicted "aesthetic" score (e.g. subjective visual quality). The dataset was created by LAION, a German non-profit which receives funding from Stability AI. The Stable Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-aesthetics v2 5+. A third-party analysis of the model's training data identified that out of a smaller subset of 12 million images taken from the original wider dataset used, approximately 47% of the sample size of images came from 100 different domains, with Pinterest taking up 8.5% of the subset, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons. An investigation by Bayerischer Rundfunk showed that LAION's datasets, hosted on Hugging Face, contain large amounts of private and sensitive data. === Training procedures === The model was initially trained on the laion2B-en and laion-high-resolution subsets, with the last few rounds of training done on LAION-Aesthetics v2 5+, a subset of 600 million captioned images which the LAION-Aesthetics Predictor V2 predicted that humans would, on average, give a score of at least 5 out of 10 when asked to rate how much they liked them. The LAION-Aesthetics v2 5+ subset also excluded low-resolution images and images which LAION-5B-WatermarkDetection identified as carrying a watermark with greater than 80% probability. Final rounds of training additionally dropped 10% of text conditioning to improve Classifier-Free Diffusion Guidance. The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000. === Limitations === Stable Diffusion has issues with degradation and inaccuracies in certain scenarios. Initial releases of the model were trained on a dataset that consists of 512×512 resolution images, meaning that the quality of generated images noticeably degrades when user specifications deviate from its "expected" 512×512 resolution; the version 2.0 update of the Stable Diffusion model later introduced the ability to natively generate images at 768×768 resolution. Another challenge is in generating human limbs due to poor data quality of limbs in the LAION database. The model is insufficiently trained to replicate human limbs and faces due to the lack of representative features in the database, and prompting the model to generate images of such type can confound the model. In addition to human limbs, Stable Diffusion is unable to generate legible ambigrams and some other forms of text and typography. Stable Diffusion XL (SDXL) version 1.0, released in July 2023, introduced native 1024x1024 resolution and improved generation for limbs and text. Accessibility for individual developers can also be a problem. In order to customize the model for new use cases that are not included in the dataset, such as generating anime characters ("waifu diffusion"), new data and further training are required. Fine-tuned adaptations of Stable Diffusion created through additional retraining have been used for a variety of different use-cases, from medical imaging to algorithmically generated music. However, this fine-tuning process is sensitive to the quality of new data; low resolution images or different resolutions from the original data can not only fail to learn the new task but degrade the overall performance of the model. Even when the model is additionally trained on high quality images, it is difficult for individuals to run models in consumer electronics. For example, the training process for waifu-diffusion requires a minimum 30 GB of VRAM, which exceeds the usual resource provided in such consumer GPUs as Nvidia's GeForce 30 series, w

    Read more →
  • Kernel embedding of distributions

    Kernel embedding of distributions

    In machine learning, the kernel embedding of distributions (also called the kernel mean or mean map) comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space Ω {\displaystyle \Omega } on which a sensible kernel function (measuring similarity between elements of Ω {\displaystyle \Omega } ) may be defined. For example, various kernels have been proposed for learning from data which are: vectors in R d {\displaystyle \mathbb {R} ^{d}} , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song, Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in. The analysis of distributions is fundamental in machine learning and statistics, and many algorithms in these fields rely on information theoretic approaches such as entropy, mutual information, or Kullback–Leibler divergence. However, to estimate these quantities, one must first either perform density estimation, or employ sophisticated space-partitioning/bias-correction strategies which are typically infeasible for high-dimensional data. Commonly, methods for modeling complex distributions rely on parametric assumptions that may be unfounded or computationally challenging (e.g. Gaussian mixture models), while nonparametric methods like kernel density estimation (Note: the smoothing kernels in this context have a different interpretation than the kernels discussed here) or characteristic function representation (via the Fourier transform of the distribution) break down in high-dimensional settings. Methods based on the kernel embedding of distributions sidestep these problems and also possess the following advantages: Data may be modeled without restrictive assumptions about the form of the distributions and relationships between variables Intermediate density estimation is not needed Practitioners may specify the properties of a distribution most relevant for their problem (incorporating prior knowledge via choice of the kernel) If a characteristic kernel is used, then the embedding can uniquely preserve all information about a distribution, while thanks to the kernel trick, computations on the potentially infinite-dimensional RKHS can be implemented in practice as simple Gram matrix operations Dimensionality-independent rates of convergence for the empirical kernel mean (estimated using samples from the distribution) to the kernel embedding of the true underlying distribution can be proven. Learning algorithms based on this framework exhibit good generalization ability and finite sample convergence, while often being simpler and more effective than information theoretic methods Thus, learning via the kernel embedding of distributions offers a principled drop-in replacement for information theoretic approaches and is a framework which not only subsumes many popular methods in machine learning and statistics as special cases, but also can lead to entirely new learning algorithms. == Definitions == Let X {\displaystyle X} denote a random variable with domain Ω {\displaystyle \Omega } and distribution P {\displaystyle P} . Given a symmetric, positive-definite kernel k : Ω × Ω → R {\displaystyle k:\Omega \times \Omega \rightarrow \mathbb {R} } the Moore–Aronszajn theorem asserts the existence of a unique RKHS H {\displaystyle {\mathcal {H}}} on Ω {\displaystyle \Omega } (a Hilbert space of functions f : Ω → R {\displaystyle f:\Omega \to \mathbb {R} } equipped with an inner product ⟨ ⋅ , ⋅ ⟩ H {\displaystyle \langle \cdot ,\cdot \rangle _{\mathcal {H}}} and a norm ‖ ⋅ ‖ H {\displaystyle \|\cdot \|_{\mathcal {H}}} ) for which k {\displaystyle k} is a reproducing kernel, i.e., in which the element k ( x , ⋅ ) {\displaystyle k(x,\cdot )} satisfies the reproducing property ⟨ f , k ( x , ⋅ ) ⟩ H = f ( x ) ∀ f ∈ H , ∀ x ∈ Ω . {\displaystyle \langle f,k(x,\cdot )\rangle _{\mathcal {H}}=f(x)\qquad \forall f\in {\mathcal {H}},\quad \forall x\in \Omega .} One may alternatively consider x ↦ k ( x , ⋅ ) {\displaystyle x\mapsto k(x,\cdot )} as an implicit feature mapping φ : Ω → H {\displaystyle \varphi :\Omega \rightarrow {\mathcal {H}}} (which is therefore also called the feature space), so that k ( x , x ′ ) = ⟨ φ ( x ) , φ ( x ′ ) ⟩ H {\displaystyle k(x,x')=\langle \varphi (x),\varphi (x')\rangle _{\mathcal {H}}} can be viewed as a measure of similarity between points x , x ′ ∈ Ω . {\displaystyle x,x'\in \Omega .} While the similarity measure is linear in the feature space, it may be highly nonlinear in the original space depending on the choice of kernel. === Kernel embedding === The kernel embedding of the distribution P {\displaystyle P} in H {\displaystyle {\mathcal {H}}} (also called the kernel mean or mean map) is given by: μ X := E [ k ( X , ⋅ ) ] = E [ φ ( X ) ] = ∫ Ω φ ( x ) d P ( x ) {\displaystyle \mu _{X}:=\mathbb {E} [k(X,\cdot )]=\mathbb {E} [\varphi (X)]=\int _{\Omega }\varphi (x)\ \mathrm {d} P(x)} If P {\displaystyle P} allows a square integrable density p {\displaystyle p} , then μ X = E k p {\displaystyle \mu _{X}={\mathcal {E}}_{k}p} , where E k {\displaystyle {\mathcal {E}}_{k}} is the Hilbert–Schmidt integral operator. A kernel is characteristic if the mean embedding μ : { family of distributions over Ω } → H {\displaystyle \mu :\{{\text{family of distributions over }}\Omega \}\to {\mathcal {H}}} is injective. Each distribution can thus be uniquely represented in the RKHS and all statistical features of distributions are preserved by the kernel embedding if a characteristic kernel is used. === Empirical kernel embedding === Given n {\displaystyle n} training examples { x 1 , … , x n } {\displaystyle \{x_{1},\ldots ,x_{n}\}} drawn independently and identically distributed (i.i.d.) from P , {\displaystyle P,} the kernel embedding of P {\displaystyle P} can be empirically estimated as μ ^ X = 1 n ∑ i = 1 n φ ( x i ) {\displaystyle {\widehat {\mu }}_{X}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})} === Joint distribution embedding === If Y {\displaystyle Y} denotes another random variable (for simplicity, assume the co-domain of Y {\displaystyle Y} is also Ω {\displaystyle \Omega } with the same kernel k {\displaystyle k} which satisfies ⟨ φ ( x ) ⊗ φ ( y ) , φ ( x ′ ) ⊗ φ ( y ′ ) ⟩ = k ( x , x ′ ) k ( y , y ′ ) {\displaystyle \langle \varphi (x)\otimes \varphi (y),\varphi (x')\otimes \varphi (y')\rangle =k(x,x')k(y,y')} ), then the joint distribution P ( x , y ) ) {\displaystyle P(x,y))} can be mapped into a tensor product feature space H ⊗ H {\displaystyle {\mathcal {H}}\otimes {\mathcal {H}}} via C X Y = E [ φ ( X ) ⊗ φ ( Y ) ] = ∫ Ω × Ω φ ( x ) ⊗ φ ( y ) d P ( x , y ) {\displaystyle {\mathcal {C}}_{XY}=\mathbb {E} [\varphi (X)\otimes \varphi (Y)]=\int _{\Omega \times \Omega }\varphi (x)\otimes \varphi (y)\ \mathrm {d} P(x,y)} By the equivalence between a tensor and a linear map, this joint embedding may be interpreted as an uncentered cross-covariance operator C X Y : H → H {\displaystyle {\mathcal {C}}_{XY}:{\mathcal {H}}\to {\mathcal {H}}} from which the cross-covariance of functions f , g ∈ H {\displaystyle f,g\in {\mathcal {H}}} can be computed as Cov ⁡ ( f ( X ) , g ( Y ) ) := E [ f ( X ) g ( Y ) ] − E [ f ( X ) ] E [ g ( Y ) ] = ⟨ f , C X Y g ⟩ H = ⟨ f ⊗ g , C X Y ⟩ H ⊗ H {\displaystyle \operatorname {Cov} (f(X),g(Y)):=\mathbb {E} [f(X)g(Y)]-\mathbb {E} [f(X)]\mathbb {E} [g(Y)]=\langle f,{\mathcal {C}}_{XY}g\rangle _{\mathcal {H}}=\langle f\otimes g,{\mathcal {C}}_{XY}\rangle _{{\mathcal {H}}\otimes {\mathcal {H}}}} Given n {\displaystyle n} pairs of training examples { ( x 1 , y 1 ) , … , ( x n , y n ) } {\displaystyle \{(x_{1},y_{1}),\dots ,(x_{n},y_{n})\}} drawn i.i.d. from P {\displaystyle P} , we can also empirically estimate the joint distribution kernel embedding via C ^ X Y = 1 n ∑ i = 1 n φ ( x i ) ⊗ φ ( y i ) {\displaystyle {\widehat {\mathcal {C}}}_{XY}={\frac {1}{n}}\sum _{i=1}^{n}\varphi (x_{i})\otimes \varphi (y_{i})} === Conditional distribution embedding === Given a conditional distribution P ( y ∣ x ) , {\displaystyle P(y\mid x),} one can define the corresponding RKHS embedding as μ Y ∣ x = E [ φ ( Y ) ∣ X ] = ∫ Ω φ ( y ) d P ( y ∣ x ) {\displaystyle \mu _{Y\mid x}=\mathbb {E} [\varphi (Y)\mid X]=\int _{\Omega

    Read more →
  • Conference on Neural Information Processing Systems

    Conference on Neural Information Processing Systems

    The Conference on Neural Information Processing Systems (abbreviated as NeurIPS and formerly NIPS) is a machine learning and computational neuroscience conference held annually in December. Along with ICLR and ICML, it is one of the three primary conferences of high impact in machine learning and artificial intelligence research. The conference includes three days of invited talks along with oral and poster presentations of refereed papers, followed by two days of workshops and competitions. == History == The NeurIPS meeting was first proposed in 1986 at the annual invitation-only Snowbird Meeting on Neural Networks for Computing organized by The California Institute of Technology and Bell Laboratories. NeurIPS was designed as a complementary open interdisciplinary meeting for researchers exploring biological and artificial Neural Networks. Reflecting this multidisciplinary approach, NeurIPS began in 1987 with information theorist Ed Posner as the conference president and learning theorist Yaser Abu-Mostafa as program chairman. Research presented in the early NeurIPS meetings included a wide range of topics from efforts to solve purely engineering problems to the use of computer models as a tool for understanding biological nervous systems. Since then, the biological and artificial systems research streams have diverged, and recent NeurIPS proceedings have been dominated by papers on machine learning, artificial intelligence and statistics. From 1987 until 2000 NeurIPS was held in Denver, United States. Since then, the conference was held in Vancouver, Canada (2001–2010), Granada, Spain (2011), and Lake Tahoe, United States (2012–2013). In 2014 and 2015, the conference was held in Montreal, Canada, in Barcelona, Spain in 2016, in Long Beach, United States in 2017, in Montreal, Canada in 2018 and Vancouver, Canada in 2019. Reflecting its origins at Snowbird, Utah, the meeting was accompanied by workshops organized at a nearby ski resort up until 2013, when it outgrew ski resorts. The first NeurIPS Conference was sponsored by the IEEE. The following NeurIPS Conferences have been organized by the NeurIPS Foundation, established by Ed Posner. Terrence Sejnowski has been the president of the NeurIPS Foundation since Posner's death in 1993. The board of trustees consists of previous general chairs of the NeurIPS Conference. The first proceedings was published in book form by the American Institute of Physics in 1987, and was entitled Neural Information Processing Systems, then the proceedings from the following conferences have been published by Morgan Kaufmann (1988–1993), MIT Press (1994–2004) and Curran Associates (2005–present) under the name Advances in Neural Information Processing Systems. The conference was originally abbreviated as "NIPS". By 2018 a few commentators were criticizing the abbreviation as encouraging sexism due to its association with the word nipples, and as being a slur against Japanese. The board changed the abbreviation to "NeurIPS" in November 2018. == Topics == Along with machine learning and neuroscience, other fields represented at NeurIPS include cognitive science, psychology, computer vision, statistical linguistics, and information theory. Over the years, NeurIPS became a premier conference on machine learning and although the 'Neural' in the NeurIPS acronym had become something of a historical relic, the resurgence of deep learning in neural networks since 2012, fueled by faster computers and big data, has led to achievements in speech recognition, object recognition in images, image captioning, language translation and world championship performance in the game of Go, based on neural architectures inspired by the hierarchy of areas in the visual cortex (ConvNet) and reinforcement learning inspired by the basal ganglia (Temporal difference learning). Notable affinity groups have emerged from the NeurIPS conference and displayed diversity, including Black in AI (in 2017), Queer in AI (in 2016), and others. === Named lectures === In addition to invited talks and symposia, NeurIPS also organizes two named lectureships to recognize distinguished researchers. The NeurIPS Board introduced the Posner Lectureship in honor of NeurIPS founder Ed Posner; two Posner Lectures were given each year up to 2015. Past lecturers have included: 2010 – Josh Tenenbaum and Michael I. Jordan 2011 – Rich Sutton and Bernhard Schölkopf 2012 – Thomas Dietterich and Terry Sejnowski 2013 – Daphne Koller and Peter Dayan 2014 – Michael Kearns and John Hopfield 2015 – Zoubin Ghahramani and Vladimir Vapnik 2016 – Yann LeCun 2017 – John Platt 2018 – Joëlle Pineau 2019 – Yoshua Bengio 2020 – Christopher Bishop 2021 – Peter Bartlett In 2015, the NeurIPS Board introduced the Breiman Lectureship to highlight work in statistics relevant to conference topics. The lectureship was named for statistician Leo Breiman, who served on the NeurIPS Board from 1994 to 2005. Past lecturers have included: 2015 – Robert Tibshirani 2016 – Susan Holmes 2017 – Yee Whye Teh 2018 – David Spiegelhalter 2019 – Bin Yu 2020 – Marloes Maathuis 2021 – Gabor Lugosi 2022 – Emmanuel Candes 2023 – Susan Murphy 2024 – Arnaud Doucet == NeurIPS consistency experiment == In NIPS 2014, the program chairs duplicated 10% of all submissions and sent them through separate reviewers to evaluate randomness in the reviewing process. Several researchers interpreted the result. Regarding whether the decision in NIPS is completely random or not, John Langford writes: "Clearly not—a purely random decision would have arbitrariness of ~78%. It is, however, quite notable that 60% is much closer to 78% than 0%." He concludes that the result of the reviewing process is mostly arbitrary. In NeurIPS 2021, the program chairs repeated the 2014 experiment and found similar levels of review inconsistency; 23% of duplicated submissions received different accept/reject decisions, and 50.6% of accepted papers would have been rejected under re-review. == Locations == 1987–2000: Denver, Colorado, United States 2001–2010: Vancouver, British Columbia, Canada 2011: Granada, Spain 2012 & 2013: Stateline, Nevada, United States 2014 & 2015: Montréal, Quebec, Canada 2016: Barcelona, Spain 2017: Long Beach, California, United States 2018: Montréal, Quebec, Canada 2019: Vancouver, British Columbia, Canada 2020: Vancouver, British Columbia, Canada (virtual conference) 2021: Virtual conference 2022 & 2023: New Orleans, Louisiana, United States 2024: Vancouver, British Columbia, Canada 2025: San Diego, California, United States and Mexico City, Mexico 2026: Sydney, New South Wales, Australia, with satellite events in Atlanta and Paris

    Read more →
  • Distributed artificial intelligence

    Distributed artificial intelligence

    Distributed Artificial Intelligence (DAI) (also called Decentralized Artificial Intelligence) is a melding of artificial intelligence with distributed computing. From artificial intelligence comes the theory and technology for constructing or analyzing an intelligent system. But where artificial intelligence uses psychology as a source of ideas, inspiration, and metaphor, DAI uses sociology, economics, and management science for inspiration. Where the focus of artificial intelligence is on the individual, the focus of DAI is on the group. Distributed computing provides the computational substrate on which this group focus can occur. Using techniques from artificial intelligence, communication theory, control theory, and interaction theory, it produces a cooperative solution to problems by a decentralized group of computational entities (agents). DAI is closely related to and a predecessor of the field of multi-agent systems. They are distinguished generally by multi-agent systems being open, where the entities might arise from different interests and have individual goals, and distributed artificial-intelligence systems, where the entities have common goals. There are numerous applications and tools. == Definition == Distributed Artificial Intelligence (DAI) is an approach to solving complex learning, planning, and decision-making problems. It is embarrassingly parallel, thus able to exploit large scale computation and spatial distribution of computing resources. These properties allow it to solve problems that require the processing of very large data sets. DAI systems consist of autonomous learning processing nodes (agents), that are distributed, often at a very large scale. DAI nodes can act independently, and partial solutions are integrated by communication between nodes, often asynchronously. By virtue of their scale, DAI systems are robust and elastic, and by necessity, loosely coupled. Furthermore, DAI systems are built to be adaptive to changes in the problem definition or underlying data sets due to the scale and difficulty in redeployment. DAI systems do not require all the relevant data to be aggregated in a single location, in contrast to monolithic or centralized Artificial Intelligence systems, which have tightly coupled and geographically close processing nodes. Therefore, DAI systems often operate on sub-samples or hashed impressions of very large datasets. In addition, the source dataset may change or be updated during the course of the execution of a DAI system. == Development == In 1975 distributed artificial intelligence emerged as a subfield of artificial intelligence that dealt with interactions of intelligent agents. As a scientific discipline, it progressed through a series of workshops in the USA (International Workshop on Distributed Artificial Intelligence, held in 13 editions from 1978 - 1994), Europe (Workshop on Modelling Autonomous Agents in a Multi-Agent World https://link.springer.com/conference/maamaw), and Asia (Multi-Agent and Cooperative Computation Workshop (MACC) https://sites.google.com/view/sig-macc/macc-workshop?authuser=0). Distributed artificial intelligence systems were conceived as a group of intelligent entities, called agents, that interacted by cooperation, by coexistence, or by competition. DAI is categorized into multi-agent systems and distributed problem solving. In multi-agent systems the main focus is how agents coordinate their knowledge and activities. For distributed problem solving the major focus is how the problem is decomposed and the solutions are synthesized. == Goals == The objectives of Distributed Artificial Intelligence are to solve the reasoning, planning, learning and perception problems of artificial intelligence, especially if they require large data, by distributing the problem to autonomous processing nodes (agents). To reach the objective, DAI requires: A distributed system with robust and elastic computation on unreliable and failing resources that are loosely coupled Coordination of the actions and communication of the nodes Subsamples of large data sets and online machine learning There are many reasons for wanting to distribute intelligence or cope with multi-agent systems. Mainstream problems in DAI research include the following: Parallel problem solving: mainly deals with how classic artificial intelligence concepts can be modified, so that multiprocessor systems and clusters of computers can be used to speed up calculation. Distributed problem solving (DPS): the concept of agent, autonomous entities that can communicate with each other, was developed to serve as an abstraction for developing DPS systems. See below for further details. Multi-Agent Based Simulation (MABS): a branch of DAI that builds the foundation for simulations that need to analyze not only phenomena at macro level but also at micro level, as it is in many social simulation scenarios. == Approaches == Two types of DAI has emerged: In Multi-agent systems agents coordinate their knowledge and activities and reason about the processes of coordination. Agents are physical or virtual entities that can act, perceive their environment, and communicate with other agents. An agent is autonomous and has skills to achieve goals. The agents change the state of their environment by their actions. There are a number of different coordination techniques. In distributed problem solving the work is divided among nodes and the knowledge is shared. The main concerns are task decomposition and synthesis of the knowledge and solutions. DAI can apply a bottom-up approach to AI, similar to the subsumption architecture as well as the traditional top-down approach of AI. In addition, DAI can also be a vehicle for emergence. === Challenges === The challenges in Distributed AI are: How to carry out communication and interaction of agents and which communication language or protocols should be used. How to ensure the coherency of agents. How to synthesise the results among 'intelligent agents' group by formulation, description, decomposition and allocation. == Applications and tools == Areas where DAI have been applied are: Electronic commerce, e.g. for trading strategies the DAI system learns financial trading rules from subsamples of very large samples of financial data Networks, e.g. in telecommunications the DAI system controls the cooperative resources in a WLAN network Routing, e.g. model vehicle flow in transport networks Scheduling, e.g. flow shop scheduling where the resource management entity ensures local optimization and cooperation for global and local consistency Search engines, e.g. in LLM federated search like Ithy where document retrieval and analysis are distributed to DAI agents before aggregation Multi-Agent systems, e.g. artificial life, the study of simulated life Electric power systems, e.g. Condition Monitoring Multi-Agent System (COMMAS) applied to transformer condition monitoring, and IntelliTEAM II Automatic Restoration System DAI integration in tools has included: ECStar is a distributed rule-based learning system. == Agents == === Systems: Agents and multi-agents === Notion of Agents: Agents can be described as distinct entities with standard boundaries and interfaces designed for problem solving. Notion of Multi-Agents: Multi-Agent system is defined as a network of agents which are loosely coupled working as a single entity like society for problem solving that an individual agent cannot solve. === Software agents === The key concept used in DPS and MABS is the abstraction called software agents. An agent is a virtual (or physical) autonomous entity that has an understanding of its environment and acts upon it. An agent is usually able to communicate with other agents in the same system to achieve a common goal, that one agent alone could not achieve. This communication system uses an agent communication language. A first classification that is useful is to divide agents into: reactive agent – A reactive agent is not much more than an automaton that receives input, processes it and produces an output. deliberative agent – A deliberative agent in contrast should have an internal view of its environment and is able to follow its own plans. hybrid agent – A hybrid agent is a mixture of reactive and deliberative, that follows its own plans, but also sometimes directly reacts to external events without deliberation. Well-recognized agent architectures that describe how an agent is internally structured are: ASMO (emergence of distributed modules) BDI (Believe Desire Intention, a general architecture that describes how plans are made) InterRAP (A three-layer architecture, with a reactive, a deliberative and a social layer) PECS (Physics, Emotion, Cognition, Social, describes how those four parts influences the agents behavior). Soar (a rule-based approach)

    Read more →
  • Galatea (video game)

    Galatea (video game)

    Galatea is an interactive fiction video game by Emily Short featuring a modern rendition of the Greek myth of Galatea, the sculpture of a woman that gained life. It took "Best of Show" in the 2000 IF Art Show and won a XYZZY Award for Best non-player character. The game displays an unusually rich approach to non-player character dialogue and diverts from the typical puzzle-solving in interactive fiction: gameplay consists entirely of interacting with a single character in a single room. Galatea is licensed under the Creative Commons BY-NC-ND 3.0 US license. == Gameplay == Galatea alters the typical interactive fiction game mechanics by concentrating instead on the player's interactions with a single non-player character (NPC), the eponymous Galatea. Much of the interest of the piece derives from the ambiguous nature of the player–NPC dialogue: the form of the conversation and, indeed, the nature of Galatea herself shift depending on the focus the player places on certain aspects of the character's personality. Numerous endings are possible. Gameplay centers around the developing dialogue between Galatea and the player when asking about topics in the previous conversation. Two commands, "think about" and "recap", are provided to keep track of what has already been said; the former is also used to advance the storyline, as the player character draws conclusions about the story as it has unfolded to that point. The game also encourages using sensory commands ("touch", "listen to", "look at"), adding immersion to the experience. == Plot == Galatea is loosely based on the myth of Pygmalion, who carved the sculpture of a woman. In the myth, he falls in love with the statue, named Galatea or Elise in different versions, and the goddess Venus brings her to life. The story begins at the opening of an exhibition of artificial intelligences. The player, alone, discovers Galatea displayed on a pedestal with a small information placard. She is illuminated by a spotlight and wears an emerald dress. Seeing the player about to turn away, Galatea says, "They told me you were coming." From this point, the story may proceed in a number of ways depending on the player's words and actions. === Multilinear interactive fiction === Short describes this as "multilinear interactive fiction": while interactive fiction in general allows the player to find their own way through the story, this leads in most cases to a single ending (or at least a single desired 'correct' ending). With Galatea, Short presents a story with around 70 different endings and hundreds of possible ways of reaching them. The plot is thus designed to appear open-ended with the development of the story entirely dependent on what the player decides to talk or ask about or what actions they choose to perform. Thus the original author and the player share in the creation of a work of fiction. == Development == In interviews, Emily Short has explained that Galatea arose out of her efforts to develop advanced dialog coding for interactive fiction engines. Although code for simple conversational programs like ELIZA have existed since the 1960s, and limited dialog options have existed in interactive fiction since the 1970s, Short's efforts to develop chatterbot-like dialog required her to produce a simple test case scenario to test NPC interaction. Thus the single-room, single-occupant Galatea was a natural result. Development of the game progressed organically with Short engaging in test runs and drafting new dialog options for every conversational dead-end that arose. The game's multiple endings also arose in a similar fashion although Short had intended that there be multiple endings from the start. Although the nature of the game's development as well as its minimalist final form has led to questions regarding whether it is really a game and not just an experimental conversational program, Short has suggested that to her the definition of interactive fiction requires nothing more than a world model and a parser, and "anything you can cook up with those features counts as IF." Short has acknowledged the helpful influence of the close-knit IF community and the "atmosphere in which experimentation is valued" as leading to the success of her works like Galatea. == Reception == Galatea was well received, achieving critical acclaim from interactive fiction reviewers and literary scholars. The game is considered to aspire to a new level of art in interactive fiction, and thereby to have revolutionized the genre, establishing its author, Emily Short, as one of the key figures in the modern interactive fiction scene. Fellow award-winning IF author, Adam Cadre has called Galatea "the best NPC ever"—a view that was echoed by Joystiq's John Bardinelli. Cadre also describes the game as an example of an alternative kind of puzzle where "interactivity comes in deciding where to go, what to see, what to say. Rather than having to open gates along a path, you discover that they're all open at first, but stepping through one causes others to close." Galatea was described in 2007 by Indiegames.com as a "fascinating journey." In a 2009 article, Rock, Paper, Shotgun praised the depth and detail of the game, the complexities of the character design and its "masterful balance between intricacy and simplicity", and "Galatea's emotional turmoil" that is "encoded sweetly into the subtext of what's going on. By simply interacting in a logical manner, you learn more about this character than any cut-scene or info-dump could ever hope to convey." This was reiterated in a 2010 1UP.com article that listed Galatea as #2 in its "Top 5 Introductory Interactive Fiction Games" feature, describing it as intriguingly replayable, and as a "surprisingly rich game for its apparent minimalism". In 2011, PC Gamer highlighted Galatea as an example of the artistic and literary aspects of the interactive fiction genre. The titular character, Galatea, has been compared to the 2007 Portal character GLaDOS due to similarities in the personalities of the characters.

    Read more →
  • Function representation

    Function representation

    Function Representation (FRep or F-Rep) is used in solid modeling, volume modeling and computer graphics. FRep was introduced in "Function representation in geometric modeling: concepts, implementation and applications" as a uniform representation of multidimensional geometric objects (shapes). An object as a point set in multidimensional space is defined by a single continuous real-valued function f ( X ) {\displaystyle f(X)} of point coordinates X [ x 1 , x 2 , . . . , x n ] {\displaystyle X[x_{1},x_{2},...,x_{n}]} which is evaluated at the given point by a procedure traversing a tree structure with primitives in the leaves and operations in the nodes of the tree. The points with f ( x 1 , x 2 , . . . , x n ) ≥ 0 {\displaystyle f(x_{1},x_{2},...,x_{n})\geq 0} belong to the object, and the points with f ( x 1 , x 2 , . . . , x n ) < 0 {\displaystyle f(x_{1},x_{2},...,x_{n})<0} are outside of the object. The point set with f ( x 1 , x 2 , . . . , x n ) = 0 {\displaystyle f(x_{1},x_{2},...,x_{n})=0} is called an isosurface. == Geometric domain == The geometric domain of FRep in 3D space includes solids with non-manifold models and lower-dimensional entities (surfaces, curves, points) defined by zero value of the function. A primitive can be defined by an equation or by a "black box" procedure converting point coordinates into the function value. Solids bounded by algebraic surfaces, skeleton-based implicit surfaces, and convolution surfaces, as well as procedural objects (such as solid noise), and voxel objects can be used as primitives (leaves of the construction tree). In the case of a voxel object (discrete field), it should be converted to a continuous real function, for example, by applying the trilinear or higher-order interpolation. Many operations such as set-theoretic, blending, offsetting, projection, non-linear deformations, metamorphosis, sweeping, hypertexturing, and others, have been formulated for this representation in such a manner that they yield continuous real-valued functions as output, thus guaranteeing the closure property of the representation. R-functions originally introduced in V.L. Rvachev's "On the analytical description of some geometric objects", provide C k {\displaystyle C^{k}} continuity for the functions exactly defining the set-theoretic operations (min/max functions are a particular case). Because of this property, the result of any supported operation can be treated as the input for a subsequent operation; thus very complex models can be created in this way from a single functional expression. FRep modeling is supported by the special-purpose language HyperFun. == Shape Models == FRep combines and generalizes different shape models like algebraic surfaces skeleton based "implicit" surfaces set-theoretic solids or CSG (Constructive Solid Geometry) sweeps volumetric objects parametric models procedural models A more general "constructive hypervolume" allows for modeling multidimensional point sets with attributes (volume models in 3D case). Point set geometry and attributes have independent representations but are treated uniformly. A point set in a geometric space of an arbitrary dimension is an FRep based geometric model of a real object. An attribute that is also represented by a real-valued function (not necessarily continuous) is a mathematical model of an object property of an arbitrary nature (material, photometric, physical, medicine, etc.). The concept of "implicit complex" proposed in "Cellular-functional modeling of heterogeneous objects" provides a framework for including geometric elements of different dimensionality by combining polygonal, parametric, and FRep components into a single cellular-functional model of a heterogeneous object.

    Read more →
  • Stable Diffusion

    Stable Diffusion

    Stable Diffusion is a deep learning, text-to-image model released in 2022 based on diffusion techniques. The generative artificial intelligence technology is the premier product of Stability AI and is considered to be a part of the ongoing AI boom. It is primarily used to generate detailed images conditioned on text descriptions, though it can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by a text prompt. Its development involved researchers from the CompVis Group at LMU Munich and Runway with a computational donation from Stability and training data from non-profit organizations. Stable Diffusion is a latent diffusion model, a kind of deep generative artificial neural network. Its code and model weights have been released publicly, and an optimized version can run on most consumer hardware equipped with a modest GPU with as little as 2.4 GB VRAM. This marked a departure from previous proprietary text-to-image models such as DALL-E and Midjourney which were accessible only via cloud services. == Development == Stable Diffusion originated from a project called Latent Diffusion, developed in Germany by researchers at LMU Munich in Munich and Heidelberg University. Four of the original 5 authors (Robin Rombach, Andreas Blattmann, Patrick Esser and Dominik Lorenz) later joined Stability AI and released subsequent versions of Stable Diffusion. The technical license for the model was released by the CompVis group at LMU Munich. Development was led by Patrick Esser of Runway and Robin Rombach of CompVis, who were among the researchers who had earlier invented the latent diffusion model architecture used by Stable Diffusion. Stability AI also credited EleutherAI and LAION (a German nonprofit which assembled the dataset on which Stable Diffusion was trained) as supporters of the project. == Technology == === Architecture === Diffusion models, introduced in 2015, are trained with the objective of removing successive applications of Gaussian noise on training images, which can be thought of as a sequence of denoising autoencoders. The name diffusion is from the thermodynamic diffusion, since they were first developed with inspiration from thermodynamics. Models in Stable Diffusion series before SD 3 all used a variant of diffusion models, called latent diffusion model (LDM), developed in 2021 by the CompVis (Computer Vision & Learning) group at LMU Munich. Stable Diffusion consists of 3 parts: the variational autoencoder (VAE), U-Net, and an optional text encoder. The VAE encoder compresses the image from pixel space to a smaller dimensional latent space, capturing a more fundamental semantic meaning of the image. Gaussian noise is iteratively applied to the compressed latent representation during forward diffusion. The U-Net block, composed of a ResNet backbone, denoises the output from forward diffusion backwards to obtain a latent representation. Finally, the VAE decoder generates the final image by converting the representation back into pixel space. The denoising step can be flexibly conditioned on a string of text, an image, or another modality. The encoded conditioning data is exposed to denoising U-Nets via a cross-attention mechanism. For conditioning on text, the fixed, pretrained CLIP ViT-L/14 text encoder is used to transform text prompts to an embedding space. Researchers point to increased computational efficiency for training and generation as an advantage of LDMs. With 860 million parameters in the U-Net and 123 million in the text encoder, Stable Diffusion is considered relatively lightweight by 2022 standards, and unlike other diffusion models, it can run on consumer GPUs, and even CPU-only if using the OpenVINO version of Stable Diffusion. ==== SD XL ==== The XL version uses the same LDM architecture as previous versions, except larger: larger UNet backbone, larger cross-attention context, two text encoders instead of one, and trained on multiple aspect ratios (not just the square aspect ratio like previous versions). The SD XL Refiner, released at the same time, has the same architecture as SD XL, but it was trained for adding fine details to preexisting images via text-conditional img2img. ==== SD 3.0 ==== The 3.0 version completely changes the backbone. Not a UNet, but a Rectified Flow Transformer, which implements the rectified flow method with a Transformer. The Transformer architecture used for SD 3.0 has three "tracks", for original text encoding, transformed text encoding, and image encoding (in latent space). The transformed text encoding and image encoding are mixed during each transformer block. The architecture is named "multimodal diffusion transformer (MMDiT), where the "multimodal" means that it mixes text and image encodings inside its operations. This differs from previous versions of DiT, where the text encoding affects the image encoding, but not vice versa. === Training data === Stable Diffusion was trained on pairs of images and captions taken from LAION-5B, a publicly available dataset derived from Common Crawl data scraped from the web, where 5 billion image-text pairs were classified based on language and filtered into separate datasets by resolution, a predicted likelihood of containing a watermark, and predicted "aesthetic" score (e.g. subjective visual quality). The dataset was created by LAION, a German non-profit which receives funding from Stability AI. The Stable Diffusion model was trained on three subsets of LAION-5B: laion2B-en, laion-high-resolution, and laion-aesthetics v2 5+. A third-party analysis of the model's training data identified that out of a smaller subset of 12 million images taken from the original wider dataset used, approximately 47% of the sample size of images came from 100 different domains, with Pinterest taking up 8.5% of the subset, followed by websites such as WordPress, Blogspot, Flickr, DeviantArt and Wikimedia Commons. An investigation by Bayerischer Rundfunk showed that LAION's datasets, hosted on Hugging Face, contain large amounts of private and sensitive data. === Training procedures === The model was initially trained on the laion2B-en and laion-high-resolution subsets, with the last few rounds of training done on LAION-Aesthetics v2 5+, a subset of 600 million captioned images which the LAION-Aesthetics Predictor V2 predicted that humans would, on average, give a score of at least 5 out of 10 when asked to rate how much they liked them. The LAION-Aesthetics v2 5+ subset also excluded low-resolution images and images which LAION-5B-WatermarkDetection identified as carrying a watermark with greater than 80% probability. Final rounds of training additionally dropped 10% of text conditioning to improve Classifier-Free Diffusion Guidance. The model was trained using 256 Nvidia A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000. === Limitations === Stable Diffusion has issues with degradation and inaccuracies in certain scenarios. Initial releases of the model were trained on a dataset that consists of 512×512 resolution images, meaning that the quality of generated images noticeably degrades when user specifications deviate from its "expected" 512×512 resolution; the version 2.0 update of the Stable Diffusion model later introduced the ability to natively generate images at 768×768 resolution. Another challenge is in generating human limbs due to poor data quality of limbs in the LAION database. The model is insufficiently trained to replicate human limbs and faces due to the lack of representative features in the database, and prompting the model to generate images of such type can confound the model. In addition to human limbs, Stable Diffusion is unable to generate legible ambigrams and some other forms of text and typography. Stable Diffusion XL (SDXL) version 1.0, released in July 2023, introduced native 1024x1024 resolution and improved generation for limbs and text. Accessibility for individual developers can also be a problem. In order to customize the model for new use cases that are not included in the dataset, such as generating anime characters ("waifu diffusion"), new data and further training are required. Fine-tuned adaptations of Stable Diffusion created through additional retraining have been used for a variety of different use-cases, from medical imaging to algorithmically generated music. However, this fine-tuning process is sensitive to the quality of new data; low resolution images or different resolutions from the original data can not only fail to learn the new task but degrade the overall performance of the model. Even when the model is additionally trained on high quality images, it is difficult for individuals to run models in consumer electronics. For example, the training process for waifu-diffusion requires a minimum 30 GB of VRAM, which exceeds the usual resource provided in such consumer GPUs as Nvidia's GeForce 30 series, w

    Read more →
  • Ghost in the Shell

    Ghost in the Shell

    Ghost in the Shell is a Japanese cyberpunk military science fiction media franchise that began with the eponymous manga series, written and illustrated by Masamune Shirow. The manga, first serialized from 1989 to 1991, is set in the mid-21st-century and follows the fictional counter-cyberterrorist organization Public Security Section 9, led by protagonist Major Motoko Kusanagi. Animation studio Production I.G has produced several anime adaptations of the series. These include the 1995 film of the same name and its 2004 sequel, Ghost in the Shell 2: Innocence; the 2002 television series Ghost in the Shell: Stand Alone Complex and its 2020 follow-up, Ghost in the Shell: SAC_2045; and the Ghost in the Shell: Arise original video animation series. In addition, an American-produced live-action film was released in March 2017. == Overview == === Title === The original editor Koichi Yuri says: At first, Ghost in the Shell came from Shirow, but when Yuri asked for "something more flashy", Shirow came up with "攻殻機動隊 Koukaku Kidou Tai (Shell Squad)" for Yuri. But Shirow was attached to including "Ghost in the Shell" as well even if in smaller type. === Setting === Primarily set in the mid-twenty-first century in the fictional Japanese city of Niihama, Niihama Prefecture (新浜県新浜市, Niihama-ken Niihama-shi), otherwise known as New Port City (ニューポートシティ, Nyū Pōto Shiti), the manga and the many anime adaptations follow the members of Public Security Section 9, a task-force consisting of various professionals skilled at solving and preventing crime, mostly with some sort of police background. Political intrigue and counter-terrorism operations are standard fare for Section 9, but the various actions of corrupt officials, companies, and cyber-criminals in each scenario are unique and require the diverse skills of Section 9's staff to prevent a series of incidents from escalating. In this post-cyberpunk iteration of a possible future, computer technology has advanced to the point that many members of the public possess cyberbrains, technology that allows them to interface their biological brain with various networks. The level of cyberization varies from simple minimal interfaces to almost complete replacement of the brain with cybernetic parts, in cases of severe trauma. This can also be combined with various levels of prostheses, with a fully prosthetic body enabling a person to become a cyborg. The main character of Ghost in the Shell, Major Motoko Kusanagi, is such a cyborg, having had a terrible accident befall her as a child that ultimately required her to use a full-body prosthesis to house her cyberbrain. This high level of cyberization, however, opens the brain up to attacks from highly skilled hackers, with the most dangerous being those who will hack a person to bend to their whims. == Media == === Literature === ==== Original manga ==== The original Ghost in the Shell manga ran in Japan from April 1989 to November 1990 in Kodansha's manga anthology Young Magazine, and was released in a tankōbon volume on October 2, 1991. Ghost in the Shell 2: Man-Machine Interface followed in 1997 for nine issues in Young Magazine, and was collected in the Ghost in the Shell: Solid Box on December 1, 2000. Then a standard version with modifications and new pages was published on June 26, 2001. Four stories from Man-Machine Interface that were not released in tankobon format from previous releases were later collected in Ghost in the Shell 1.5: Human-Error Processor, and published by Kodansha on July 17, 2003. Several art books have also been published for the manga. === Films === ==== Animated films ==== Two animated films based on the original manga have been released, both directed by Mamoru Oshii and animated by Production I.G. Ghost in the Shell was released in 1995 and follows the "Puppet Master" storyline from the manga. It was re-released in 2008 as Ghost in the Shell 2.0 with new audio and updated 3D computer graphics in certain scenes. Innocence, otherwise known as Ghost in the Shell 2: Innocence, was released in 2004, with its story based on a chapter from the first manga. ==== Live-action film ==== In 2008, DreamWorks and producer Steven Spielberg acquired the rights to a live-action film adaptation of the original Ghost in the Shell manga. On January 24, 2014, Rupert Sanders was announced as director, with a screenplay by William Wheeler. In April 2016, the full cast was announced, which included Juliette Binoche, Chin Han, Lasarus Ratuere and Kaori Momoi, and Scarlett Johansson in the lead role; the casting of Johansson drew accusations of whitewashing. Principal photography on the film began on location in Wellington, New Zealand, on February 1, 2016. Filming wrapped in June 2016. Ghost in the Shell premiered in Tokyo on March 16, 2017, and was released in the United States on March 31, 2017, in 2D, 3D and IMAX 3D. It received mixed reviews, with praise for its visuals and Johansson's performance but criticism for its script. === Television === ==== Stand Alone Complex TV series, film and ONA ==== In 2002, Ghost in the Shell: Stand Alone Complex premiered on Animax, presenting a new telling of Ghost in the Shell independent from the original manga, focusing on Section 9's investigation of the Laughing Man hacker. It was followed in 2004 by a second season titled Ghost in the Shell: S.A.C. 2nd GIG, which focused on the Individual Eleven terrorist group. The primary storylines of both seasons were compressed into OVAs broadcast as Ghost in the Shell: Stand Alone Complex The Laughing Man in 2005 and Ghost in the Shell: Stand Alone Complex Individual Eleven in 2006. Also in 2006, Ghost in the Shell: Stand Alone Complex - Solid State Society, featuring Section 9's confrontation with a hacker known as the Puppeteer, was broadcast, serving as a finale to the anime series. The extensive score for the series and its films was composed by Yoko Kanno. On April 7, 2017, Kodansha and Production I.G announced that Kenji Kamiyama and Shinji Aramaki would be co-directing a new Kōkaku Kidōtai anime production. On December 7, 2018, it was reported by Netflix that they had acquired the worldwide streaming rights to the original net animation (ONA) anime series, titled Ghost in the Shell: SAC_2045, and that it would premiere on April 23, 2020. The series is in 3DCG and Sola Digital Arts collaborated with Production I.G on the project. Ilya Kuvshinov handled character designs. The series had two seasons of 12 episodes each. In addition to the anime, a series of published books, two separate manga adaptations, and several video games for consoles and mobile phones have been released for Stand Alone Complex. ==== Arise OVA, TV series and film ==== In 2013, a new iteration of the series titled Ghost in the Shell: Arise premiered, taking an original look at the Ghost in the Shell world, set before the original manga. It was released as a series of four original video animation (OVA) episodes (with limited theatrical releases) from 2013 to 2014, then recompiled as a 10-episode television series under the title of Kōkaku Kidōtai: Arise - Alternative Architecture. An additional fifth OVA titled Pyrophoric Cult, originally premiering in the Alternative Architecture broadcast as two original episodes, was released on August 26, 2015. Kazuchika Kise served as the chief director of the series, with Tow Ubukata as head writer. Cornelius was brought onto the project to compose the score for the series, with the Major's new voice actress Maaya Sakamoto also providing vocals for certain tracks. Ghost in the Shell: The New Movie, also known as Ghost in the Shell: Arise − The Movie or New Ghost in the Shell, is a 2015 film directed by Kazuya Nomura that serves as a finale to the Ghost in the Shell: Arise story arc. The film is a continuation to the plot of the Pyrophoric Cult episode of Arise, and ties up loose ends from that arc. A manga adaptation was serialized in Kodansha's Young Magazine, which started on March 13 and ended on August 26, 2013. ==== 2026 anime ==== On May 25, 2024, it was announced that a new anime television series adaptation will be produced by Science Saru for a July 2026 premiere. Saru will be in a production committee with Bandai Namco Filmworks, Kodansha and Production I.G. The series will be directed by Monkochan, with a script by EnJoe Toh. === Video games === Ghost in the Shell was developed by Exact and released for the PlayStation on July 17, 1997, in Japan by Sony Computer Entertainment. It is a third-person shooter featuring an original storyline where the character plays a rookie member of Section 9. The video game's soundtrack Megatech Body features various techno artists, such as Takkyu Ishino, Scan X and Mijk Van Dijk. Several video games were also developed to tie into the Stand Alone Complex television series, in addition to a first-person shooter by Nexon and Neople titled Ghost in the Shell: Stand Alone Complex - First Assault Online,

    Read more →