AI Avatar Tools

AI Avatar Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Digital art

    Digital art, or the digital arts, is artistic work that uses digital technology as part of the creative or presentational process. It can also refer to computational art that uses and engages with digital media. Since the 1960s, various names have been used to describe digital art, including computer art, electronic art, multimedia art, and new media art. Digital art includes pieces stored on physical media, such as digital paintings, as well as digital galleries on websites. Digital art also extends to the field of visual computing. == History == In the early 1960s, John Whitney developed the first computer-generated art using mathematical operations. In 1963, Ivan Sutherland invented Sketchpad, the first interactive computer-graphics interface. Between 1974 and 1977, Salvador Dalí created two large canvases of Gala Contemplating the Mediterranean Sea which at a distance of 20 meters is transformed into the portrait of Abraham Lincoln (Homage to Rothko), as well as prints of Lincoln in Dalivision. Both were based on a portrait of Abraham Lincoln processed on a computer by Leon Harmon and published in "The Recognition of Faces". The technique is similar to what later became known as photographic mosaics. Andy Warhol created digital art on an Amiga when the computer was publicly introduced at Lincoln Center in July 1985. An image of Debbie Harry was captured in monochrome from a video camera and digitized into a graphics program called ProPaint. Warhol manipulated the image by adding color using flood fills. == Art made for digital media == Artwork that is highly computational, presented through digital media, and explicitly engaged with digital technologies is categorized as "art made for digital media". This differs from art using digital tools, which incorporates digital technology in the creation process but may exist outside the digital world. 
Digital art historian Christiane Paul writes that it "is highly problematic to classify all art that makes use of digital technologies somewhere in its production and dissemination process as digital art since it makes it almost impossible to arrive at any unifying statement about the art form". == Art that uses digital tools == Digital art can be purely computer-generated (such as fractals and algorithmic art) or taken from other sources, such as a scanned photograph or an image drawn using vector graphics software using a mouse or graphics tablet. Artworks are considered digital paintings when created similarly to non-digital paintings but using software on a computer platform and digitally outputting the resulting image as painted on canvas. Despite differing viewpoints on digital technology's impact on the arts, a consensus exists within the digital art community about its significant contribution to expanding the creative domain, i.e., that it has greatly broadened the creative opportunities available to professional and non-professional artists alike. == Art theorists and art historians == Notable art theorists and historians in this field include: Oliver Grau, Jon Ippolito, Christiane Paul, Frank Popper, Jasia Reichardt, Mario Costa, Christine Buci-Glucksmann, Dominique Moulon, Roy Ascott, Catherine Perret, Margot Lovejoy, Edmond Couchot, Tina Rivers Ryan, Fred Forest and Edward A. Shanken. === Digital painting === Digital painting is either a physical painting made with the use of digital electronics and spray paint robotics within the digital art fine art context or pictorial art imagery made with pixels on a computer screen that mimics artworks from the traditional histories of painting and illustration. === Artificial intelligence art === Artists have used artificial intelligence to create artwork since at least the 1960s. 
Since generative adversarial networks (GANs) were designed in 2014, some artists have used them to create artwork. A GAN is a machine learning framework in which two "algorithms" compete with each other and iterate. It can be used to generate pictures that have visual effects similar to traditional fine art. Text-to-image generators rest on a simple idea: a person supplies a text description, and the AI converts that text into visual content. Anyone can turn their language into a painting through a picture generator. == Digital art education == Digital art education has become more common with the advancement of digital hardware and software. The tools involved range from hardware such as graphics tablets, styluses, tablets, 3D scanners, virtual reality headsets, and digital cameras to software such as digital art software, 3D modeling software, 3D rendering, digital sculpting, 2D graphics software, digital painting, 3D terrain generation, 2D animation software, 3D animation software, raster graphics editors, vector graphics editors, mathematical art software, and video editing software. == Scholarship and archives == In addition to the creation of original art, research methods that use AI have been developed to quantitatively analyze digital art collections. This has been made possible by the large-scale digitization of artwork in the past few decades. Although the main goal of digitization was to allow for accessibility and exploration of these collections, the use of AI in analyzing them has brought about new research perspectives. Two computational methods, close reading and distant viewing, are the typical approaches used to analyze digitized art. Close reading focuses on specific visual aspects of one piece. Some tasks performed by machines in close reading methods include computational artist authentication and analysis of brushstrokes or texture properties. 
In contrast, through distant viewing methods, the similarity across an entire collection for a specific feature can be statistically visualized. Common tasks relating to this method include automatic classification, object detection, multimodal tasks, knowledge discovery in art history, and computational aesthetics. Whereas distant viewing includes the analysis of large collections, close reading involves one piece of artwork. Whilst 2D and 3D digital art is beneficial as it allows the preservation of history that would otherwise have been destroyed by events like natural disasters and war, there is the issue of who should own these 3D scans – i.e., who should own the digital copyrights. === Computer demos === Computer demos are computer programs, usually non-interactive, that produce audiovisual presentations. They are a novel form of art, which emerged as a consequence of the home computer revolution in the early 1980s. In the classification of digital art, they can be best described as real-time procedurally generated animated audio-visuals. This form of art does not concentrate only on the aesthetics of the final presentation, but also on the complexities and skills involved in creating the presentation. As such, it can be fully enjoyed only by persons with a relatively high level of knowledge of the relevant computer technologies. For example, Hua Jin and Jie Yang write that using computer-aided design software to present the class content in art design teaching "is not to advocate computer-aided design instead of hand-drawn performance, but to make it serve the profession earlier through a more reasonable course arrangement." On the other hand, many of the created pieces of art are primarily aesthetic or amusing, and those can be enjoyed by the general public. === Digital installation art === Digital installation art constitutes a broad field of artistic practices and a variety of forms. 
Some resemble video installations, especially large-scale works involving projections and live video capture. By using projection techniques that enhance an audience's impression of sensory envelopment, many digital installations attempt to create immersive environments. Others go even further and attempt to facilitate complete immersion in virtual realms. This type of installation is generally site-specific, scalable, and without fixed dimensionality, meaning it can be reconfigured to accommodate different presentation spaces. Scott Snibbe's "Boundary Functions" is an example of augmented reality digital installation art, which responds to people who enter the installation by drawing lines between them, indicating their personal space. Noah Wardrip-Fruin's "Screen" (2003) uses a Cave Automatic Virtual Environment (CAVE) to create an interactive, text-based digital experience that engages the viewer in a multi-sensory interaction. === Internet art and net.art === Internet art is digital art that uses the specific characteristics of the Internet and is exhibited on the Internet. The term "internet art" is subsumed under "net art", for which artists assume that the network will be refreshed through history. The term "post-internet art" is therefore used to exclude artworks outside of internet media. A representative example is Protocols for Achievements, which is a digital photo frame that confronts the aestheti

  • ICAART

    The International Conference on Agents and Artificial Intelligence (ICAART) is a meeting point for researchers (among others) with an interest in the areas of Agents and Artificial Intelligence. ICAART has two tracks: one on Agents and Distributed AI in general, and the other on topics related to Intelligent Systems and Computational Intelligence. The conference program is composed of several kinds of sessions, such as technical sessions, poster sessions, keynote lectures, tutorials, special sessions, doctoral consortiums, panels and industrial tracks. The papers presented at the conference are published in the conference proceedings and made available in the SCITEPRESS digital library, and some of the best papers are invited to a post-publication with Springer. ICAART's first edition was held in 2009, with keynote speakers including Marco Dorigo, Edward H. Shortliffe and Eduard Hovy. Since then, the conference has had several other invited speakers, such as Katia Sycara, Nick Jennings, Robert Kowalski, Boi Faltings and Tim Finin. Bart Selman is one of the names confirmed for the next edition of this conference. Since 2012 the conference has been held in conjunction with two other conferences: the International Conference on Operations Research and Enterprise Systems (ICORES) and the International Conference on Pattern Recognition Applications and Methods (ICPRAM).

    == Areas ==

    === Agents ===
    Agent communication languages; Cooperation and Coordination; Distributed Problem Solving; Economic Agent Models; Emotional Intelligence; Group Decision Making; Intelligent Auctions and Markets; Mobile Agents; Multi-agent systems; Negotiation and Interaction Protocols; Nep News Detection; Agent Models and Architectures; Physical Agents at Work; Privacy, Safety and Security; Programming Environments and Languages; Robot and Multi-Robot Systems; Self Organizing Systems; Semantic Web; Simulation; Swarm Intelligence; Task Planning and Execution; Transparency and Ethical Issues; Agent-Oriented Software Engineering; Web Intelligence; Agent Platforms and Interoperability; Autonomous systems; Cloud Computing and Its Impact; Cognitive robotics; Collective Intelligence; Conversational Agents

    === Artificial intelligence ===
    AI and Creativity; Deep Learning; Evolutionary Computing; Fuzzy Systems; Hybrid Intelligent Systems; Industrial Applications of AI; Intelligence and Cybersecurity; Intelligent User Interfaces; Knowledge Representation and Reasoning; Knowledge-Based Systems; Ambient Intelligence; Machine learning; Model-Based Reasoning; Natural Language Processing; Neural Networks; Ontologies; Planning and Scheduling; Social Network Analysis; Soft Computing; State Space Search; Bayesian Networks; Uncertainty in AI; Vision and Perception; Visualization; Big Data; Case-Based Reasoning; Cognitive Systems; Constraint Satisfaction; Data Mining; Data Science

    == Editions ==

    === ICAART 2023 – Lisbon, Portugal ===

    === ICAART 2020 – Valletta, Malta ===

    === ICAART 2019 – Prague, Czech Republic ===
    Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-350-6
    Proceedings of the 11th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-350-6

    === ICAART 2018 – Funchal, Madeira, Portugal ===
    Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-275-2
    Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-275-2

    === ICAART 2017 – Porto, Portugal ===
    Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-219-6
    Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-220-2

    === ICAART 2016 – Rome, Italy ===
    Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-172-4
    Proceedings of the 8th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-172-4

    === ICAART 2015 – Lisbon, Portugal ===
    Proceedings of the 7th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-073-4
    Proceedings of the 7th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-074-1

    === ICAART 2014 – ESEO, Angers, Loire Valley, France ===
    Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-758-015-4
    Proceedings of the 6th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-758-016-1

    === ICAART 2013 – Barcelona, Spain ===
    Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-8565-38-9
    Proceedings of the 5th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-8565-39-6

    === ICAART 2012 – Vilamoura, Algarve, Portugal ===
    Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-8425-95-9
    Proceedings of the 4th International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-8425-96-6

    === ICAART 2011 – Rome, Italy ===
    Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-8425-40-9
    Proceedings of the 3rd International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-8425-41-6

    === ICAART 2010 – Valencia, Spain ===
    Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1. ISBN 978-989-674-021-4
    Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 2. ISBN 978-989-674-022-1

    === ICAART 2009 – Porto, Portugal ===
    Proceedings of the 1st International Conference on Agents and Artificial Intelligence. ISBN 978-989-8111-66-1

  • With Folded Hands ...

    "With Folded Hands ..." is a 1947 science fiction novelette by American writer Jack Williamson (1908–2006). In writing it, Williamson was influenced by the aftermath of World War II, the atomic bombings of Hiroshima and Nagasaki, and his concern that "some of the technological creations we had developed with the best intentions might have disastrous consequences in the long run." The novelette first appeared in the July 1947 issue of Astounding Science Fiction and was later included in The Science Fiction Hall of Fame, Volume Two (1973) after being voted one of the best novellas up to 1965. In 1950, it was the first of several Astounding stories adapted for NBC's radio series Dimension X. == Rewrite and sequel == The 1947 publication was followed by a novel-length rewrite, with a different setting and inventor. At the behest of Astounding editor-in-chief John W. Campbell, a new ending had the robots defeated by means of what Williamson and Campbell would later christen "psionics". This novel was serialized, also in Astounding (March, April, May 1948), as ... And Searching Mind, and finally published in hardback book form as The Humanoids (1949). Much later, in 1980, Williamson followed with another sequel, The Humanoid Touch. == Plot summary == Underhill, a seller of "Mechanicals" (unthinking robots that perform menial tasks) in the small town of Two Rivers, is startled to find a competitor's store on his way home. The competitors are not humans but are small black robots who appear more advanced than anything Underhill has encountered before. They describe themselves as "humanoids". Disturbed at his encounter, Underhill rushes home to discover that his wife has taken in a new lodger, a mysterious old man named Sledge. In the course of the next day, the new Mechanicals have appeared everywhere in town. They state that they only follow the Prime Directive: "to serve and obey and guard men from harm". 
Offering their services free of charge, they replace humans as police officers, bank tellers, and more, and eventually drive Underhill out of business. Despite the humanoids' benign appearance and mission, Underhill soon realizes that, in the name of their Prime Directive, the mechanicals have essentially taken over every aspect of human life. No humans may engage in any behavior that might endanger them, and every human action is carefully scrutinized. Suicide is prohibited. Humans who resist the Prime Directive are taken away and lobotomized, so that they may live happily under the direction of the humanoids. Underhill learns that his lodger Sledge is the creator of the humanoids and is on the run from them. Sledge explains that 60 years earlier he had discovered the force of "rhodomagnetics" on the planet Wing IV and that his discovery resulted in a war that destroyed his planet. In his grief, Sledge designed the humanoids to help humanity and be invulnerable to human exploitation. However, he eventually realized that they had instead taken control of humanity, in the name of their Prime Directive, to make humans happy. The humanoids are spreading out from Wing IV to every human-occupied planet to implement their Prime Directive. Sledge and Underhill attempt to stop the humanoids by aiming a rhodomagnetic beam at Wing IV, but fail. The humanoids take Sledge away for surgery. He returns with no memory of his prior life, stating that he is now happy under the humanoids' care. Underhill is driven home by the humanoids, sitting "with folded hands," as there is nothing left to do. == Origins == In a 1991 interview, Williamson revealed how the story construction reflected events of his childhood in addition to technological extrapolations: I wrote "With Folded Hands" immediately after World War II, when the shadow of the atomic bomb had just fallen over SF and was just beginning to haunt the imaginations of people in the US. 
The story grows out of that general feeling that some of the technological creations we had developed with the best intentions might have disastrous consequences in the long run (that idea, of course, still seems relevant today). The notion I was consciously working on specifically came out of a fragment of a story I had worked on for a while about an astronaut in space who is accompanied by a robot obviously superior to him physically—i.e., the robot wasn't hurt by gravity, extremes of temperature, radiation, or whatever. Just looking at the fragment gave me the sense of how inferior humanity is in many ways to mechanical creations. That basic recognition was the essence of the story, and as I wrote it up in my notes the theme was that the perfect machine would prove to be perfectly destructive... It was only when I looked back at the story much later on that I was able to realize that the emotional reach of the story undoubtedly derived from my own early childhood, when people were attempting to protect me from all those hazardous things a kid is going to encounter in the isolated frontier setting I grew up in. As a result, I felt frustrated and over protected by people whom I couldn't hate because I loved them. A sort of psychological trap. Specifically, the first three years of my life were spent on a ranch at the top of the Sierra Madre Mountains on the headwaters of the Yaqui River in Sonora, Mexico. ... [My mother] was terrified by this environment. My father built a crib that became a psychological prison for me, particularly because my mother apparently kept me in it too long, when I needed to get out and crawl on the floor. ... In retrospect, I'm certain I projected my fears and suspicions of this kind of conditioning, and these projections became the governing emotional principle of "With Folded Hands" and The Humanoids. == Reception == In 2024, Robert Silverberg wrote an essay in which he asserted that "With Folded Hands..." 
is "probably the best story ever written about robots" and suggested that Elon Musk's Optimus Generation 2 is the realization of the "humanoids" along with their worst drawbacks.

  • Adaptive neuro fuzzy inference system

    An adaptive neuro-fuzzy inference system or adaptive network-based fuzzy inference system (ANFIS) is a kind of artificial neural network that is based on the Takagi–Sugeno fuzzy inference system, a class of fuzzy models introduced by Tomohiro Takagi and Michio Sugeno for system identification and control. The technique was developed in the early 1990s. Since it integrates both neural networks and fuzzy logic principles, it has the potential to capture the benefits of both in a single framework. Its inference system corresponds to a set of fuzzy IF–THEN rules that have learning capability to approximate nonlinear functions. Hence, ANFIS is considered to be a universal estimator. To use ANFIS more efficiently, one can use the best parameters obtained by a genetic algorithm. It has been used in intelligent, situation-aware energy management systems. == ANFIS architecture == The network structure has two parts, the premise part and the consequence part. In more detail, the architecture is composed of five layers. The first layer takes the input values and determines the membership functions belonging to them. It is commonly called the fuzzification layer. The membership degrees of each function are computed using the premise parameter set, namely {a,b,c}. The second layer is responsible for generating the firing strengths of the rules. Because of this task, the second layer is called the "rule layer". The role of the third layer is to normalize the computed firing strengths by dividing each value by the total firing strength. The fourth layer takes as input the normalized values and the consequence parameter set {p,q,r}. The values returned by this layer are the defuzzified ones, and they are passed to the last layer, which returns the final output. === Fuzzification layer === The first layer is what distinguishes an ANFIS network from a vanilla neural network. 
Neural networks in general operate with a data pre-processing step, in which the features are converted into normalized values between 0 and 1. An ANFIS network does not need a sigmoid function; instead, it performs the preprocessing step by converting numeric values into fuzzy values. For example, suppose the network receives as input the distance between two points in 2D space. The distance is measured in pixels and can take values from 0 to 500 pixels. The numerical values are converted into fuzzy numbers by membership functions, which correspond to semantic descriptions such as near, middle and far. Each possible linguistic value is represented by an individual neuron. The neuron “near” fires with a value between 0 and 1 if the distance falls within the category "near", while the neuron “middle” fires if the distance falls within that category. The input value “distance in pixels” is thus split into three different neurons for near, middle and far.
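
    The five layers and the distance example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text names the premise parameters {a,b,c} but not the shape of the membership function, so a generalized bell function is assumed here; the two-input, first-order consequent f = p*x + q*y + r is likewise an assumption suggested by the parameter set {p,q,r}. The names Bell, fuzzify_distance and anfis_forward, and all numeric parameter values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bell:
    """Generalized bell membership function (assumed shape) with premise parameters a, b, c."""
    a: float  # width of the bell
    b: float  # steepness of the slopes
    c: float  # center of the bell

    def __call__(self, x: float) -> float:
        return 1.0 / (1.0 + abs((x - self.c) / self.a) ** (2 * self.b))


def fuzzify_distance(pixels: float) -> dict:
    """Fuzzification of a 0..500 pixel distance into near / middle / far (illustrative parameters)."""
    neurons = {
        "near": Bell(a=125.0, b=2.0, c=0.0),
        "middle": Bell(a=125.0, b=2.0, c=250.0),
        "far": Bell(a=125.0, b=2.0, c=500.0),
    }
    return {name: mf(pixels) for name, mf in neurons.items()}


def anfis_forward(x, y, mfs_x, mfs_y, consequents):
    """One forward pass; rule i pairs mfs_x[i] with mfs_y[i], consequents[i] is (p, q, r)."""
    # Layer 1 (fuzzification): membership degree of each input in each fuzzy set.
    mu_x = [mf(x) for mf in mfs_x]
    mu_y = [mf(y) for mf in mfs_y]
    # Layer 2 (rule layer): firing strength of each rule, here the product of its degrees.
    w = [mx * my for mx, my in zip(mu_x, mu_y)]
    # Layer 3: normalize each firing strength by the total firing strength.
    total = sum(w)
    w_norm = [wi / total for wi in w]
    # Layer 4: weight each rule's linear consequent p*x + q*y + r by its normalized strength.
    rule_outputs = [wn * (p * x + q * y + r) for wn, (p, q, r) in zip(w_norm, consequents)]
    # Layer 5: sum the rule outputs into the final crisp output.
    return sum(rule_outputs), w_norm


if __name__ == "__main__":
    print(fuzzify_distance(100.0))
    sets = [Bell(1.0, 1.0, 0.0), Bell(1.0, 1.0, 2.0)]
    print(anfis_forward(0.5, 1.5, sets, sets, [(1.0, 1.0, 0.0), (0.5, 0.5, 1.0)]))
```

    In a trained ANFIS the premise and consequence parameters would be fitted to data; the sketch only shows how a fixed set of parameters maps inputs to an output.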

  • Information schema

    In relational databases, the information schema (information_schema) is an ANSI-standard set of read-only views that provide information about all of the tables, views, columns, and procedures in a database. It can be used as a source of the information that some databases make available through non-standard commands, such as the SHOW command of MySQL, the DESCRIBE command of Oracle's SQL*Plus, and the \d command in psql (PostgreSQL's default command-line program).

        => SELECT count(table_name) FROM information_schema.tables;
         count
        -------
            99
        (1 row)

        => SELECT column_name, data_type, column_default, is_nullable FROM information_schema.columns WHERE table_name='alpha';
         column_name | data_type | column_default | is_nullable
        -------------+-----------+----------------+-------------
         foo         | integer   |                | YES
         bar         | character |                | YES
        (2 rows)

        => SELECT * FROM information_schema.information_schema_catalog_name;
         catalog_name
        --------------
         johnd
        (1 row)

    == Implementation == As a notable exception among major database systems, Oracle does not as of 2015 implement the information schema. An open-source project exists to address this.

    RDBMSs that support information_schema include: Amazon Redshift, Apache Hive, Microsoft SQL Server, MonetDB, Snowflake, MySQL, PostgreSQL, H2 Database, HSQLDB, InterSystems Caché, MariaDB, SingleStore (formerly MemSQL), Mimer SQL, Trino, Presto, CrateDB, ClickHouse, CockroachDB, Kinetica DB, TiDB.

    RDBMSs that do not support information_schema include: Apache Derby, Apache Ignite, Firebird, Microsoft Access, IBM Informix, Ingres, IBM Db2, Oracle Database, SAP HANA, SQLite, Sybase ASE, Sybase SQL Anywhere, Teradata, Vertica.
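
    For a system without information_schema, similar column metadata usually has to come from a vendor-specific mechanism. As a minimal sketch, assuming SQLite accessed through Python's standard sqlite3 module, the columns query shown earlier can be approximated with SQLite's table_info pragma. The helper name describe_columns is hypothetical, and the table alpha with columns foo and bar simply mirrors the example output.

```python
import sqlite3


def describe_columns(conn: sqlite3.Connection, table: str):
    """Return rows shaped like (column_name, data_type, column_default, is_nullable)."""
    # pragma_table_info is the table-valued form of PRAGMA table_info; each row is
    # (cid, name, type, notnull, dflt_value, pk). Selecting * avoids naming the
    # "notnull" column, which collides with an SQL keyword.
    rows = conn.execute("SELECT * FROM pragma_table_info(?)", (table,)).fetchall()
    return [
        (name, col_type, default, "NO" if notnull else "YES")
        for _cid, name, col_type, notnull, default, _pk in rows
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alpha (foo integer, bar character)")
    for row in describe_columns(conn, "alpha"):
        print(row)
```

    The result is only an approximation of the standard view: SQLite reports the declared type text rather than a normalized data type, and it has no catalog or schema columns to filter on.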

  • Gundam Build Metaverse

    Gundam Build Metaverse (Japanese: ガンダムビルドメタバース, Hepburn: Gandamu Birudo Metabāzu) is a Japanese original net animation anime mini-series produced by Sunrise Beyond, and the fifth series within the Gundam Build Series sub-series. The series celebrates the 10th anniversary of the Gundam Build franchise, including characters from the previous installments. == Plot == The story is set in the same universe of the Gundam Build series in an online metaverse space where users can use avatars to move around and interact with other users, including conducting Gunpla (Gundam plastic model) battles with them. The story centers on Rio Hōjō, a boy who lives in Hawaii, and who learns how to build Gunpla from a local hobbyist named Seria Urutsuki. In the metaverse, a figure known as Mask Lady teaches him the art of Gunpla battling, and he strives to get better at it every day. With his custom Lah Gundam, he seeks out ever stronger opponents. == Characters == === Main characters === Rio Hojo (ホウジョウ・リオ, Hōjō Rio) Voiced by: Chika Anzai A young boy from Hawaii who is an enthusiast of Gunpla Battle and is an apprentice of the mysterious Diver "Mask Lady". Rio's Gunpla is the Lah Gundam, modeled after an entry-grade RX-78-2 Gundam, from the original Mobile Suit Gundam anime series. Seria Urutsuki (ウルツキ・セリア, Urutsuki Seria) / Mask Lady (マスクレディー, Masuku Reidi) Voiced by: Rio Tsuchiya A clerk at a local hobby shop and the instructor at their Gunpla class, Seria becomes Rio's Gunpla mentor using the alias "Mask Lady". Seria's Gunpla is the ZGMF-X20A-PF Gundam Perfect Strike Freedom Rouge, based on both the MBF-02 Strike Rouge and the GAT-X105+AQM/E-YM1 Perfect Strike Gundam from Mobile Suit Gundam Seed and the ZGMF-X20A Strike Freedom Gundam from Mobile Suit Gundam Seed Destiny. 
=== Returning characters === Fumina Hoshino (ホシノ・フミナ, Hoshino Fumina) Voiced by: Yui Makino A veteran Gunpla Battler from the early days of the sport and the Leader of "Team Try Fighters", she works as an advertiser and announcer within the Metaverse realm. Tatsuya Yuuki (ユウキ・タツヤ, Yūki Tatsuya) / Meijin Kawaguchi III (三代目メイジン・カワグチ, Sandaime Meijin Kawaguchi) Voiced by: Takuya Satō A builder and three-time Gunpla Battle world champion who inherited the name of the legendary Meijin Kawaguchi, known as "Meijin Kawaguchi III", and is still the current title holder. His newest Gunpla is the Gundam Amazing Barbatos Lupus based on the ASW-G-08 Gundam Barbatos Lupus from Mobile Suit Gundam: Iron-Blooded Orphans. Riku Mikami (ミカミ・リク, Mikami Riku) / Riku (リク) Voiced by: Yūsuke Kobayashi The Founder and former leader of the legendary force, "Build Divers". His Gunpla is the Gundam 00 Diver Arc, the latest version of the original GN-0000DVR Gundam 00 Diver from Gundam Build Divers, incorporating elements from the 00 Gundam from Mobile Suit Gundam 00 and the Gundam AGE-FX from Mobile Suit Gundam AGE. Sarah (サラ, Sara) Voiced by: Haruka Terui An EL-Diver and member of the Build Divers. Momoka Yashiro (ヤシロ・モモカ, Yashiro Momoka) / Momo (モモ) Voiced by: Nene Hieda Member of Build Divers. Her Gunpla is the MOMOKAPOOL (R×R), an upgraded version of her PEN-01M Momokapool from Gundam Build Divers. Aya Fujisawa (フジサワ・アヤ, Fujisawa Aya) / Ayame (アヤメ) Voiced by: Manami Numakura Member of Build Divers. Her Gunpla is the F-Kunoichi Kai, an SD Gunpla based on the F91 Gundam F91 from Mobile Suit Gundam F91. Sei Iori (イオリ・セイ, Iori Sei) Voiced by: Mikako Komatsu A builder and one-time Gunpla Battle World Champion. His current Gunpla is the GAT-X105B/EG Build Strike Exceed Galaxy, the latest version of the original GAT-X105B Build Strike Gundam from Gundam Build Fighters. 
Aria von Reiji Asuna (アリーア・フォン・レイジ・アスナ, Arīa fon Reiji Asuna) Voiced by: Sachi Kokuryu A prince from the country called Arian that exists within a space colony in another dimension, who became friends with Sei Iori and together won the Gunpla Battle World Championship. He somehow manages to log into the metaverse to reunite with his friend, piloting the SB-011 Star Burning Gundam. Sekai Kamiki (カミキ・セカイ, Kamiki Sekai) Voiced by: Kazumi Togashi A veteran builder and former member of Team Try Fighters. He is currently the Japanese National representative Champion. In the series he develops a rivalry with Hiroto similar to that of Kyoya and Rommel. His current Gunpla is the Shin Burning Gundam, the latest version of the original KMK-B01 Kamiki Burning Gundam from Gundam Build Fighters Try which is based on the Burning Gundam and Master Gundam. Hiroto Kuga (クガ・ヒロト, Kuga Hiroto) / Hiroto (ヒロト, Hiroto) Voiced by: Chiaki Kobayashi A veteran diver, the one responsible for discovering more EL-Divers, and a former member of the legendary force "Avalon", who later joined the unofficial "BUILD DiVERS" and eventually became its current Force Leader, as well as the current title holder of "Hero of Gunpla". In the third episode he is the only Build Diver member who participates in the tournament, while his fellow force-mates are in the audience rooting for him and Rio. His Gunpla is the Plutine Gundam, which combines his Core Gundam II Plus (upgraded from the Core Gundam II featured in Gundam Build Divers Re:Rise) with the Pluto Armor. Magee (マギー, Magī) Voiced by: Taishi Murata A flamboyant veteran Diver who owns a shop in the metaverse and is an acquaintance of Seria's. Freddie (フレディ, Furedi) Voiced by: Ai Kakuma An alien anthropomorphic dog boy from planet Eldora, a support member to both Build Diver teams, who manages to access the metaverse from his home planet along with his fellow Eldorans. 
Ogre (オーガ, Ōga) Voiced by: Wataru Hatano Kyoya Kisugi (キスギ・キョウヤ, Kisugi Kyōya) / Kyoya Kujo (クジョウ・キョウヤ, Kujō Kyōya) Voiced by: Jun Kasama Leader of the legendary force "Avalon" and the reigning holder of the "World Champion" title. He, along with Hiroto Kuga, Maria Urutsuki, and Tatsuya Yuuki, is currently at the top of the entire Gunpla world community. His current Gunpla is a recolored version of his AGE-TRYMAG Gundam TRY AGE Magnum from Gundam Build Divers Re:Rise. Susumu Sazaki (サザキ・ススム, Sazaki Susumu) Voiced by: Ryo Hirohashi Kaoruko Sazaki (サザキ・カオルコ, Sazaki Kaoruko) Voiced by: Ryo Hirohashi Mahiru Shigure (シグレ・マヒル, Shigure Mahiru) Voiced by: Rinko Natsuhi Keiko Sano (サノ・ケイコ, Sano Keiko) Voiced by: Ami Naito === Others === Maria Urutsuki (ウルツキ・マリア, Urutsuki Maria) / Mascarilla (マスカリージャ, Masukarīja) Voiced by: Ai Kakuma A mysterious masked woman with a harsh rivalry with Seria and an avatar similar to hers, she is later revealed as Seria's younger sister Maria, who began to loathe her sister after she gave up on their dream to fight for the title of Lady Kawaguchi. She later obtains the title, becoming "Lady Kawaguchi VII". Jeff (ジェフさん, Jefu-san) Voiced by: Kenta Miyake A distant relative of Seria and Maria's and owner of the hobby shop where Seria lives. Mellow Neige (メロウ・ネージュ, Merō Nēju) Voiced by: Chikano Ibuki A sentient A.I. who is the current publicity face of the Gunpla Metaverse. == Episodes ==

  • Google AI Studio

    Google AI Studio

    Google AI Studio is a web-based integrated development environment developed by Google for prototyping applications using generative AI models. Released in December 2023 alongside the Gemini API, the platform provides access to Google's Gemini family of models and related tools for image, video, and audio generation. The service targets both developers and non-technical users for testing prompts and generating code for the Gemini API. == History == Google launched AI Studio on December 13, 2023, as the successor to Google MakerSuite. MakerSuite, introduced at Google I/O in May 2023, had provided similar functionality for Google's PaLM language models. The AI Studio was launched alongside the public release of the Gemini API. == Features == AI Studio's interface consists of a central prompt area and a settings panel for model selection and parameter adjustment. The platform supports chat prompts for multi-turn conversations and includes system instructions for defining model behavior, tone, or specific rules. Users can employ zero-shot and few-shot prompting techniques to guide the model's output format. The platform processes various media types including video, audio, and documents, and can generate images through Imagen models, videos through Veo models, and audio through text-to-speech functionality. Additional tools include real-time streaming for screen sharing and live analysis, code execution in a sandboxed Python environment, grounding with Google Search for current information, URL context for analyzing specific web pages, and a thinking mode for complex reasoning tasks. == Available models == The platform provides access to several Google AI models including the Gemini language models, Imagen for image generation, Veo for video generation, LearnLM for educational applications, and Gemma, Google's open-source model family. == Privacy and data usage == Google AI Studio's data handling differs between free and paid users. 
For free tier users, Google uses submitted prompts, uploaded files, and generated responses to improve its products and services, with human reviewers potentially reading and annotating the data after disconnection from user accounts. Google advises against submitting sensitive information on the free tier. Users who enable Google Cloud Billing are considered paid service users, and their data is not used for product improvement. Data is processed according to Google's Data Processing Addendum and retained temporarily for abuse monitoring. == Availability == The platform is available at no cost, with API usage subject to a free tier with daily and per-minute rate limits. Access is restricted to users aged 18 and older in specific countries and territories. The service was initially unavailable in the United Kingdom and European Economic Area due to regulatory concerns, which drew user complaints. == Reception == Reviews have noted the platform's accessibility and integration with Gemini models, with features such as real-time screen sharing and large context windows cited as notable capabilities. However, reviewers have raised concerns about the privacy implications for free tier users, whose data is used for model training. Some users have reported inconsistent performance with features like screen streaming and issues with folder uploads for large datasets. The initial geographic restrictions were a point of criticism among developers in affected regions.
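The Features section mentions zero-shot and few-shot prompting. As a rough illustration of the difference, the sketch below assembles a prompt as plain text: with no examples it is zero-shot, and with a handful of worked examples it is few-shot. The helper name `build_few_shot_prompt` and the `Input:`/`Output:` layout are assumptions made for this example. They are not part of AI Studio or the Gemini API, and no API call is made.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt: instruction, optional worked examples, then the new input.

    `examples` is a list of (input_text, expected_output) pairs; an empty list
    yields a zero-shot prompt. The layout is illustrative, not an official format.
    """
    parts = [instruction.strip(), ""]
    for text, label in examples:
        parts += [f"Input: {text}", f"Output: {label}", ""]
    # The final "Output:" is left open so the model completes it in the same format.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "Setup was quick and painless.",
)
print(prompt)
```

The examples steer the output format (here, a single lowercase label); the resulting string could then be pasted into a prompt area or sent through whatever client a project actually uses.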

  • AI@50

    AI@50

    AI@50, formally known as the "Dartmouth Artificial Intelligence Conference: The Next Fifty Years" (July 13–15, 2006), was a conference organized by James H. Moor, commemorating the 50th anniversary of the Dartmouth workshop which effectively inaugurated the history of artificial intelligence. Five of the original ten attendees were present: Marvin Minsky, Ray Solomonoff, Oliver Selfridge, Trenchard More, and John McCarthy. While sponsored by Dartmouth College, General Electric, and the Frederick Whittemore Foundation, a $200,000 grant from the Defense Advanced Research Projects Agency (DARPA) called for a report of the proceedings that would: Analyze progress on AI's original challenges during the first 50 years, and assess whether the challenges were "easier" or "harder" than originally thought and why Document what the AI@50 participants believe are the major research and development challenges facing this field over the next 50 years, and identify what breakthroughs will be needed to meet those challenges Relate those challenges and breakthroughs against developments and trends in other areas such as control theory, signal processing, information theory, statistics, and optimization theory. A summary report by the conference director, James H. Moor, was published in AI Magazine. == Conference Program and links to published papers == James H. Moor, conference Director, Introduction Carol Folt and Barry Scherr, Welcome Carey Heckman, Tonypandy and the Origins of Science === AI: Past, Present, Future === John McCarthy, What Was Expected, What We Did, and AI Today Marvin Minsky, The Emotion Machine === The Future Model of Thinking === Ron Brachman and Hector Levesque, A Large Part of Human Thought David Mumford, What is the Right Model for 'Thought'? 
Stuart Russell, The Approach of Modern AI === The Future of Network Models === Geoffrey Hinton & Simon Osindero, From Pandemonium to Graphical Models and Back Again Rick Granger, From Brain Circuits to Mind Manufacture === The Future of Learning & Search === Oliver Selfridge, Learning and Education for Software: New Approaches in Machine Learning Ray Solomonoff, Machine Learning — Past and Future Leslie Pack Kaelbling, Learning to be Intelligent Peter Norvig, Web Search as a Product of and Catalyst for AI === The Future of AI === Rod Brooks, Intelligence and Bodies Nils Nilsson, Routes to the Summit Eric Horvitz, In Pursuit of Artificial Intelligence: Reflections on Challenges and Trajectories === The Future of Vision === Eric Grimson, Intelligent Medical Image Analysis: Computer Assisted Surgery and Disease Monitoring Takeo Kanade, Artificial Intelligence Vision: Progress and Non-Progress Terry Sejnowski, A Critique of Pure Vision === The Future of Reasoning === Alan Bundy, Constructing, Selecting and Repairing Representations of Knowledge Edwina Rissland, The Exquisite Centrality of Examples Bart Selman, The Challenge and Promise of Automated Reasoning === The Future of Language and Cognition === Trenchard More The Birth of Array Theory and Nial Eugene Charniak, Why Natural Language Processing is Now Statistical Natural Language Processing Pat Langley, Intelligent Behavior in Humans and Machines === The Future of the Future === Ray Kurzweil, Why We Can Be Confident of Turing Test Capability Within a Quarter Century George Cybenko, The Future Trajectory of AI Charles J. 
Holland, DARPA's Perspective === AI and Games === Jonathan Schaeffer, Games as a Test-bed for Artificial Intelligence Research Danny Kopec, Chess and AI Shay Bushinsky, Principle Positions in Deep Junior's Development === Future Interactions with Intelligent Machines === Daniela Rus, Making Bodies Smart Sherry Turkle, From Building Intelligences to Nurturing Sensibilities === Selected Submitted Papers: Future Strategies for AI === J. Storrs Hall, Self-improving AI: An Analysis Selmer Bringsjord, The Logicist Manifesto Vincent C. Müller, Is There a Future for AI Without Representation? Kristinn R. Thórisson, Integrated A.I. Systems === Selected Submitted Papers: Future Possibilities for AI === Eric Steinhart, Survival as a Digital Ghost Colin T. A. Schmidt, Did You Leave That 'Contraption' Alone With Your Little Sister? Michael Anderson & Susan Leigh Anderson, The Status of Machine Ethics Marcello Guarini, Computation, Coherence, and Ethical Reasoning

  • Production (computer science)

    Production (computer science)

    In computer science, a production or production rule is a rewrite rule that replaces some symbols with other symbols. A finite set of productions $P$ is the main component in the specification of a formal grammar (specifically a generative grammar). In such grammars, a set of productions is a special case of a relation on the set of strings $V^{*}$ (where ${}^{*}$ is the Kleene star operator) over a finite set of symbols $V$ called a vocabulary; it defines which non-empty strings can be substituted with others. The set of productions is thus a special kind of subset $P \subset V^{*} \times V^{*}$, and productions are written in the form $u \to v$ to mean that $(u, v) \in P$ (not to be confused with $\to$ being used as function notation, since there may be multiple rules for the same $u$). Given two subsets $A, B \subset V^{*}$, productions can be restricted to satisfy $P \subset A \times B$, in which case productions are said to be "of the form $A \to B$". Different choices and constructions of $A, B$ lead to different types of grammars. In general, any production of the form $u \to \epsilon$, where $\epsilon$ is the empty string (sometimes also denoted $\lambda$), is called an erasing rule, while productions that would produce strings out of nowhere, namely of the form $\epsilon \to v$, are never allowed. 
In order to allow the production rules to create meaningful sentences, the vocabulary is partitioned into (disjoint) sets $\Sigma$ and $N$ with two different roles: $\Sigma$ denotes the terminal symbols, known as an alphabet, containing the symbols allowed in a sentence; $N$ denotes the nonterminal symbols, containing a distinguished start symbol $S \in N$, that are needed together with the production rules to define how to build the sentences. In the most general case of an unrestricted grammar, a production $u \to v$ is allowed to map arbitrary strings $u$ and $v$ of symbols in $V$ (terminals and nonterminals), as long as $u$ is not empty. So unrestricted grammars have productions of the form $V^{*} \setminus \{\epsilon\} \to V^{*}$ or, if we want to disallow changing finished sentences, $V^{*} N V^{*} = (V^{*} \setminus \Sigma^{*}) \to V^{*}$, where $V^{*} N V^{*}$ indicates concatenation and forces a nonterminal symbol to always be present on the left-hand side of the productions, and $\setminus$ denotes set minus or set difference. If we do not allow the start symbol to occur in $v$ (the word on the right side), we have to replace $V^{*}$ with $(V \setminus \{S\})^{*}$ on the right-hand side. The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what constitutes a production. Notably, in a context-free grammar, the left-hand side of a production must be a single nonterminal symbol. 
So productions are of the form: $N \to V^{*}$ == Grammar generation == To generate a string in the language, one begins with a string consisting of only a single start symbol, and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when a string containing only terminals is obtained. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If there are multiple different ways of generating this single string, then the grammar is said to be ambiguous. For example, assume the alphabet consists of $a$ and $b$, with the start symbol $S$, and we have the following rules: 1. $S \to aSb$ 2. $S \to ba$ Then we start with $S$ and can choose a rule to apply to it. If we choose rule 1, we replace $S$ with $aSb$ and obtain the string $aSb$. If we choose rule 1 again, we replace $S$ with $aSb$ and obtain the string $aaSbb$. This process is repeated until we only have symbols from the alphabet (i.e., $a$ and $b$). If we now choose rule 2, we replace $S$ with $ba$ and obtain the string $aababb$, and are done. We can write this series of choices more briefly, using symbols: $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aababb$. The language of the grammar is the set of all the strings that can be generated using this process: $\{ba, abab, aababb, aaababbb, \dotsc\}$.
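The rewriting process described above can be sketched in a few lines of Python. This is a minimal illustration for the two example rules only, not a general grammar engine: the names `RULES` and `generate` and the `max_len` cutoff are assumptions introduced here, and pruning by length is valid only because neither rule ever shortens a string.

```python
from collections import deque

# Productions from the example: rule 1 is S -> aSb, rule 2 is S -> ba.
RULES = {"S": ["aSb", "ba"]}
NONTERMINALS = set(RULES)


def generate(start="S", max_len=8):
    """Breadth-first rewriting: return every terminal string of length <= max_len."""
    found, seen = [], {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in NONTERMINALS), None)
        if idx is None:
            found.append(form)  # only terminals remain: a sentence of the language
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Neither rule shrinks a string, so over-long forms can be discarded.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=len)


print(generate())  # ['ba', 'abab', 'aababb', 'aaababbb']
```

Each string is reached by exactly one sequence of rule choices here, which is consistent with this small grammar being unambiguous.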

  • Whisper (speech recognition system)

    Whisper (speech recognition system)

    Whisper is a machine learning model for speech recognition and transcription, created by OpenAI and first released as open-source software in September 2022. It is capable of transcribing speech in English and multiple other languages, and can translate several non-English languages into English. Whisper is a weakly-supervised deep learning acoustic model, made using an encoder-decoder transformer architecture. OpenAI claims that the combination of different training data and post-training filtering used in its development has led to improved recognition of accents, background noise, and jargon compared to previous approaches. While the model does not outperform larger, more specialized models and still experiences AI hallucination, it has been shown to be useful for general sound recognition and has many applications across different industries. == Background == Speech recognition has had a long history in research; the first approaches made use of statistical methods, such as dynamic time warping, and later hidden Markov models. Around the 2010s, deep neural network approaches became more common for speech recognition models, enabled by the availability of large datasets ("big data") and increased computational performance. Early approaches to deep learning in speech recognition included convolutional neural networks, which were limited by their inability to capture sequential data. This later led to the development of Seq2seq approaches, including recurrent neural networks that made use of long short-term memory. Transformers, introduced in 2017 by Google, displaced many prior state-of-the-art approaches across a wide range of machine learning tasks, and started becoming the core neural architecture in fields such as language modeling and computer vision. Weakly-supervised approaches to training acoustic models were recognized in the early 2020s as promising for speech recognition approaches using deep neural networks. 
According to a NYT report, in 2021 OpenAI believed it had exhausted sources of higher-quality data to train its large language models and decided to complement scraped web text with transcriptions of YouTube videos and podcasts, and developed Whisper to solve this task. Whisper Large V2 was released on December 8, 2022, followed by Whisper Large V3 in November 2023, during the OpenAI Dev Day. In March 2025, OpenAI released new transcription models based on GPT-4o and GPT-4o mini, both of which have lower error rates than Whisper. == Architecture == The Whisper architecture is based on an encoder-decoder transformer. Input audio is resampled to 16,000 Hertz (Hz) and converted to an 80-channel log-magnitude Mel spectrogram using 25 ms windows with a 10 ms stride. The spectrogram is then normalized to a [-1, 1] range with near-zero mean. The encoder takes this Mel spectrogram as input and processes it. It first passes through two convolutional layers. Sinusoidal positional embeddings are added. It is then processed by a series of Transformer encoder blocks (with pre-activation residual connections). The encoder's output is layer normalized. The decoder is a standard transformer decoder. It has the same width and Transformer blocks as the encoder. It uses learned positional embeddings and tied input-output token representations (using the same weight matrix for both the input and output embeddings). It uses a byte-pair encoding tokenizer, of the same kind as used in GPT-2. English-only models use the GPT-2 vocabulary, while multilingual models employ a re-trained multilingual vocabulary with the same number of words. Special tokens are used to allow the decoder to perform multiple tasks: Tokens that denote language (one unique token per language). Tokens that specify task (<|transcribe|> or <|translate|>). Tokens that specify if no timestamps are present (<|notimestamps|>). 
If the token is not present, then the decoder predicts timestamps relative to the segment, quantized to 20 ms intervals. <|nospeech|> for voice activity detection. <|startoftranscript|> and <|endoftranscript|>. Any text that appears before <|startoftranscript|> is not generated by the decoder, but given to the decoder as context. Loss is only computed over non-contextual parts of the sequence, i.e. tokens between these two special tokens. == Training data == The training dataset consists of 680,000 hours of labeled audio-transcript pairs sourced from the internet using semi-supervised learning. This includes 117,000 hours in 96 non-English languages and 125,000 hours of X→English translation data, where X stands for any non-English language. Preprocessing involved standardization of transcripts, filtering to remove machine-generated transcripts using heuristics (e.g., punctuation, capitalization), language identification and matching with transcripts, fuzzy deduplication, and deduplication with evaluation datasets to avoid data contamination. Speechless segments were also included to allow voice activity detection training. For the files still remaining after the filtering process, audio files were then broken into 30-second segments paired with the subset of the transcript that occurs within that time. If the predicted spoken language differed from the language of the text transcript associated with the audio, that audio-transcript pair was not used for training the speech recognition models, but instead for training translation. The model was trained using the AdamW optimizer with gradient norm clipping and a linear learning rate decay with warmup, with a batch size of 256 segments. Training proceeded for 1 million updates (approximately 2-3 epochs). No data augmentation or regularization was used, except for the Large V2 model, which used SpecAugment, Stochastic Depth, and BPE Dropout. 
The training used data parallelism with float16, dynamic loss scaling, and activation checkpointing. === Post-training filtering === After training the first model, researchers ran it on different subsets of the training data, each representing a distinct source. Data sources were ranked by a combination of their error rate and size. Manual inspection of the top-ranked sources (high error, large size) helped determine if the source was low quality (e.g., partial transcriptions, inaccurate alignment). After training, it was fine-tuned to suppress the prediction of speaker names and low-quality sources were then removed. == Capacity == While Whisper does not outperform models which specialize in the LibriSpeech dataset, when tested across many datasets, it is more robust and makes 55.2% fewer errors than other models. Whisper has a differing error rate with respect to transcribing different languages, with a higher word error rate in languages not well-represented in the training data. The authors found that multi-task learning improved overall performance compared to models specialized to one task. They conjectured that the best Whisper model trained is still underfitting the dataset, and larger models and longer training can result in better models. Third-party evaluations have found varying levels of AI hallucination. A study of transcripts of public meetings found hallucinations in eight out of every 10 transcripts, while an engineer discovered hallucinations in "about half" of 100 hours of transcriptions and a developer identified them in "nearly every one" of 26,000 transcripts. A study of 13,140 short audio segments (averaging 10 seconds) found 187 hallucinations (1.4%), 38% of which generated text that could be harmful because it inserted false references to things like race, non-existent medications, or violent events that were not in the audio. 
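The Capacity section compares models by word error rate (WER). As a reminder of what that metric measures, the sketch below uses the conventional definition: the word-level edit distance (substitutions, insertions, and deletions) divided by the number of words in the reference. The function name `word_error_rate` is hypothetical, and this is not the evaluation code used for Whisper, which would also normalize the text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))  # 0.333
```

Because insertions count as errors, WER can exceed 1.0, which is one way hallucinated text shows up in this metric.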
== Applications == The model has been used as the base for many applications, such as a unified model for speech recognition and more general sound recognition. Whisper has also been integrated into the workflow of biomedical research. In 2025, a study on Alzheimer's disease detection used the model to transcribe spontaneous speech recordings. The transcripts generated by the model were combined with LLM vector embeddings and traditional classifiers to help classify the patients' health. In another application, OVALYTICS incorporated Whisper to transcribe YouTube videos and automate content moderation systems, which improved its detection of offensive content. The model has also been used in academic libraries and cultural heritage institutions to generate transcripts and captions for their digitized audiovisual collections. In a 2025 case study, Emory University Libraries found that Whisper reduced the labor used in transcription by around 30-35%, shifting work from text creation to text correction. However, human review is still necessary to ensure that accuracy, formatting, and accessibility all meet the required standards.
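The front-end figures quoted in the Architecture and Training data sections (16,000 Hz audio, 25 ms windows, 10 ms stride, 30-second segments, 20 ms timestamp steps) translate into sample and frame counts by simple arithmetic. The sketch below works them out. The constant names are made up for this example, and the frame count assumes one frame per stride with the signal padded so that no partial frame is dropped.

```python
SAMPLE_RATE_HZ = 16_000   # resampling rate
WINDOW_MS = 25            # analysis window length
STRIDE_MS = 10            # hop between consecutive windows
SEGMENT_S = 30            # length of one training segment
TIMESTAMP_STEP_MS = 20    # timestamp quantization step

window_samples = SAMPLE_RATE_HZ * WINDOW_MS // 1000   # samples per window
stride_samples = SAMPLE_RATE_HZ * STRIDE_MS // 1000   # samples per hop
segment_samples = SAMPLE_RATE_HZ * SEGMENT_S          # samples per segment

# Assumption: one spectrogram frame per stride across the whole segment.
frames_per_segment = segment_samples // stride_samples

# Number of 20 ms intervals that fit in one 30-second segment.
timestamp_intervals = SEGMENT_S * 1000 // TIMESTAMP_STEP_MS

print(window_samples, stride_samples, segment_samples)  # 400 160 480000
print(frames_per_segment, timestamp_intervals)          # 3000 1500
```

Under these assumptions each 30-second segment becomes a spectrogram of 3,000 frames by 80 Mel channels, and each timestamp step spans two frames.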

  • Veo (text-to-video model)

    Veo (text-to-video model)

    Veo, or Google Veo, is a text-to-video model developed by Google DeepMind and announced in May 2024. As a generative AI model, it creates videos based on user prompts. Veo 3, released in May 2025, can also generate accompanying audio. == Development == In May 2024, a multimodal video generation model called Veo was announced at Google I/O 2024. Google claimed that it could generate 1080p videos over a minute long. In December 2024, Google released Veo 2, available via VideoFX. It supports 4K resolution video generation and has an improved understanding of physics. In April 2025, Google announced that Veo 2 had become available for advanced users on the Gemini app. In May 2025, Google released Veo 3, which not only generates videos but also creates synchronized audio (including dialogue, sound effects, and ambient noise) to match the visuals. Google also announced Flow, a video-creation tool powered by Veo and Imagen. Google DeepMind CEO Demis Hassabis described the release as the moment when AI video generation left the era of the silent film. The tool was rebranded as Google Flow at the 2026 Google I/O keynote, along with the announcement of Google Flow Music. == Capabilities == Google Veo can be purchased at multiple subscription tiers and through Google "AI credits". The software can be run through two different interfaces, Google Gemini and Google Flow. Gemini is geared towards shorter, quicker projects and uses the Gemini AI chat model, while Google Flow is essentially a movie editor that allows users to create longer projects with continuity, using the same characters and actors. Users can create a maximum of eight seconds per clip. According to Gizmodo, Veo 3 users were directing the model to generate low-quality content, such as man-on-the-street interviews or haul videos of people unboxing products. 404 Media reported that the tool tended to repeat the same joke in response to different prompts. 
Commentators speculated that Google had trained the service on YouTube videos or Reddit posts. Google itself had not stated the source of its training content. In July 2025, Media Matters for America reported that racist and antisemitic videos generated using Veo 3 were being uploaded to TikTok. Ryan Whitwam of Ars Technica commented, "In a perfect world, Veo 3 would refuse to create these videos, but vagueness in the prompt and the AI's inability to understand the subtleties of racist tropes (i.e., the use of monkeys instead of humans in some videos) make it easy to skirt the rules."

  • Smart speaker

    Smart speaker

    A smart speaker is a type of loudspeaker and voice command device with an integrated virtual assistant that offers interactive actions and hands-free activation with the help of one "wake word" (or several "wake words"). Some smart speakers also act as smart home hubs by using Wi-Fi, Bluetooth, Thread, and other protocol standards to extend usage beyond audio playback and control home automation devices connected through a local area network. == History == Early voice-activated devices began in 2013 with MIT's Jasper project, which used multiple microphones and cloud software to power hands-free interactions from across a room. The first commercial smart speaker was the Amazon Echo, which was released in 2014, powered by Alexa and a ring of far-field microphones. Google followed in 2016 with Home, powered by Google Assistant. By 2017, devices like the Echo Show and Home Hub (later called Nest Hub) added touchscreens and video, creating the "smart display" subcategory. In 2018, Apple joined the smart speaker trend by launching the HomePod, which focused on high-quality audio alongside its built-in assistant Siri. ASUS released its own AI-powered smart speaker, Xiao-Bu, in 2019; it terminated the cloud service on June 1, 2025, which means that all real-time services such as weather, news, and currency conversion are affected. Sonos's first smart speaker, the Sonos One, was released in 2017, powered by Alexa. Invoke by Harman Kardon was powered by Microsoft's intelligent personal assistant, Cortana. In the early 2020s, smart speakers gained on-device voice processing for faster responses and improved privacy. New standards such as Matter and Thread allowed many smart-home devices (even from completely different brands) to work together. == Features == === Audio and Voice === Smart speakers use multiple microphones along with noise-cancelling software to pick up the user's voice from across the room, even when music is playing or the assistant is already talking. 
Noise suppression and echo cancellation are also used by the speaker so it can focus on who is talking and ignore background noise. Most smart speaker models can recognize who is speaking by voiceprint, which allows the speaker to retrieve information from that person's calendar, preferences, or music playlists. The importance of good audio quality becomes apparent when listening to music. Entry-level (cheaper) speakers such as the Home Mini or the Echo Dot have a single full-range driver, and their audio quality is typically poor for music. More advanced units such as the Home Max or Echo Studio have separate tweeters and woofers meant for listening to music in high quality. === Connectivity and smart-home control === Most smart speakers connect over Wi-Fi or Bluetooth and support hub protocols like Thread and Matter. That lets them not only stream and play music but also control various brands of smart lights, thermostats, door locks, cameras, and much more, all from one point of control. Each can have its own designated interface and features in-house, usually launched or controlled via application or home automation software. These devices are able to communicate with each other via peer-to-peer connection through mesh networking. These speakers and related smart devices are typically controlled with one smartphone application. === Assistant services and skills === The built-in assistants handle timers, alarms, reminders, news briefings, and weather updates, send messages to other smart devices, send texts, make calls, and answer simple questions. Users can combine actions into what are typically known as routines (for example, saying "good morning" turns on lights, starts the coffee, says the weather, and reads the news) and add extra functions known as skills or actions (for things like ordering food or playing trivia games). 
This hands-free use of smart speakers can assist people with disabilities. Most other technologies need the user to be able to physically interact with the device. Smart speakers are not bound by these limitations and can serve as an excellent tool for those who are unable to use their arms or legs or have vision issues. Although these tasks can be completed on a phone or computer, consumers tend to lean towards smart speakers because their listening range is much greater than that of a phone and because the voice assistant can be activated without physically touching the device, whereas most smartphones require some physical interaction to activate the assistant. === Smart displays === Some smart speakers also include a screen to show the user a visual response. A smart speaker with a touchscreen is known as a smart display; these integrate a conversational user interface with display screens to augment voice interaction with images and video. They are powered by one of the common voice assistants and offer additional controls for smart home devices, feature streaming apps, and web browsers with touch controls for selecting content. The first smart displays were introduced in 2017 by Amazon (Amazon Echo Show) and Google (Google/Nest Home Hub). Hotel chain Marriott International has partnered with Amazon to install Echo devices in select hotels since 2018. A Taiwanese startup, Aiello, launched the Aiello Voice Assistant (AVA) in the Asian hotel market in 2019, claiming it is powered by a multi-AI model system. Angie by Nomadix, which is similar to the Amazon Echo, launched its first product in 2017, specifically targeting hotel properties in North America. In May 2019, Angie Hospitality acquired the assets of Roxy, a competitor that also built its own speech-enabled virtual assistant technology for hotels. This acquisition merged two proprietary NLP stacks into the current Nomadix product. 
=== Artificial intelligence === The newest speakers can use on-device AI or cloud-based generative models to carry on much more natural conversations, draft emails or recipes, suggest ideas based on context, or even create short pieces of music or art. This allows these speakers to do far more than earlier models could. == Accuracy == According to a study published in the Proceedings of the National Academy of Sciences of the United States of America in March 2020, speech recognition systems from the six biggest tech development companies, Amazon, Apple, Google, Yandex, IBM and Microsoft, misidentified more words spoken by "black people" than by "white people". The study measured errors and unreadability: the error rate was 19 percent for white speakers versus 35 percent for black speakers, and 2 percent of audio from white speakers was unreadable versus 20 percent from black speakers. The North American Chapter of the Association for Computational Linguistics (NAACL) also identified a discrepancy between male and female voices. According to their research, Google's speech recognition software is 13 percent more accurate for men than for women. It performs better than the systems used by Bing, AT&T, and IBM. == Privacy concerns == The built-in microphone in smart speakers continuously listens for a wake word followed by a command. However, these continuously listening microphones also raise privacy concerns among users. According to a survey of 1,007 people in Western Europe, privacy is the biggest concern holding consumers back from buying "smart" products. These concerns include what is being recorded, how the data will be used, how it will be protected, and whether it will be used for invasive advertising. Furthermore, an analysis of Amazon Echo Dots showed that 30–38% of "spurious audio recordings were human conversations", suggesting that these devices capture audio beyond what is strictly needed to detect the wake word. 
=== As a wiretap === There are strong concerns that the ever-listening microphone of a smart speaker makes it a perfect candidate for wiretapping. In 2017, British security researcher Mark Barnes showed that pre-2017 Echos have exposed pins which allow a compromised OS to be booted. According to Umar Iqbal, an assistant professor at Washington University in St. Louis, research indicates that data from consumer interactions with Alexa was used to target advertisements and products to consumers, and that over 40% of transmitted data lacked proper encryption, raising privacy concerns. Further data indicates that because smart speakers are always able to capture audio, they pick up conversations unrelated to the commands given to them. Speech from other members of the household, consumers on the phone, and even TV audio can be picked up by these speakers and stored for future use by companies. === Voice assistance vs privacy === While voice assistants provide a valuable service, there can be some hesitation towards using them in various social contexts, such as in public or around other users. However, only more recently have users begun interac

  • ShareMethods

    ShareMethods

    ShareMethods is a Web 2.0 document management and collaboration service with a focus on sales, marketing, and the extended selling network. It offers a software as a service (SaaS) subscription to companies and is available as a stand-alone application or as an integrated program with CRM tools such as Oracle CRM On Demand or salesforce.com. == History == ShareMethods was launched in 2004 to provide collaboration and communication services for sales and marketing teams, business partners, and customers. The founders have a background in building software-as-a-service applications and creating digital media applications. In September 2005, ShareMethods launched "ShareNow" as one of the first applications on the salesforce.com AppExchange. In September 2006, ShareMethods moved its operations into a SAS 70 Type II data center owned by SunGard. In March 2009, ShareMethods launched "ShareSpaces" to provide on-demand portals or workspaces. In 2013, ShareMethods announced that its platform was available in a private cloud (on-premises) version. == Products == ShareMethods: Combines document management, collaboration, analytics, and CRM integration into a single solution. Key content can be centrally managed and delivered to sales channels, while providing feedback to marketing. ShareMethods is often used as a sales portal for internal sales and a partner portal for external partners. ShareNow: Integrates ShareMethods with salesforce.com, providing Single Sign On for salesforce.com users and access to files related to accounts, opportunities, etc., including custom objects. Also facilitates collaboration between salesforce.com users and non-users. ShareMethods for Oracle CRM On Demand: Integrates ShareMethods with Oracle CRM On Demand, providing Single Sign On for Oracle users and easy access to files related to accounts, opportunities, etc. ShareOffice: An on-demand intranet/extranet solution. 
Features include full-text search, version history, server sync-up, email updates, audit trail/analytics, check-in/check-out, and a multilingual user interface. ShareSpaces: Independent workspaces or portals where users can collaborate with business partners, teammates, or individuals to work together on content and documents. == Integration and interoperability == ShareMethods is available on Salesforce.com's AppExchange platform. ShareMethods also integrates with Oracle CRM On Demand to provide document management within the CRM application. Customers can also integrate proprietary systems via single sign-on and self-registration. In addition, developers can make use of the ShareMethods API, which is based on WebDAV, to integrate document management functionality.
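Because the API is described as WebDAV-based, a folder listing would presumably be requested with the standard WebDAV `PROPFIND` method. The Python sketch below only builds such a request with the standard library and does not send it. The URL is a placeholder rather than a real ShareMethods endpoint, `build_listing_request` is a hypothetical helper, and authentication is omitted; none of these details come from ShareMethods documentation.

```python
import urllib.request

# Standard WebDAV request body asking for two common properties of each resource.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop>
    <D:displayname/>
    <D:getlastmodified/>
  </D:prop>
</D:propfind>"""


def build_listing_request(folder_url: str) -> urllib.request.Request:
    """Build (but do not send) a PROPFIND request listing a folder's direct children."""
    return urllib.request.Request(
        folder_url,
        data=PROPFIND_BODY,
        headers={
            "Depth": "1",  # "1" = the folder itself plus its immediate members
            "Content-Type": 'application/xml; charset="utf-8"',
        },
        method="PROPFIND",
    )


# Placeholder URL (assumption); substitute the address of an actual WebDAV server.
req = build_listing_request("https://dav.example.invalid/documents/")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a WebDAV server normally answers with a `207 Multi-Status` XML document describing each resource.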

  • Context-sensitive user interface

    Context-sensitive user interface

    A context-sensitive user interface offers the user options based on the state of the active program. Context sensitivity is ubiquitous in current graphical user interfaces, often in context menus. A user interface may also provide context-sensitive feedback, such as changing the appearance of the mouse pointer or cursor, changing the menu color, or giving auditory or tactile feedback. == Reasoning and advantages of context sensitivity == The primary reason for introducing context sensitivity is to simplify the user interface. Advantages include: Reduced number of commands required to be known to the user for a given level of productivity. Reduced number of clicks or keystrokes required to carry out a given operation. Allows consistent behaviour to be pre-programmed or altered by the user. Reduces the number of options needed on screen at one time. === Disadvantages === Context-sensitive actions may be perceived as dumbing down the user interface, leaving the operator at a loss as to what to do when the computer decides to perform an unwanted action. Additionally, non-automatic procedures may be hidden or obscured by the context-sensitive interface, causing an increase in user workload for operations the designers did not foresee. A poor implementation can be more annoying than helpful – a classic example of this is Office Assistant. == Implementation == At the simplest level, each possible action is reduced to a single most likely action – the action performed is based on a single variable (such as file extension). In more complicated implementations, multiple factors can be assessed, such as the user's previous actions, the size of the file, the programs in current use, metadata, etc. The method is not limited to the response to imperative button presses and mouse clicks – pop-up menus can be pruned and/or altered, or a web search can focus results based on previous searches. 
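The simplest level described above, where one variable such as the file extension selects the single most likely action, can be sketched as a lookup table. This is only an illustration: the `DEFAULT_ACTIONS` table, the action strings, and the `default_action` function are assumptions made for the example, not taken from any particular desktop environment.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the single most likely action.
DEFAULT_ACTIONS = {
    ".txt": "open in text editor",
    ".png": "open in image viewer",
    ".mp3": "play in audio player",
    ".zip": "extract archive",
}

# Used when the context (the extension) gives no clear answer.
FALLBACK_ACTION = "show 'open with' dialog"


def default_action(filename: str) -> str:
    """Choose an action from one context variable: the lower-cased file extension."""
    extension = Path(filename).suffix.lower()
    return DEFAULT_ACTIONS.get(extension, FALLBACK_ACTION)


for name in ["notes.TXT", "cover.png", "README"]:
    print(name, "->", default_action(name))
```

A more complicated implementation would replace the single lookup with a function of several inputs, such as recent user actions or file size.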
At higher levels of implementation, context-sensitive actions require either larger amounts of metadata, extensive programming based on case analysis, or other artificial intelligence algorithms. === In computer and video games === Context sensitivity is important in video games, especially those controlled by a gamepad, joystick or computer mouse, in which the number of buttons available is limited. It is primarily applied when the player is in a certain place and is used to interact with a person or object. For example, if the player is standing next to a non-player character, an option may come up allowing the player to talk with them. Implementations range from the embryonic 'Quick Time Event' to context-sensitive sword combat in which the attack used depends on the position and orientation of both the player and opponent, as well as the virtual surroundings. A similar range of use is found in the 'action button' which, depending upon the in-game position of the player's character, may cause it to pick something up, open a door, grab a rope, punch a monster or opponent, or smash an object. The response does not have to be player activated – an on-screen device may only be shown in certain circumstances, e.g. 'targeting' cross hairs in a flight combat game may indicate the player should fire. An alternative implementation is to monitor the input from the player (e.g. level of button pressing activity) and use that to control the pace of the game in an attempt to maximize enjoyment or to control the excitement or ambience. The method has become increasingly important as more complex games are designed for machines with few buttons (keyboard-less consoles). Bennet Ring commented (in 2006) that "Context-sensitive is the new lens flare". === Context-sensitive help === Context-sensitive help is a common implementation of context sensitivity: when a single help button is pressed, the help system opens a specific page or related topic.
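The 'action button' described in the games section above can be sketched as a function that inspects what is near the player and returns one action. This is a minimal sketch under stated assumptions: the `Entity` class, the `ACTIONS` table, the `REACH` distance, and `action_for` are hypothetical names, and real games typically also consider orientation and game state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    kind: str  # e.g. "npc", "door", "item", "rope"
    x: float
    y: float


# Hypothetical mapping from the kind of nearby object to the action the button performs.
ACTIONS = {"npc": "talk", "door": "open", "item": "pick up", "rope": "grab"}

REACH = 1.5  # assumed maximum interaction distance, in world units


def action_for(player_x: float, player_y: float, entities: list[Entity]) -> Optional[str]:
    """Return the action for the nearest interactable entity in reach, or None."""
    nearest = None
    nearest_d2 = REACH ** 2  # compare squared distances to avoid a square root
    for entity in entities:
        d2 = (entity.x - player_x) ** 2 + (entity.y - player_y) ** 2
        if entity.kind in ACTIONS and d2 <= nearest_d2:
            nearest, nearest_d2 = entity, d2
    return ACTIONS[nearest.kind] if nearest else None


world = [Entity("npc", 1.0, 0.0), Entity("door", 5.0, 5.0)]
print(action_for(0.0, 0.0, world))
print(action_for(5.0, 4.0, world))
print(action_for(10.0, 10.0, world))
```

The same button press therefore yields different results depending on where the player stands, which is how one button can stand in for many commands on a controller with few inputs.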

  • Feeding the Machine (book)

    Feeding the Machine (book)

    Feeding the Machine: The Hidden Human Labour Powering AI is a 2024 book by James Muldoon, Mark Graham and Callum Cant. == Writing == The authors developed the concept for the book while doing fieldwork studying data annotation in developing countries in East Africa. == Synopsis == The book examines the human input needed to develop and sustain AI ecosystems. == Reception == The book received positive reviews. Rosalie Waelen of Capital & Class gave it a mostly positive review. Tim Hornyak of Literary Review praised it. Kirkus Reviews called it "A sobering and timely—if sometimes distracted—study of AI.". Publishers Weekly gave the book a starred review, writing that "The grim real-life stories read like dystopian parables, such as the account of a European voice actor whose recordings were legally used without her consent to create an inexpensive synthetic clone whom she now competes with for business. Driven by striking reporting and finely observed profiles, this unsettles."
