AI Coding Quality

AI Coding Quality — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Inpainting

    Inpainting

    Inpainting is a conservation process where damaged, deteriorated, or missing parts of an artwork are filled in to present a complete image. This process is commonly used in image restoration. It can be applied to both physical and digital art mediums such as oil or acrylic paintings, chemical photographic prints, sculptures, or digital images and video. With its roots in physical artwork, such as painting and sculpture, traditional inpainting is performed by a trained art conservator who has carefully studied the artwork to determine the mediums and techniques used in the piece, potential risks of treatments, and ethical appropriateness of treatment. == History == The modern use of inpainting can be traced back to Pietro Edwards (1744–1821), Director of the Restoration of the Public Pictures in Venice, Italy. Using a scientific approach, Edwards focused his restoration efforts on the intentions of the artist. It was during the 1930 International Conference for the Study of Scientific Methods for the Examination and Preservation of Works of Art, that the modern approach to inpainting was established. Helmut Ruhemann (1891–1973), a German restorer and conservator, led the discussions on the use of inpainting in conservation. Helmut Ruhemann was a leading figure in modernizing restoration and conservation. His greatest contribution to the field of conservation "was his insistence on following the methods of the original painter exactly, and on understanding the painter's artistic intention". After his career of over 40 years as a conservator, Ruhemann published his treatise The Cleaning of Paintings: Problems & Potentialities in 1968. In describing his method, Ruhemann states that "The surface [of the fill] should be slightly lower than that of the surrounding paint to allow for the thickness of the inpainting...Inpainting medium should look and behave like the original medium, but must not darken with age." 
Cesare Brandi (1906–1988) developed the teoria del restauro, the inpainting approach combining aesthetics and psychology. However, this approach was used primarily by Italian restorers and conservators, with the terminology becoming widespread in the 1990s. Technological advancements led to new applications of inpainting. Digital techniques in widespread use range from entirely automatic computerized inpainting to tools used to simulate the process manually. Since the mid-1990s, the process of inpainting has evolved to include digital media. More commonly known as image or video interpolation, a form of estimation, digital inpainting includes the use of computer software that relies on sophisticated algorithms to replace lost or corrupted parts of the image data. == Ethics == In order to preserve the integrity of an original artwork, any inpainting technique or treatment applied to physical or digital work should be reversible or distinguishable from the original content of the artwork. Prior to any treatment, conservators proceed according to the guidelines of the American Institute for Conservation of Historic and Artistic Works. Several ethical considerations must be weighed before inpainting can be justified. Decisions about the ethical appropriateness of the amount and type of inpainting depend on many factors. As with most conservation treatments, the ethical questions around inpainting rest mainly on authenticity, reversibility, and documentation. Any intervention to compensate for loss should be documented in treatment records and reports and should be detectable by common examination methods. Such compensation should be reversible and should not falsely modify the known aesthetic, conceptual, and physical characteristics of the cultural property, especially by removing or obscuring original material. New technologies and the aesthetic demand for flawless images challenge conservators' ethical practices to protect the integrity of originals. 
== Methods == Inpainting methods and techniques depend on the desired goal and the type of image being treated. Treatments to fill in gaps differ between physical and digital art. Detailed records of the initial state of an image can guide the treatment and help replicate the original more closely. === Physical inpainting === Inpainting is rooted in the conservation and restoration of paintings. Inpainting can aim to make a visual improvement to the artwork as a whole by repairing missing or damaged parts using methods and materials equivalent to the original artist's work. ==== Application techniques ==== By studying the painting methods of various artists and the composition of paints used historically, conservators are able to restore works very closely to their original visual appearance. The picture as a whole determines how to fill in the gap. Helmut Ruhemann's inpainting techniques, as presented by Jessell, include procedures to "preserve" the quality of oil and tempera paintings. === Digital inpainting === Many programs are able to reconstruct missing or damaged areas of digital photographs and videos. The most widely known for use with digital images is Adobe Photoshop. Given the various abilities of the digital camera and the digitization of old photos, inpainting has become an automatic process that can be performed on digital images. Inpainting techniques can be applied to object removal, text removal, and other automatic modifications of images and videos. In video special effects, inpainting is usually performed after video matting. The techniques also appear in applications such as image compression and super-resolution. In photography and cinema, inpainting is used for film restoration to reverse, repair, or mitigate deterioration (e.g., physical damage such as cracks in photographs, scratches and dust spots in film, or chemical damage resulting in image loss; see infrared cleaning). 
It can also be used for removing red-eye, the stamped date from photographs, and objects for creative effect. The technique can be used to replace lost blocks in the coding and transmission of images, for example in streaming video. It can also be used to remove logos or watermarks in videos. Deep learning neural network-based inpainting can be used for decensoring images. Deep image prior-based techniques can be used for digital image inpainting where a trained deep learning model is either unavailable or infeasible. Deep models for visual content generation, like text-to-image or text-to-video, learn complex priors over the distribution of visual content, and can be used to inpaint missing parts. For example, videos can be separated into layers using a technique called omnimatte, either by pretraining an omnimatte model or, without any training, by using an omnimatte-zero model. Three main groups of 2D image-inpainting algorithms can be found in the literature: structural (or geometric) inpainting, texture inpainting, and a combination of the two. All of them use the information in the known, undamaged image areas to fill the gap, similar to how physical images are restored. ==== Structural ==== Structural or geometric inpainting is used for smooth images that have strong, defined borders. There are many different approaches to geometric inpainting, but they all come from the idea that geometry can be recovered from similar areas or domains. Bertalmio proposed a method of structural inpainting that mimics how conservators address painting restoration: by progressively transferring similar information from the borders of an inpainting domain inwards, the gap can be filled. ==== Textural ==== While structural/geometric inpainting works to repair smooth images, textural inpainting works best with images that are heavily textured. 
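The border-inward filling idea described under "Structural" above can be illustrated with a deliberately simplified Python sketch. It is not Bertalmio's algorithm, which propagates information along image contours; it only averages already-known 4-connected neighbours, one layer of the hole at a time. The function name `inpaint_inward` and the list-of-lists grayscale representation are assumptions made for this example.

```python
def inpaint_inward(image, mask):
    """Fill masked pixels layer by layer, from the hole's border inwards.

    image: list of rows of floats (grayscale intensities).
    mask:  same shape as image; True marks a missing pixel.
    Returns a new image; the input is left unchanged.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    missing = {(r, c) for r in range(height) for c in range(width) if mask[r][c]}

    while missing:
        updates = {}
        for r, c in missing:
            # Collect the 4-connected neighbours whose values are already known.
            known = [
                result[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < height and 0 <= nc < width and (nr, nc) not in missing
            ]
            if known:
                updates[(r, c)] = sum(known) / len(known)
        if not updates:
            break  # no known pixel touches the hole (e.g. a fully masked image)
        # Apply the whole layer at once so the result does not depend on scan order.
        for (r, c), value in updates.items():
            result[r][c] = value
        missing.difference_update(updates)
    return result


# Usage: a one-row image with a three-pixel gap between a dark and a bright pixel.
row = [[0.0, 9.0, 9.0, 9.0, 8.0]]
gap = [[False, True, True, True, False]]
print(inpaint_inward(row, gap))
```

Because each layer is computed only from pixels known before that layer started, the gap closes from its edges towards its centre. Averaging smooths the fill, so this sketch suits the smooth regions that structural methods target and would blur heavily textured areas.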
Texture has a repetitive pattern, which means that a missing portion cannot be restored by continuing the level lines into the gap; level lines provide a complete, stable representation of an image. To repair texture in an image, one can combine frequency and spatial domain information to fill in a selected area with a desired texture. This method is the simplest and is very effective when a single texture is selected to be inpainted. For a texture that covers a wider area or a larger frame, one would have to go through the image, segmenting the areas to be inpainted and selecting the corresponding textures from throughout the image; there are programs that help find the corresponding areas, working much as 'find and replace' does in a word processor. ==== Combined structural and textural ==== Combined structural and textural inpainting approaches simultaneously try to perform texture- and structure-filling in regions of missing image information. Most parts of an image consist of texture and structure and the boundaries between image regions contain a large amount of structural information. This is the result when blending differ

  • Darwin among the Machines

    Darwin among the Machines

    "Darwin among the Machines" is a letter to the editor published in The Press newspaper on 13 June 1863 in Christchurch, New Zealand. The title, which was chosen by the author, references the work of Charles Darwin. Written by Samuel Butler but signed Cellarius, the letter raised the possibility that machines were a kind of "mechanical life" undergoing constant evolution, and that eventually machines might supplant humans as the dominant species. == Book of the Machines == Butler developed this and subsequent articles into The Book of the Machines, three chapters of Erewhon, published anonymously in 1872. The Erewhonian society Butler envisioned had long ago undergone a revolution that destroyed most mechanical inventions. The narrator of the story finds a book that details the reasons for this revolution, which he translates for the reader. Despite the initial popularity of Erewhon, Butler commented in the preface to the second edition that reviewers had "in some cases been inclined to treat the chapters on Machines as an attempt to reduce Mr. Darwin's theory to an absurdity." He protested that "few things would be more distasteful to me than any attempt to laugh at Mr. Darwin", but also added "I am surprised, however, that the book at which such an example of the specious misuse of analogy would seem most naturally levelled should have occurred to no reviewer; neither shall I mention the name of the book here, though I should fancy that the hint given will suffice", which may suggest that the chapter on Machines was in fact a satire intended to illustrate the "specious misuse of analogy", even if the target was not Darwin; Butler, fearing that he had offended Darwin, wrote him a letter explaining that the actual target was Joseph Butler's 1736 The Analogy of Religion, Natural and Revealed, to the Constitution and Course of Nature. 
The Victorian scholar Herbert Sussman has suggested that although Butler's exploration of machine evolution was intended to be whimsical, he may also have been genuinely interested in the notion that living organisms are a type of mechanism and was exploring this notion with his writings on machines, while the philosopher Louis Flaccus called it "a mixture of fun, satire, and thoughtful speculation." == Evolution of Global Intelligence == George Dyson applies Butler's original premise to the artificial life and intelligence of Alan Turing in Darwin Among the Machines: The Evolution of Global Intelligence (1998) ISBN 0-7382-0030-1, to suggest that the internet is a living, sentient being. Dyson's main claim is that the evolution of a conscious mind from today's technology is inevitable. It is not clear whether this will be a single mind or multiple minds, how smart that mind would be, and even if we will be able to communicate with it. He also clearly suggests that there are forms of intelligence on Earth that we are currently unable to understand. From the book: "What mind, if any, will become apprehensive of the great coiling of ideas now under way is not a meaningless question, but it is still too early in the game to expect an answer that is meaningful to us."

  • Blocks world

    Blocks world

    The blocks world is a planning domain in artificial intelligence. It consists of a set of wooden blocks of various shapes and colors sitting on a table. The goal is to build one or more vertical stacks of blocks. Only one block may be moved at a time: it may either be placed on the table or placed atop another block. Because of this, any block that is, at a given time, under another block cannot be moved. Moreover, some kinds of blocks cannot have other blocks stacked on top of them. The simplicity of this toy world lends itself readily to classical symbolic artificial intelligence approaches, in which the world is modeled as a set of abstract symbols which may be reasoned about. == Motivation == Artificial intelligence can be studied both in theory and through practical applications. The problem with most practical applications is that engineers do not know how to program an AI system for them. Rather than abandoning the challenge altogether, the idea is to invent an easy-to-solve domain, called a toy problem, with the aim of programming an AI that can solve it. The blocks world domain is an example of a toy problem. Its major advantage over more realistic AI applications is that many algorithms and software programs are available which can handle the situation. This allows different theories to be compared against each other. In its basic form, the blocks world problem consists of cubes of the same size, all of them black. A mechanical robot arm has to pick and place the cubes. More complicated derivatives of the problem consist of cubes of different sizes, shapes and colors. From an algorithmic perspective, blocks world is an NP-hard search and planning problem. The task is to bring the system from an initial state into a goal state. Automated planning and scheduling problems are usually described in the Planning Domain Definition Language (PDDL) notation, which is an AI planning language for symbolic manipulation tasks. 
If a problem is formulated in PDDL notation, it is called a domain. The task of stacking blocks is therefore a blocks world domain, which stands in contrast to other planning problems like the dock worker robot domain and the monkey and banana problem. == Theses/projects which took place in a blocks world == Terry Winograd's SHRDLU; Patrick Winston's Learning Structural Descriptions from Examples and Copy Demo; Gerald Jay Sussman's Sussman anomaly. Decision problem (Gupta and Nau, 1992): Given a starting Blocks World, an ending Blocks World, and an integer L > 0, is there a way to move the blocks to change the starting position to the ending position with L or fewer steps? This decision problem is NP-hard.
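The decision problem above can be made concrete with a small breadth-first search in Python. This is a minimal sketch for the basic form of the domain only (identical cubes, unlimited table space, any block may carry another); the state encoding and the names `canonical`, `successors`, `shortest_plan_length` and `solvable_within` are assumptions made for this example, not part of any planner or of PDDL. Since the problem is NP-hard, exhaustive search like this is only practical for a handful of blocks.

```python
from collections import deque


def canonical(stacks):
    """Normalise a state: drop empty stacks and sort them, because the
    position of a stack on the table carries no meaning."""
    return tuple(sorted(tuple(s) for s in stacks if s))


def successors(state):
    """Yield every state reachable by moving one clear (topmost) block."""
    for i, source in enumerate(state):
        block = source[-1]
        remainder = list(source[:-1])
        rest = [list(s) for j, s in enumerate(state) if j != i]
        # Put the block on the table (pointless if it is already there).
        if remainder:
            yield canonical(rest + [remainder, [block]])
        # Put the block on top of any other stack.
        for k in range(len(rest)):
            stacked = [s + [block] if j == k else s for j, s in enumerate(rest)]
            yield canonical(stacked + [remainder])


def shortest_plan_length(start, goal):
    """Breadth-first search; returns the minimum number of moves, or None."""
    start, goal = canonical(start), canonical(goal)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, depth = queue.popleft()
        if state == goal:
            return depth
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # e.g. the two states do not contain the same blocks


def solvable_within(start, goal, limit):
    """Answer the decision problem: is there a plan of at most `limit` moves?"""
    length = shortest_plan_length(start, goal)
    return length is not None and length <= limit


# Usage: the Sussman anomaly. Each stack is listed bottom to top.
start = [["A", "C"], ["B"]]   # C sits on A; B is on the table
goal = [["C", "B", "A"]]      # A on B on C
print(shortest_plan_length(start, goal))  # 3
print(solvable_within(start, goal, 2))    # False
```

Breadth-first search is used because it finds a shortest plan first, which is exactly what the bound L asks about; a depth-first planner could find a plan without establishing that no shorter one exists.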

  • Gundam Build Metaverse

    Gundam Build Metaverse

    Gundam Build Metaverse (Japanese: ガンダムビルドメタバース, Hepburn: Gandamu Birudo Metabāzu) is a Japanese original net animation anime mini-series produced by Sunrise Beyond, and the fifth series within the Gundam Build Series sub-series. The series celebrates the 10th anniversary of the Gundam Build franchise, including characters from the previous installments. == Plot == The story is set in the same universe of the Gundam Build series in an online metaverse space where users can use avatars to move around and interact with other users, including conducting Gunpla (Gundam plastic model) battles with them. The story centers on Rio Hōjō, a boy who lives in Hawaii, and who learns how to build Gunpla from a local hobbyist named Seria Urutsuki. In the metaverse, a figure known as Mask Lady teaches him the art of Gunpla battling, and he strives to get better at it every day. With his custom Lah Gundam, he seeks out ever stronger opponents. == Characters == === Main characters === Rio Hojo (ホウジョウ・リオ, Hōjō Rio) Voiced by: Chika Anzai A young boy from Hawaii who is an enthusiast of Gunpla Battle and is an apprentice of the mysterious Diver "Mask Lady". Rio's Gunpla is the Lah Gundam, modeled after an entry-grade RX-78-2 Gundam, from the original Mobile Suit Gundam anime series. Seria Urutsuki (ウルツキ・セリア, Urutsuki Seria) / Mask Lady (マスクレディー, Masuku Reidi) Voiced by: Rio Tsuchiya A clerk at a local hobby shop and the instructor at their Gunpla class, Seria becomes Rio's Gunpla mentor using the alias "Mask Lady". Seria's Gunpla is the ZGMF-X20A-PF Gundam Perfect Strike Freedom Rouge, based on both the MBF-02 Strike Rouge and the GAT-X105+AQM/E-YM1 Perfect Strike Gundam from Mobile Suit Gundam Seed and the ZGMF-X20A Strike Freedom Gundam from Mobile Suit Gundam Seed Destiny. 
=== Returning characters === Fumina Hoshino (ホシノ・フミナ, Hoshino Fumina) Voiced by: Yui Makino A veteran Gunpla Battler from the early days of the sport and the Leader of "Team Try Fighters", she works as an advertiser and announcer within the Metaverse realm. Tatsuya Yuuki (ユウキ・タツヤ, Yūki Tatsuya) / Meijin Kawaguchi III (三代目メイジン・カワグチ, Sandaime Meijin Kawaguchi) Voiced by: Takuya Satō A builder and three-times Gunpla Battle world champion who inherited the name of the legendary Meijin Kawaguchi, known as "Meijin Kawaguchi III", and still the current title holder. His newest Gunpla is the Gundam Amazing Barbatos Lupus based on the ASW-G-08 Gundam Barbatos Lupus from Mobile Suit Gundam: Iron-Blooded Orphans. Riku Mikami (ミカミ・リク, Mikami Riku) / Riku (リク) Voiced by: Yūsuke Kobayashi The Founder and former leader of the legendary force, "Build Divers". His Gunpla is the Gundam 00 Diver Arc, the latest version of the original GN-0000DVR Gundam 00 Diver from Gundam Build Divers, incorporating elements from the 00 Gundam from Mobile Suit Gundam 00 and the Gundam AGE-FX from Mobile Suit Gundam AGE. Sarah (サラ, Sara) Voiced by: Haruka Terui An EL-Diver and member of the Build Divers. Momoka Yashiro (ヤシロ・モモカ, Yashiro Momoka) / Momo (モモ) Voiced by: Nene Hieda Member of Build Divers. Her gunpla is the MOMOKAPOOL (R×R), an upgraded version of her PEN-01M Momokapool from Gundam Build Divers Aya Fujisawa (フジサワ・アヤ, Fujisawa Aya) / Ayame (アヤメ) Voiced by: Manami Numakura Member of Build Divers. Her Gunpla is the F-Kunoichi Kai, an SD Gunpla based on the F91 Gundam F91 from Mobile Suit Gundam F91. Sei Iori (イオリ・セイ, Iori Sei) Voiced by: Mikako Komatsu A builder and one time Gunpla Battle World Champion. His current Gunpla is the GAT-X105B/EG Build Strike Exceed Galaxy, the latest version of the original GAT-X105B Build Strike Gundam from Gundam Build Fighters. 
Aria von Reiji Asuna (アリーア・フォン・レイジ・アスナ, Arīa fon Reiji Asuna) Voiced by: Sachi Kokuryu A prince from the country called Arian that exists within a space colony in another dimension, who became friends with Sei Iori and together with him won the Gunpla Battle World Championship. He somehow manages to log into the metaverse to reunite with his friend, piloting the SB-011 Star Burning Gundam. Sekai Kamiki (カミキ・セカイ, Kamiki Sekai) Voiced by: Kazumi Togashi A veteran builder and former member of Team Try Fighters. He is currently the Japanese National representative Champion. In the series he develops a rivalry with Hiroto similar to that of Kyoya and Rommel. His current Gunpla is the Shin Burning Gundam, the latest version of the original KMK-B01 Kamiki Burning Gundam from Gundam Build Fighters Try, which is based on the Burning Gundam and Master Gundam. Hiroto Kuga (クガ・ヒロト, Kuga Hiroto) / Hiroto (ヒロト, Hiroto) Voiced by: Chiaki Kobayashi A veteran diver, the one responsible for discovering more EL-Divers, and a former member of the legendary force "Avalon", who later joined the unofficial "BUILD DiVERS" and eventually became its current Force Leader, as well as the current holder of the title "Hero of Gunpla". In the third episode he is the only Build Diver member who participates in the tournament, while his fellow force-mates are in the audience rooting for him and Rio. His Gunpla is the Plutine Gundam, which is a combination of his Core Gundam II Plus, upgraded from the Core Gundam II featured in Gundam Build Divers Re:Rise, equipped with the Pluto Armor. Magee (マギー, Magī) Voiced by: Taishi Murata A flamboyant veteran Diver who owns a shop in the metaverse and is an acquaintance of Seria's. Freddie (フレディ, Furedi) Voiced by: Ai Kakuma An alien anthropomorphic dog boy from planet Eldora, a support member to both Build Diver teams, who manages to access the metaverse from his home planet along with his fellow Eldorans. 
Ogre (オーガ, Ōga) Voiced by: Wataru Hatano Kyoya Kisugi (キスギ・キョウヤ, Kisugi Kyōya) / Kyoya Kujo (クジョウ・キョウヤ, Kujō Kyōya) Voiced by: Jun Kasama Leader of the legendary force "Avalon" and the reigning holder of the title "World Champion". He, along with Hiroto Kuga, Maria Urutsuki, and Tatsuya Yuuki, is currently at the top of the entire Gunpla world community. His current Gunpla is a recolored version of his AGE-TRYMAG Gundam TRY AGE Magnum from Gundam Build Divers Re:Rise. Susumu Sazaki (サザキ・ススム, Sazaki Susumu) Voiced by: Ryo Hirohashi Kaoruko Sazaki (サザキ・カオルコ, Sazaki Kaoruko) Voiced by: Ryo Hirohashi Mahiru Shigure (シグレ・マヒル, Shigure Mahiru) Voiced by: Rinko Natsuhi Keiko Sano (サノ・ケイコ, Sano Keiko) Voiced by: Ami Naito === Others === Maria Urutsuki (ウルツキ・マリア, Urutsuki Maria) / Mascarilla (マスカリージャ, Masukarīja) Voiced by: Ai Kakuma A mysterious masked woman with a harsh rivalry with Seria and an avatar similar to hers, she is later revealed as Seria's younger sister Maria, who began to loathe her sister after she gave up on their dream to fight for the title of Lady Kawaguchi. She later obtains the title, becoming "Lady Kawaguchi VII". Jeff (ジェフさん, Jefu-san) Voiced by: Kenta Miyake A distant relative of Seria and Maria's and owner of the hobby shop where Seria lives. Mellow Neige (メロウ・ネージュ, Merō Nēju) Voiced by: Chikano Ibuki A sentient A.I. who is the current publicity face of the Gunpla Metaverse. == Episodes ==

  • LENA Foundation

    LENA Foundation

    The LENA Foundation is an American nonprofit organisation which provides tools for measuring children's language acquisition and exposure. Specifically, the LENA system consists of a digital language processor which is worn by a child and records and analyses their auditory environment, using proprietary software. It then presents a summary of child-adult conversation, such as conversation turns and word counts. The purpose of the LENA system is to encourage interactive talk between children (between the ages of two and forty-eight months) and their caretakers. The LENA system is also used for research. It is useful for researchers who wish to save transcription costs or observe the child in a natural setting, but its accuracy, while often quite high, varies between contexts, notably in the case of hard-of-hearing children. Because of this, several researchers recommend caution in relying on the LENA system alone for the purposes of scientific research. == History == The LENA Foundation was established in 2009 by Terrance and Judith Paul, founders of Renaissance Learning, Inc., with the purpose of aiding children with disabilities and assisting with early learning. They were inspired by the book "Meaningful Differences in the Everyday Experience of American Children" by Dr. Betty Hart and Dr. Todd Risley. A pilot version of the LENA system was launched in February 2006. The LENA Research Foundation was registered as a tax-exempt 501(c)(3) nonprofit in September 2010. The organisation was renamed simply LENA in 2018 and adopted the tagline "Building brains through early talk." LENA has been used for parental feedback, linguistics or paediatrics research, and for specific clinical cases. 
== Scientific background == In 2018, research using the LENA system showed that there was a link between children's conversational turns and activation of Broca's area (a part of the brain responsible, although not necessarily essential, for language processing). The LENA foundation cites research by its own employees as evidence for the scientific basis of its technology. Said research claims that verbal interaction with young children has an effect on language acquisition, including verbal comprehension skills during adolescence. == LENA System == The LENA software analyses a child's natural language environment, such as verbal exposure, and provides several metrics, such as adult and child speech time, television/recorded audio time, word count, or conversation turn count. The LENA hardware is a recorder that is usually placed into a child's specially-designed vest. The software was trained on over 65,000 hours of manually annotated American English audio recordings. It splits the audio into segments which are categorised as "key child", "other child", "male adult", "noise", etc. The advantages of LENA as opposed to manual transcription are its speed and ease of use; the disadvantages are its potential inaccuracies and lack of transcription capability (which LENA does not profess to attempt). The LENA system has also been criticised for prioritising quantity of speaking over quality (i.e., mastery of the language, as opposed to babble). == Product lines == === LENA Start === LENA Start is a program for parents that utilises feedback from the LENA System in conjunction with weekly group sessions in order to address the home language environment. It was introduced in 2015 and implemented across several U.S. states. 
In October 2020, during the restrictions of the COVID-19 pandemic, Read Aloud Delaware began a virtual LENA Start program with families statewide, where parents received feedback and participated in one-hour Zoom workshops each week during the 10-week program. === LENA Grow === LENA Grow is a professional development program for teachers in early childhood classrooms. Before launching at sites around the country, the program was first piloted in Escambia County, Florida. === LENA Home === LENA Home is a supplement to existing parent coaching curricula. Typically, home visitors facilitate the use of the LENA System to help parents track their progress towards increasing interactive talk in their homes. === Developmental Snapshot === The LENA Developmental Snapshot, based on a 52-question parent survey, assesses both expressive and receptive language skills and provides an estimate of a child's developmental age from 2 months to 36 months.

  • Content-based image retrieval

    Content-based image retrieval

    Content-based image retrieval, also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. Content-based image retrieval is opposed to traditional concept-based approaches (see Concept-based image indexing). "Content-based" means that the search analyzes the contents of the image rather than the metadata such as keywords, tags, or descriptions associated with the image. The term "content" in this context might refer to colors, shapes, textures, or any other information that can be derived from the image itself. CBIR is desirable because searches that rely purely on metadata are dependent on annotation quality and completeness. == Comparison with metadata searching == An image meta search requires humans to have manually annotated images by entering keywords or metadata in a large database, which can be time-consuming and may not capture the keywords desired to describe the image. The evaluation of the effectiveness of keyword image search is subjective and has not been well-defined. In the same regard, CBIR systems have similar challenges in defining success. Keywords also "limit the scope of queries to the set of predetermined criteria" and, "having been set up", are less reliable than using the content itself. == History == The term "content-based image retrieval" seems to have originated in 1992 when it was used by Japanese Electrotechnical Laboratory engineer Toshikazu Kato to describe experiments into automatic retrieval of images from a database, based on the colors and shapes present. Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features. 
The techniques, tools, and algorithms that are used originate from fields such as statistics, pattern recognition, signal processing, and computer vision. === QBIC - Query By Image Content === The earliest commercial CBIR system was developed by IBM and was called QBIC (Query By Image Content). Recent network- and graph-based approaches have presented a simple and attractive alternative to existing methods. While the storing of multiple images as part of a single entity preceded the term BLOB (Binary Large OBject), the ability to fully search by content, rather than by description, had to await IBM's QBIC. === VisualRank === == Technical progress == The interest in CBIR has grown because of the limitations inherent in metadata-based systems, as well as the large range of possible uses for efficient image retrieval. Textual information about images can be easily searched using existing technology, but this requires humans to manually describe each image in the database. This can be impractical for very large databases or for images that are generated automatically, e.g. those from surveillance cameras. It is also possible to miss images that use different synonyms in their descriptions. Systems based on categorizing images in semantic classes like "cat" as a subclass of "animal" can avoid the miscategorization problem, but will require more effort by a user to find images that might be "cats", but are only classified as an "animal". Many standards have been developed to categorize images, but all still face scaling and miscategorization issues. Initial CBIR systems were developed to search databases based on image color, texture, and shape properties. After these systems were developed, the need for user-friendly interfaces became apparent. Therefore, efforts in the CBIR field started to include human-centered design that tried to meet the needs of the user performing the search. 
This typically means inclusion of: query methods that may allow descriptive semantics, queries that may involve user feedback, systems that may include machine learning, and systems that may understand user satisfaction levels. == Techniques == Many CBIR systems have been developed, but as of 2006, the problem of retrieving images on the basis of their pixel content remains largely unsolved. Different query techniques and implementations of CBIR make use of different types of user queries. === Query By Example === QBE (Query By Example) is a query technique that involves providing the CBIR system with an example image that it will then base its search upon. The underlying search algorithms may vary depending on the application, but result images should all share common elements with the provided example. Options for providing example images to the system include: A preexisting image may be supplied by the user or chosen from a random set. The user draws a rough approximation of the image they are looking for, for example with blobs of color or general shapes. This query technique removes the difficulties that can arise when trying to describe images with words. === Semantic retrieval === Semantic retrieval starts with a user making a request like "find pictures of Abraham Lincoln". This type of open-ended task is very difficult for computers to perform - Lincoln may not always be facing the camera or in the same pose. Many CBIR systems therefore generally make use of lower-level features like texture, color, and shape. These features are either used in combination with interfaces that allow easier input of the criteria or with databases that have already been trained to match features (such as faces, fingerprints, or shape matching). However, in general, image retrieval requires human feedback in order to identify higher-level concepts. 
=== Relevance feedback (human interaction) === Combining CBIR search techniques available with the wide range of potential users and their intent can be a difficult task. An aspect of making CBIR successful relies entirely on the ability to understand the user intent. CBIR systems can make use of relevance feedback, where the user progressively refines the search results by marking images in the results as "relevant", "not relevant", or "neutral" to the search query, then repeating the search with the new information. Examples of this type of interface have been developed. === Iterative/machine learning === Machine learning and application of iterative techniques are becoming more common in CBIR. === Other query methods === Other query methods include browsing for example images, navigating customized/hierarchical categories, querying by image region (rather than the entire image), querying by multiple example images, querying by visual sketch, querying by direct specification of image features, and multimodal queries (e.g. combining touch, voice, etc.) == Content comparison using image distance measures == The most common method for comparing two images in content-based image retrieval (typically an example image and an image from the database) is using an image distance measure. An image distance measure compares the similarity of two images in various dimensions such as color, texture, shape, and others. For example, a distance of 0 signifies an exact match with the query, with respect to the dimensions that were considered. As one may intuitively gather, a value greater than 0 indicates various degrees of similarities between the images. Search results then can be sorted based on their distance to the queried image. Many measures of image distance (Similarity Models) have been developed. 
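The distance-based ranking described above can be illustrated with a minimal sketch. It assumes each image has already been reduced to a fixed-length feature vector; the file names, the three-value vectors, and the helper names `euclidean_distance` and `rank_by_distance` are hypothetical, and Euclidean distance is only one of many possible measures.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors; 0 means identical features."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, database):
    """Return (name, distance) pairs sorted from most to least similar."""
    scored = [(name, euclidean_distance(query, feats)) for name, feats in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical feature vectors (for example mean red, green, blue in [0, 1]).
database = {
    "sunset.jpg": [0.9, 0.4, 0.1],
    "forest.jpg": [0.1, 0.7, 0.2],
    "beach.jpg": [0.8, 0.7, 0.5],
}
query = [0.9, 0.4, 0.1]
ranking = rank_by_distance(query, database)
```

The first entry of `ranking` has distance 0 because the query vector matches it exactly on the dimensions considered; larger distances indicate decreasing similarity.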
=== Color === Computing distance measures based on color similarity is achieved by computing a color histogram for each image that identifies the proportion of pixels within an image holding specific values. Examining images based on the colors they contain is one of the most widely used techniques because it can be completed without regard to image size or orientation. However, research has also attempted to segment color proportion by region and by spatial relationship among several color regions. === Texture === Texture measures look for visual patterns in images and how they are spatially defined. Textures are represented by texels which are then placed into a number of sets, depending on how many textures are detected in the image. These sets not only define the texture, but also where in the image the texture is located. Texture is a difficult concept to represent. The identification of specific textures in an image is achieved primarily by modeling texture as a two-dimensional gray level variation. The relative brightness of pairs of pixels is computed such that degree of contrast, regularity, coarseness and directionality may be estimated. The problem is in identifying patterns of co-pixel variation and associating them with particular classes of textures such as silky, or rough. Other methods of classifying textures include: Co-occurrence matrix Laws texture energy Wavelet transform Orthogonal transforms (discrete Chebyshev moments) =

    Read more →
  • Emi Kusano

    Emi Kusano

    Emi Kusano (Japanese: 草野 絵美, Hepburn: Kusano Emi; born August 4, 1990) is a Tokyo-based Japanese multidisciplinary artist known for creating photography, video, and installations using generative AI technology. Her work explores themes of nostalgia, pop culture, and collective memory. She is recognized as one of the early practitioners of generative AI art. Her work has been exhibited at the 21st Century Museum of Contemporary Art, Kanazawa, and screened at the M+ Museum’s Asian Avant-Garde Film Festival. Additionally, she has participated in prestigious international art fairs, including Paris Photo and Art Basel Hong Kong. In 2025, she was named one of the World Economic Forum's Young Global Leaders. In 2026, she was selected as a fellow for the AI x Arts Fellowship at Mohamed bin Zayed University of Artificial Intelligence. Kusano serves as a part-time lecturer at the Tokyo University of the Arts and is the producer and vocalist for the synthwave music unit Satellite Young. == Early life == === Photography === Kusano was born and raised in Tokyo. Her career began during her high school years, before 2008, when she became involved in street fashion photography. Her photographs, primarily taken in Harajuku, were published in "Japanese Streets", "Metropolis", CNN's travel guide magazine "CNN GO", and "WGSN". Her photography was exhibited at the FIT Museum in New York and the Victoria and Albert Museum in London. == Career == === Music and Installation work === Since 2014, in collaboration with BelleMaison Sekine, Kusano has led "Satellite Young," a synthwave music unit. As the lead vocalist, she blends 1980s idol culture with lyrics that tackle contemporary issues such as planned obsolescence ("Sony Timer"), online dating, artificial intelligence, and social media. Their music, known for its conceptual depth, has earned international niche recognition.
"Satellite Young" has participated in music festivals, including "South by Southwest," showcasing their fusion of retro aesthetics and modern critiques. In 2018, she was selected to participate in "Art Hack Day," an interdisciplinary art hackathon held at The National Museum of Emerging Science and Innovation, where she presented "Singing Dream," a karaoke machine endowed with artificial life, earning the Jury Prize. "Instababy Generator," a 2019 installation co-created with Junichi Yamaoka, explored the concept of designer babies and received recognition at the SIGGRAPH Art Gallery. In October 2020, operating under the name Emi Satellite, she debuted as a solo singer with her first single "Glass Ceiling," an empowerment anthem that addresses the challenges faced by women and encourages progress towards the future. The music video for this song depicts strong women rewriting the roles of protagonists in a bishōjo game, a type of dating simulation game. This concept later served as a prototype for Shinsei Galverse. === Challenge for Blockchain Art === In 2021, she explored the financial world through her single "IPO" and entered the NFT space with "Love Is an IPO," her first NFT work on Ethereum, sold on Foundation. In April 2022, she co-founded the crowdfunded anime project "Shinsei Galverse" with Ayaka Ohira, Devin Mancuso, and Jack Baldwin, serving as one of the executive directors overseeing the creative direction and story. The project's NFT collection of 8,888 ranked #1 on OpenSea's "Top NFTs" for several days, marking one of Japan's first globally successful blockchain art projects. In 2023, Shinsei Galverse produced the official "I like u" music video for Grammy-nominated singer Tove Lo as an initial anime endeavor. Kusano also contributed to discussions on Web3.0 and blockchain technology as a panelist in seminars organized by the Digital Agency of Japan.
=== AI art === In May 2023, Kusano's first AI art collection "Neural Fad," depicting imaginary fashion history, sold out 100 pieces within 24 hours at "Bright Moments Tokyo". In June, she created WWDJAPAN's first AI-generated magazine cover using her own face; it was the first AI cover in Japanese fashion media. She was also appointed to the Cultural Affairs Agency's Copyright Subcommittee, where she participates in discussions on generative AI and copyright. Her "Synthetic Reflections" self-portrait series debuted on SuperRare, with the first piece auctioned for 3.5 ETH (equivalent to 6,480 US dollars at the time). In July 2023, she co-exhibited a 3D AI-generated dress at Christie's "Future Frequencies" auction with Gucci, alongside Claire Silver. In September, her 30-piece "Pixelated Perception" exhibit at Art Blocks Marfa explored 1990s media and gender; it was also showcased at the 21st Century Museum of Contemporary Art, Kanazawa. In December, her "Techno-Animism" AI art collection fused Japanese animism with technology. Collaborating with a U.S. gallery, she unveiled 336 pieces during a two-week Art Basel world tour. Over the course of the tour, she sold all 336 pieces, generating 11.2 ETH (equivalent to 21,264 US dollars at the time). === Generative art === In February 2024, the generative art platform Art Blocks selected the work "Melancholic Magical Maiden" for its Curated category. This piece reconstructs the aesthetics of 1990s magical girl anime, offering a critique of past anime heroines. It sold out within an hour, with all 300 pieces going for a total of 57 ETH (equivalent to approximately 215,385 US dollars at the time). In April 2024, Kusano spoke at the Standing Committee on Copyright and Other Rights at the World Intellectual Property Organization (WIPO) in Geneva, Switzerland, where she presented AI-specific information for discussion.
== Style and technique == Kusano draws inspiration from Japanese retro-futurism as a foundation for her artwork, which explores the cutting-edge of technology. This approach is fueled by nostalgia for the pre-internet era, specifically the postwar period when Japanese mass media held significant sway. By blending modern technology with retro-culture, she captures the complex feelings of love, hate, and ambivalence towards present and future accelerationism. While at university, Kusano was profoundly influenced by Naoki Sakai, the industrial designer responsible for igniting the retro-futurism movement. In her musical project "Satellite Young", Kusano dons the persona of an '80s female idol and sings about contemporary technology. In her installation piece "Singing Dream", she investigates the concept of an artificial life form inhabiting a karaoke machine, which has been popular since the 1980s, compelling people to sing. In the collaborative NFT art project "Shinsei Galverse", Kusano reimagines a cyberpunk anime primarily featuring female characters, incorporating elements of magical girls popular in the early Heisei period. == Personal life == Kusano has two sons. In August 2021, she minted her older son Zombie Zoo Keeper's pixel art on "OpenSea" as part of his summer research project. The artwork was purchased by notable figures including Brud CEO Trevor McFedries and Steve Aoki, who bought the piece for the equivalent of 21.82 thousand US dollars, highlighting the intersection of art, technology, and family in her work.

    Read more →
  • Type-1 OWA operators

    Type-1 OWA operators

    Type-1 OWA operators are a set of aggregation operators that generalise Yager's OWA (ordered weighted averaging) operators in the interest of aggregating fuzzy sets rather than crisp values in soft decision making and data mining. These operators provide a mathematical technique for directly aggregating uncertain information with uncertain weights via the OWA mechanism in soft decision making and data mining, where these uncertain objects are modelled by fuzzy sets. The two definitions for type-1 OWA operators are based on Zadeh's Extension Principle and $\alpha$-cuts of fuzzy sets. The two definitions lead to equivalent results. == Definitions == === Definition 1 === Let $F(X)$ be the set of fuzzy sets with domain of discourse $X$. A type-1 OWA operator is defined as follows: given $n$ linguistic weights $\{W^{i}\}_{i=1}^{n}$ in the form of fuzzy sets defined on the domain of discourse $U=[0,1]$, a type-1 OWA operator is a mapping $\Phi$, $$\Phi\colon F(X)\times\cdots\times F(X)\longrightarrow F(X),\qquad (A^{1},\cdots,A^{n})\mapsto Y$$ such that $$\mu_{Y}(y)=\sup_{\sum_{i=1}^{n}\bar{w}_{i}a_{\sigma(i)}=y}\left(\mu_{W^{1}}(w_{1})\wedge\cdots\wedge\mu_{W^{n}}(w_{n})\wedge\mu_{A^{1}}(a_{1})\wedge\cdots\wedge\mu_{A^{n}}(a_{n})\right)$$ where $\bar{w}_{i}=\frac{w_{i}}{\sum_{i=1}^{n}w_{i}}$, and $\sigma\colon\{1,\cdots,n\}\longrightarrow\{1,\cdots,n\}$ is a permutation function such that $a_{\sigma(i)}\geq a_{\sigma(i+1)},\ \forall i=1,\cdots,n-1$, i.e., $a_{\sigma(i)}$ is the $i$th highest element in the set $\{a_{1},\cdots,a_{n}\}$.
=== Definition 2 === Using the alpha-cuts of fuzzy sets: given the $n$ linguistic weights $\{W^{i}\}_{i=1}^{n}$ in the form of fuzzy sets defined on the domain of discourse $U=[0,1]$, then for each $\alpha\in[0,1]$, an $\alpha$-level type-1 OWA operator with $\alpha$-level sets $\{W_{\alpha}^{i}\}_{i=1}^{n}$ to aggregate the $\alpha$-cuts of fuzzy sets $\{A^{i}\}_{i=1}^{n}$ is: $$\Phi_{\alpha}\left(A_{\alpha}^{1},\ldots,A_{\alpha}^{n}\right)=\left\{\frac{\sum_{i=1}^{n}w_{i}a_{\sigma(i)}}{\sum_{i=1}^{n}w_{i}}\;\middle|\;w_{i}\in W_{\alpha}^{i},\;a_{i}\in A_{\alpha}^{i},\;i=1,\ldots,n\right\}$$ where $W_{\alpha}^{i}=\{w\mid\mu_{W^{i}}(w)\geq\alpha\}$, $A_{\alpha}^{i}=\{x\mid\mu_{A^{i}}(x)\geq\alpha\}$, and $\sigma\colon\{1,\cdots,n\}\to\{1,\cdots,n\}$ is a permutation function such that $a_{\sigma(i)}\geq a_{\sigma(i+1)},\ \forall i=1,\cdots,n-1$, i.e., $a_{\sigma(i)}$ is the $i$th largest element in the set $\{a_{1},\cdots,a_{n}\}$.
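As an illustration of Definition 2, the following minimal sketch computes the smallest and largest values in the α-level set for a single α. It assumes every α-cut is a closed interval given as a `(low, high)` pair and that the weights cannot all be zero at once. The function name `alpha_level_owa` is hypothetical, and the brute-force search over interval end-points is a swapped-in simplification, not the fast method referred to in the literature: it relies on the ratio being non-decreasing in each $a_i$ and linear-fractional in the weights, so its extremes occur at end-points, and its cost grows as $2^n$.

```python
from itertools import product


def alpha_level_owa(weight_cuts, value_cuts):
    """Return (left, right) end-points of the alpha-level type-1 OWA set.

    weight_cuts: list of (low, high) intervals W_alpha^i with non-negative bounds.
    value_cuts:  list of (low, high) intervals A_alpha^i.
    """
    if len(weight_cuts) != len(value_cuts):
        raise ValueError("need one weight interval per value interval")

    def owa(weights, values):
        # a_sigma(1) >= ... >= a_sigma(n): sort the values in descending order.
        ordered = sorted(values, reverse=True)
        return sum(w * a for w, a in zip(weights, ordered)) / sum(weights)

    lows = [lo for lo, _ in value_cuts]
    highs = [hi for _, hi in value_cuts]
    # Every combination of weight end-points, skipping the undefined all-zero case.
    vertices = [w for w in product(*weight_cuts) if sum(w) > 0]
    left = min(owa(w, lows) for w in vertices)
    right = max(owa(w, highs) for w in vertices)
    return left, right


# Hypothetical alpha-cuts for n = 2.
left, right = alpha_level_owa([(0.2, 0.4), (0.6, 0.8)], [(1.0, 2.0), (3.0, 4.0)])
```

Repeating this for several values of α gives one interval per level.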
== Representation theorem of Type-1 OWA operators == Given the $n$ linguistic weights $\{W^{i}\}_{i=1}^{n}$ in the form of fuzzy sets defined on the domain of discourse $U=[0,1]$, and the fuzzy sets $A^{1},\cdots,A^{n}$, we have that $Y=G$, where $Y$ is the aggregation result obtained by Definition 1 and $G$ is the result obtained by Definition 2. == Programming problems for Type-1 OWA operators == According to the Representation Theorem of Type-1 OWA Operators, a general type-1 OWA operator can be decomposed into a series of $\alpha$-level type-1 OWA operators. In practice, this series of $\alpha$-level type-1 OWA operators is used to construct the resulting aggregation fuzzy set, so we only need to compute the left end-points and right end-points of the intervals $\Phi_{\alpha}\left(A_{\alpha}^{1},\cdots,A_{\alpha}^{n}\right)$.
Then, the resulting aggregation fuzzy set is constructed with the membership function as follows: $$\mu_{G}(x)=\bigvee_{\alpha\,:\,x\in\Phi_{\alpha}\left(A_{\alpha}^{1},\cdots,A_{\alpha}^{n}\right)}\alpha$$ For the left end-points, we need to solve the following programming problem: $$\Phi_{\alpha}\left(A_{\alpha}^{1},\cdots,A_{\alpha}^{n}\right)_{-}=\min_{\substack{W_{\alpha-}^{i}\leq w_{i}\leq W_{\alpha+}^{i}\\ A_{\alpha-}^{i}\leq a_{i}\leq A_{\alpha+}^{i}}}\;\sum_{i=1}^{n}w_{i}a_{\sigma(i)}\Big/\sum_{i=1}^{n}w_{i}$$ while for the right end-points, we need to solve the following programming problem: $$\Phi_{\alpha}\left(A_{\alpha}^{1},\cdots,A_{\alpha}^{n}\right)_{+}=\max_{\substack{W_{\alpha-}^{i}\leq w_{i}\leq W_{\alpha+}^{i}\\ A_{\alpha-}^{i}\leq a_{i}\leq A_{\alpha+}^{i}}}\;\sum_{i=1}^{n}w_{i}a_{\sigma(i)}\Big/\sum_{i=1}^{n}w_{i}$$ A fast method has been presented to solve the two programming problems so that the type-1 OWA aggregation operation can be performed efficiently; for details, see the paper. == Alpha-level approach to Type-1 OWA operation == Three-step process:
Step 1: Set up the $\alpha$-level resolution in [0, 1].
Step 2: For each $\alpha\in[0,1]$:
Step 2.1: Calculate $\rho_{\alpha+}^{i_{0}^{\ast}}$.
Step 2.1-1: Let $i_{0}=1$.
Step 2.1-2: If $\rho_{\alpha+}^{i_{0}}\geq A_{\alpha+}^{\sigma(i_{0})}$, stop; $\rho_{\alpha+}^{i_{0}}$ is the solution. Otherwise go to Step 2.1-3.
Step 2.1-3: Set $i_{0}\leftarrow i_{0}+1$ and go to Step 2.1-2.
Step 2.2: Calculate $\rho_{\alpha-}^{i_{0}^{\ast}}$.
Step 2.2-1: Let $i_{0}=1$.
Step 2.2-2: If $\rho_{\alpha-}^{i_{0}}\geq A_{\alpha-}^{\sigma(i_{0})}$, stop; $\rho_{\alpha-}^{i_{0}}$ is the solution. Otherwise go to Step 2.2-3.
Step 2.2-3: Set $i_{0}\leftarrow i_{0}+1$ and go to Step 2.2-2.
Step 3: Construct the resulting aggregation fuzzy set $G$ based on all the available intervals $\left[\rho_{\alpha-}^{i_{0}^{\ast}},\;\rho_{\alpha+}^{i_{0}^{\ast}}\right]$: $$\mu_{G}(x)=\bigvee_{\alpha\,:\,x\in\left[\rho_{\alpha-}^{i_{0}^{\ast}},\;\rho_{\alpha+}^{i_{0}^{\ast}}\right]}\alpha$$
== Some Examples == The type-1 OWA operator with the weights shown in the top figure is used to aggregate the fuzzy sets (solid lines) in the bottom figure, and the dashed line is the aggregation result. == Special cases == Any OWA operators, like maximum, minimum, mean operators; Join operators of (type-1) fuzzy sets, i.e., fuzzy maximum operators; Meet operators of (type-1) fuzzy sets, i.e., fuzzy minimum operators; Join-like operators of (type-1) fuzzy sets; Meet-like operators of (type-1) fuzzy sets. == Generalizations == Type-2 OWA operators have been suggested to aggregate the type-2 fuzzy sets for soft decision making.
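Returning to Step 3 of the alpha-level approach, the construction of the membership function from the per-level intervals can be sketched as follows. This is a minimal illustration assuming the intervals have already been computed for a finite set of α levels; the function name `membership_from_cuts` and the sample intervals are hypothetical.

```python
def membership_from_cuts(intervals, x):
    """mu_G(x): the largest alpha whose interval contains x, or 0.0 if none does."""
    return max(
        (alpha for alpha, (lo, hi) in intervals.items() if lo <= x <= hi),
        default=0.0,
    )


# Hypothetical end-points for three alpha levels (nested intervals).
intervals = {0.0: (1.0, 5.0), 0.5: (2.0, 4.0), 1.0: (3.0, 3.0)}
peak = membership_from_cuts(intervals, 3.0)
shoulder = membership_from_cuts(intervals, 2.5)
outside = membership_from_cuts(intervals, 6.0)
```

A finer α resolution in Step 1 gives a smoother approximation of the aggregated fuzzy set at the cost of more end-point computations.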
== Applications == Type-1 OWA operators have been applied to different domains for soft decision making. Improved efficiency of computing approach ; Type reduction of type-2 fuzzy sets ; Group decision making ; Credit risk evaluation ; Information fusion ; Linguistic expressions and symbolic translation ; Sentiment analysis ; Ro

    Read more →
  • Connected-component labeling

    Connected-component labeling

    Connected-component labeling (CCL), connected-component analysis (CCA), blob extraction, region labeling, blob discovery, or region extraction is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic. Connected-component labeling is not to be confused with segmentation. Connected-component labeling is used in computer vision to detect connected regions in binary digital images, although color images and data with higher dimensionality can also be processed. When integrated into an image recognition system or human-computer interaction interface, connected component labeling can operate on a variety of information. Blob extraction is generally performed on the resulting binary image from a thresholding step, but it can be applicable to gray-scale and color images as well. Blobs may be counted, filtered, and tracked. Blob extraction is related to but distinct from blob detection. == Overview == A graph, containing vertices and connecting edges, is constructed from relevant input data. The vertices contain information required by the comparison heuristic, while the edges indicate connected 'neighbors'. An algorithm traverses the graph, labeling the vertices based on the connectivity and relative values of their neighbors. Connectivity is determined by the medium; image graphs, for example, can be 4-connected neighborhood or 8-connected neighborhood. Following the labeling stage, the graph may be partitioned into subsets, after which the original information can be recovered and processed . == Definition == The usage of the term connected-component labeling (CCL) and its definition is quite consistent in the academic literature, whereas connected-component analysis (CCA) varies both in terminology and in its definition of the problem. Rosenfeld et al. 
define connected components labeling as the “[c]reation of a labeled image in which the positions associated with the same connected component of the binary input image have a unique label.” Shapiro et al. define CCL as an operator whose “input is a binary image and [...] output is a symbolic image in which the label assigned to each pixel is an integer uniquely identifying the connected component to which that pixel belongs.” There is no consensus on the definition of CCA in the academic literature. It is often used interchangeably with CCL. A more extensive definition is given by Shapiro et al.: “Connected component analysis consists of connected component labeling of the black pixels followed by property measurement of the component regions and decision making.” The definition for connected-component analysis presented here is more general, taking the thoughts expressed in into account. == Algorithms == The algorithms discussed can be generalised to arbitrary dimensions, albeit with increased time and space complexity. === One component at a time === This is a fast and very simple method to implement and understand. It is based on graph traversal methods in graph theory. In short, once the first pixel of a connected component is found, all the connected pixels of that connected component are labelled before going onto the next pixel in the image. This algorithm is part of Vincent and Soille's watershed segmentation algorithm, other implementations also exist. In order to do that a linked list is formed that will keep the indexes of the pixels that are connected to each other, steps (2) and (3) below. The method of defining the linked list specifies the use of a depth or a breadth first search. For this particular application, there is no difference which strategy to use. The simplest kind of a last in first out queue implemented as a singly linked list will result in a depth first search strategy. 
It is assumed that the input image is a binary image, with pixels being either background or foreground and that the connected components in the foreground pixels are desired. The algorithm steps can be written as: Start from the first pixel in the image. Set current label to 1. Go to (2). If this pixel is a foreground pixel and it is not already labelled, give it the current label and add it as the first element in a queue, then go to (3). If it is a background pixel or it was already labelled, then repeat (2) for the next pixel in the image. Pop out an element from the queue, and look at its neighbours (based on any type of connectivity). If a neighbour is a foreground pixel and is not already labelled, give it the current label and add it to the queue. Repeat (3) until there are no more elements in the queue. Go to (2) for the next pixel in the image and increment current label by 1. Note that the pixels are labelled before being put into the queue. The queue will only keep a pixel to check its neighbours and add them to the queue if necessary. This algorithm only needs to check the neighbours of each foreground pixel once and doesn't check the neighbours of background pixels. 
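The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes the image is a list of rows of 0 (background) and 1 (foreground), uses 4-connectivity, and uses a first-in-first-out queue, which gives a breadth-first traversal. The name `label_components` is hypothetical.

```python
from collections import deque


def label_components(image):
    """Label 4-connected foreground regions of a binary image (rows of 0/1)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not labelled"
    current = 1
    for r in range(rows):
        for c in range(cols):
            # Step (2): skip background pixels and pixels that are already labelled.
            if image[r][c] == 0 or labels[r][c] != 0:
                continue
            labels[r][c] = current  # label before queueing
            queue = deque([(r, c)])
            # Step (3): grow the component until the queue is empty.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] and labels[ny][nx] == 0:
                        labels[ny][nx] = current
                        queue.append((ny, nx))
            current += 1  # Step (4): next component gets the next label
    return labels


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
]
labels = label_components(image)
```

Because each pixel is labelled before it is queued, no pixel enters the queue more than once.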
The pseudocode is:

    algorithm OneComponentAtATime(imageData)
        input: imageData[xDim][yDim]
        initialization: label = 1, labelArray[xDim][yDim] = 0, queue = empty

        for i = 0 to xDim - 1 do
            for j = 0 to yDim - 1 do
                if imageData[i][j] is a foreground pixel and labelArray[i][j] = 0 do
                    labelArray[i][j] = label    (give label before queueing)
                    add (i, j) to queue
                    while queue is not empty do
                        remove an element from queue
                        for each of its four neighbors (north, south, east, west) inside the image do
                            if neighbor is a foreground pixel and its label is 0 do
                                give it the current label
                                add it to queue
                            end if
                        end for
                    end while
                    increase the label
                end if
            end for
        end for

=== Two-pass === Relatively simple to implement and understand, the two-pass algorithm (also known as the Hoshen–Kopelman algorithm) iterates through 2-dimensional binary data. The algorithm makes two passes over the image: the first pass to assign temporary labels and record equivalences, and the second pass to replace each temporary label by the smallest label of its equivalence class. The input data can be modified in situ (which carries the risk of data corruption), or labeling information can be maintained in an additional data structure. Connectivity checks are carried out by checking the labels of the neighbor pixels (neighbor elements whose labels are not assigned yet are ignored), that is, the north-east, the north, the north-west and the west neighbors of the current pixel (assuming 8-connectivity). 4-connectivity uses only the north and west neighbors of the current pixel.
The following conditions are checked to determine the value of the label to be assigned to the current pixel (4-connectivity is assumed) Conditions to check: Does the pixel to the left (west) have the same value as the current pixel? Yes – We are in the same region. Assign the same label to the current pixel No – Check next condition Do both pixels to the north and west of the current pixel have the same value as the current pixel but not the same label? Yes – We know that the north and west pixels belong to the same region and must be merged. Assign the current pixel the minimum of the north and west labels, and record their equivalence relationship No – Check next condition Does the pixel to the left (west) have a different value and the one to the north the same value as the current pixel? Yes – Assign the label of the north pixel to the current pixel No – Check next condition Do the pixel's north and west neighbors have different pixel values than current pixel? Yes – Create a new label id and assign it to the current pixel The algorithm continues this way, and creates new region labels whenever necessary. The key to a fast algorithm, however, is how this merging is done. This algorithm uses the union-find data structure which provides excellent performance for keeping track of equivalence relationships. Union-find essentially stores labels which correspond to the same blob in a disjoint-set data structure, making it easy to remember the equivalence of two labels by the use of an interface method E.g.: findSet(l). findSet(l) returns the minimum label value that is equivalent to the function argument 'l'. Once the initial labeling and equivalence recording is completed, the second pass merely replaces each pixel label with its equivalent disjoint-set representative element. A faster-scanning algorithm for connected-region extraction is presented below. 
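Separately from the faster-scanning variant, the two-pass procedure with union-find described above can be sketched in Python. This is a minimal illustration under stated assumptions: the input is a list of rows of 0 (background) and 1 (foreground), 4-connectivity is used, and the union-find structure is a plain parent list in which the smaller label always becomes the representative. The names `two_pass_label`, `find`, and `union` are hypothetical; `find` plays the role of `findSet(l)`.

```python
def two_pass_label(image):
    """Two-pass 4-connected labeling of a binary image (rows of 0/1)."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[k] is the union-find parent of label k; 0 is background

    def find(label):
        # Follow parents to the representative, shortening the path on the way.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smallest label stays representative

    # First pass: assign temporary labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            north = labels[r - 1][c] if r > 0 else 0
            west = labels[r][c - 1] if c > 0 else 0
            if north == 0 and west == 0:
                parent.append(len(parent))  # new label is its own parent
                labels[r][c] = len(parent) - 1
            elif north and west:
                labels[r][c] = min(north, west)
                union(north, west)
            else:
                labels[r][c] = north or west

    # Second pass: replace each label with its representative.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels


image = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
]
result = two_pass_label(image)
```

The final labels are unique per component but not necessarily consecutive, because temporary labels that were merged away are not reused.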
On the first pass: Iterate through each element of the data by column, then by row (Raster Scanning) If the element is not the background Get the neighboring elements of the current element If there are no neighbors, uniquely

    Read more →
  • Speech-generating device

    Speech-generating device

    Speech-generating devices (SGDs), also known as voice output communication aids, are electronic augmentative and alternative communication (AAC) systems used to supplement or replace speech or writing for individuals with severe speech impairments, enabling them to verbally communicate. SGDs are important for people who have limited means of interacting verbally, as they allow individuals to become active participants in communication interactions. They are particularly helpful for patients with amyotrophic lateral sclerosis (ALS) but recently have been used for children with predicted speech deficiencies. There are several input and display methods for users of varying abilities to make use of SGDs. Some SGDs have multiple pages of symbols to accommodate a large number of utterances, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices can produce electronic voice output by using digitized recordings of natural speech or through speech synthesis—which may carry less emotional information but can permit the user to speak novel messages. The content, organization, and updating of the vocabulary on an SGD is influenced by a number of factors, such as the user's needs and the contexts that the device will be used in. The development of techniques to improve the available vocabulary and rate of speech production is an active research area. Vocabulary items should be of high interest to the user, be frequently applicable, have a range of meanings, and be pragmatic in functionality. There are multiple methods of accessing messages on devices: directly or indirectly, or using specialized access devices—although the specific access method will depend on the skills and abilities of the user. SGD output is typically much slower than speech, although rate enhancement strategies can increase the user's rate of output, resulting in enhanced efficiency of communication. 
The first known SGD was prototyped in the mid-1970s, and rapid progress in hardware and software development has meant that SGD capabilities can now be integrated into devices like smartphones. Notable users of SGDs include Stephen Hawking, Roger Ebert, Tony Proudfoot, and Pete Frates (founder of the ALS Ice Bucket Challenge). Speech-generating systems may be dedicated devices developed solely for AAC, or non-dedicated devices such as computers running additional software to allow them to function as AAC devices. == History == SGDs have their roots in early electronic communication aids. The first such aid was a sip-and-puff typewriter controller named the patient-operated selector mechanism (POSSUM), prototyped by Reg Maling in the United Kingdom in 1960. POSSUM scanned through a set of symbols on an illuminated display. Researchers at Delft University in the Netherlands created the lightspot-operated typewriter (LOT) in 1970, which made use of small movements of the head to point a small spot of light at a matrix of characters, each equipped with a photoelectric cell. Although it was commercially unsuccessful, the LOT was well received by its users. In 1966, Barry Romich, a freshman engineering student at Case Western Reserve University, and Ed Prentke, an engineer at Highland View Hospital in Cleveland, Ohio, formed a partnership, creating the Prentke Romich Company. In 1969, the company produced its first communication device, a typing system based on a discarded Teletype machine. In 1979, Mark Dahmke developed software for a vocal communication aid program using the Computalker CT-1 analog speech synthesizer with a microcomputer. The software utilized phonemes to generate speech, assisting individuals with communication impairments in constructing words and sentences. Dahmke's work contributed to the advancement of assistive technology for people with disabilities.
Notably, he designed the "Vocabulary Management System" for Bill Rush, a student with cerebral palsy. This early speech synthesis technology facilitated improved communication for Rush and was featured in a 1980 issue of LIFE Magazine. Dahmke's contributions have influenced the development of augmentative and alternative communication (AAC) technologies. During the 1970s and early 1980s, several other companies emerged that have since become prominent manufacturers of SGDs. Toby Churchill founded Toby Churchill Ltd in 1973, after losing his speech following encephalitis. In the US, Dynavox (then known as Sentient Systems Technology) grew out of a student project at Carnegie-Mellon University, created in 1982 to help a young woman with cerebral palsy to communicate. Beginning in the 1980s, improvements in technology led to a greatly increased number, variety, and performance of commercially available communication devices, and a reduction in their size and price. Alternative methods of access such as Target Scanning (also known as eye pointing) calibrate the movement of a user's eyes to direct an SGD to produce the desired speech. Scanning, in which alternatives are presented to the user sequentially, became available on communication devices. Speech output possibilities included both digitized and synthesized speech. Rapid progress in hardware and software development continued, including projects funded by the European Community. The first commercially available dynamic screen speech-generating devices were developed in the 1990s. Software was developed that allowed the computer-based production of communication boards. 
High-tech devices have continued to become smaller and lighter, while increasing accessibility and capability; communication devices can be accessed using eye-tracking systems, perform as a computer for word-processing and Internet use, and as an environmental control device for independent access to other equipment such as TV, radio and telephones. Stephen Hawking came to be associated with the unique voice of his particular synthesis equipment. Hawking was unable to speak due to a combination of disabilities caused by ALS, and an emergency tracheotomy. In the past 20 or so years SGDs have gained popularity amongst young children with speech deficiencies, such as those associated with autism, Down syndrome, and predicted brain damage due to surgery. Starting in the early 2000s, specialists saw the benefit of using SGDs not only for adults but for children, as well. Neuro-linguists found that SGDs were just as effective in helping children who were at risk for temporary language deficits after undergoing brain surgery as they are for patients with ALS. In particular, digitized SGDs have been used as communication aids for pediatric patients during the recovery process. == Access methods == There are many methods of accessing messages on devices: directly, indirectly, and with specialized access devices. Direct access methods involve physical contact with the system, by using a keyboard or a touch screen. Users accessing SGDs indirectly and through specialized devices must manipulate an object in order to access the system, such as maneuvering a joystick, head mouse, optical head pointer, light pointer, infrared pointer, or switch access scanner. The specific access method will depend on the skills and abilities of the user. With direct selection a body part, pointer, adapted mouse, joystick, or eye tracking could be used, whereas switch access scanning is often used for indirect selection. 
Unlike direct selection (e.g., typing on a keyboard, touching a screen), users of Target Scanning can only make selections when the scanning indicator (or cursor) of the electronic device is on the desired choice. Those who are unable to point typically calibrate their eyes to use eye gaze as a way to point and blocking as a way to select desired words and phrases. The speed and pattern of scanning, as well as the way items are selected, are individualized to the physical, visual and cognitive capabilities of the user. == Message construction == Augmentative and alternative communication is typically much slower than speech, with users generally producing 8–10 words per minute. Rate enhancement strategies can increase the user's rate of output to around 12–15 words per minute, and as a result enhance the efficiency of communication. In any given SGD there may be a large number of vocal expressions that facilitate efficient and effective communication, including greetings, expressing desires, and asking questions. Some SGDs have multiple pages of symbols to accommodate a large number of vocal expressions, and thus only a portion of the symbols available are visible at any one time, with the communicator navigating the various pages. Speech-generating devices generally display a set of selections either using a dynamically changing screen, or a fixed display. There are two main options for increasing the rate of communication for an SGD: encoding, and prediction. Encoding permits a user to produce a word, sentence or phrase using only on

    Read more →
  • Sinewave synthesis

    Sinewave synthesis

    Sinewave synthesis, or sine wave speech, is a technique for synthesizing speech by replacing the formants (main bands of energy) with pure tone whistles. The first sinewave synthesis program (SWS) for the automatic creation of stimuli for perceptual experiments was developed by Philip Rubin at Haskins Laboratories in the 1970s. This program was subsequently used by Robert Remez, Philip Rubin, David Pisoni, and other colleagues to show that listeners can perceive continuous speech without traditional speech cues, i.e., pitch, stress, and intonation. This work paved the way for a view of speech as a dynamic pattern of trajectories through articulatory-acoustic space.
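    The idea of replacing each formant with a pure tone can be illustrated with a minimal sketch. It is not Rubin's SWS program: the function names, the frame length, and the formant frequencies below are hypothetical values chosen only for illustration, and a real stimulus would use formant tracks measured from recorded speech.

```python
import math


def sinewave_synthesis(formant_tracks, amplitudes, sample_rate=8000, frame_dur=0.01):
    """Sum one pure tone per formant track.

    formant_tracks: one list per formant, giving its frequency in Hz for each frame.
    amplitudes: one fixed amplitude per formant (an assumption; real tracks vary).
    Returns a list of float samples.
    """
    samples_per_frame = round(sample_rate * frame_dur)
    n_frames = len(formant_tracks[0])
    # A running phase per tone keeps each sinusoid continuous across frames.
    phases = [0.0] * len(formant_tracks)
    samples = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for k, track in enumerate(formant_tracks):
                phases[k] += 2.0 * math.pi * track[frame] / sample_rate
                value += amplitudes[k] * math.sin(phases[k])
            samples.append(value)
    return samples


def glide(start_hz, end_hz, n_frames):
    """Hypothetical formant track: a straight-line glide between two frequencies."""
    step = (end_hz - start_hz) / (n_frames - 1)
    return [start_hz + step * i for i in range(n_frames)]


# Illustrative (made-up) tracks standing in for the first three formants.
n_frames = 50
tracks = [
    glide(300.0, 700.0, n_frames),
    glide(2200.0, 1200.0, n_frames),
    [2800.0] * n_frames,
]
samples = sinewave_synthesis(tracks, amplitudes=[0.5, 0.3, 0.2])
print(len(samples), "samples generated")
```

    The result contains no pitch pulses or broadband noise, only three time-varying whistles, which is the property the perceptual experiments relied on.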

    Read more →
  • The Way (novel series)

    The Way (novel series)

    The Way series is a trilogy of science fiction novels and one short story by American author Greg Bear published from 1985 to 1999. The first novel was Eon (1985), followed by a sequel, Eternity and a prequel, Legacy. It also includes The Way of All Ghosts, a short story that falls between Legacy and Eon. == Novels == === Eon === Eon chronicles the appearance and discovery of the Thistledown, and its subsequent effect on humanity. In the early 21st century, the United States and the USSR are on the verge of nuclear war. In that tense political climate, an asteroid appears out of near space after an unusual supernova and settles into an extremely elliptical orbit near Earth orbit. The two nations each try to claim this mysterious object, which appears to be a virtual duplicate of Juno. It is hollow and contains seven vast terraformed chambers. Two of the chambers contain cities long abandoned by human beings who seemed to come from Earth's future. The asteroid is called the Thistledown by its builders. A startling discovery is that it is bigger inside than outside. The seventh chamber appears to stretch into infinity. The human inhabitants of the Thistledown come from an alternate timeline, approximately 1000 years in the future. In their timeline, human civilization was nearly destroyed by the "Death", a calamitous World War involving nuclear weapons. The Death occurred at approximately the same time as the appearance of the Thistledown in the present time. Its presence threatens to cause the Death to occur on the current timeline as well. An expedition is sent down the seemingly infinite seventh chamber (The "Way", as it is known) where it encounters the descendants of humanity. The high technology of this civilization, known as the Hexamon, has control over genetic engineering, human augmentation, and matter itself. The Hexamon includes several alien species who have come to live with humanity's descendants. 
The Hexamon itself is at war with an alien race known as the Jarts from further down the corridor still. In 2007, CGSociety organised a "CG Challenge" based upon Eon. === Eternity === Jarts, politics, and technology make up the second book in the series: Eternity. The Jart religion is based on the preservation of all data, which encompasses all life forms, past and present, and sending that data to the Jarts' future masters, their descendants. === Legacy === In the third book (a prequel, set in the time before Eon), Legacy, soldier Olmy ap Sennon is sent to spy on a group of dissidents who have used the spacetime tunnel of "the Way" (introduced in Eon) to colonize the alien world of Lamarckia, a planet with an ecosystem that learns from its changed environment in a way that resembles Lamarckian evolution. Its plants and animals turn out to actually be parts of continent-sized organisms. === "The Way of All Ghosts" === In the short story "The Way of All Ghosts" soldier Olmy ap Sennon is sent to close a lesion that formed out of a wayward gate into perfection. This story was published in 1999 in Far Horizons. == Fictional history of the Thistledown == Within the universe of The Way, the Thistledown is an asteroid starship built by hollowing out Juno and fitting it with mass-driver (rail gun) engines and thermonuclear drives. Inside the asteroid, seven giant "Chambers" are built, of which two host cities for the inhabitants, while others host machinery and recreation areas. The asteroid is prepared 500 years in the future, as told in Bear's novel Eon, and is engaged on a multi-generational journey to Epsilon Eridani, around which a habitable planet is known to circle. The journey is meant to take 60 years, as the ship can only maintain a velocity of 20% the speed of light. This limitation is removed after the technology of the Thistledown is improved to include inertial dampeners, allowing higher accelerations. 
Inhabiting the Thistledown are the best and brightest of Earth, who are quite diverse both culturally and politically. The Thistledown's society includes one transcendent genius, Konrad Korzenowski, whose preference for living in the Thistledown as compared with an outer universe, causes him to experiment with closed-geodesic space time in the Seventh Chamber, 20 years into the Thistledown's voyage. The results of his experiments are shattering in the extreme: He creates a unique pocket universe: The Way. == The Way == === Origin === The eponymous Way is an extension of the 7th Chamber, and was formed in the novels using the machinery of the 6th Chamber. This machinery is a selective inertial damper, developed by engineers within the Thistledown with twofold purpose—to permit the Thistledown to accelerate to the limit of its engines (up to 99% the speed of light) and to selectively dampen inertia within the vessel, e.g., water within waterways, high velocity train systems. The inertial dampening machinery within the 6th Chamber is anchored to the structure of the Thistledown, equally spaced around the chamber at the vertices of a regular heptagon. === Creation === At the creation, and rejoining of the Way to the Thistledown, the character Konrad Korzenowski and his engineers designed and 'built' the Way out of the in-folded geodesics of the inertial dampening field of the 6th Chamber machinery. This is described in the books by first considering the inertial dampening field: Within the Thistledown, the field envelops the asteroid, effectively isolating it from the Einsteinian Metrical Frame, permitting relative inertia to be ignored. The Thistledown was, at the time of activation, isolated from its continuum, but only selectively. 
Its matter and energy anchored it to its continuum and relative time, but its geometry and quantum entanglement had been strained by the inertial dampener, thus making it susceptible to superspace distortions, and therefore it could be affected by them negatively. Korzenowski, having been influenced by the earlier work of Vasquez on Earth, and in developing her work within the Thistledown, planned a radical extension of the inertial field of the 6th Chamber, effectively extending the field away to an infinite extent within the 7th Chamber. In order to do this effectively, he and his engineers modified a set of semi-sentient field calibration tools to build the first Clavicles. Unlike the field calibration tools from which they were descended, the Clavicles possessed the ability not only to manipulate the field, but extend it as an extension of the will of the operator. Already radical enough, Korzenowski and his team went further. By extending the field of the 6th Chamber from within the 7th Chamber of the Thistledown, they could then directly access what Vasquez had calculated within her own work: alternate world lines as non-gravity bent geodesics of superspace. Korzenowski thus 'felt' superspace within the 7th Chamber, selecting the infinite selection of possible alternate pocket universes accessible by the Clavicle to form, as a sheer act of will, the Way from his designs and his vision. The resulting structure was constructed, not of matter, but of previously in-folded superspace vectors now infinitely extended (in the manner of Schwarzschild folded geometry, or of an asymptotic curve). The Way was thus opened. The Way's geometry also gave rise to the Flaw: as superspace geometry of the field boundary was extended infinitely, so the folded geodesics of the field unfold in the geometric centre of the Way to form a singularity. This singularity, the Flaw, rests within the Way's plasma tube (which in turn is sustained by the Flaw). 
The Flaw 'produces' gravity by actively repulsing matter away from itself in an acceleration at the square of the distance away from itself. In addition, any object encircling the Flaw, and then exerting pressure against it, experiences this pressure as a translation force along the Flaw's length perpendicular to the direction of force. The motion thus induced is controllable by the angle at which an annular ring enclosure is pressed against the Flaw. The same spatial transform also can be used to turn tip turbines in order to generate electricity. The Flaw permits a violation of the First Law of Thermodynamics, therefore defining the Way as a perpetual motion machine of the First Order, making energy out of nothing. === Early history === The Way, as formed, was described by Bear as being in vacuum and did not consist of matter within its infinite length. Due to extremely slight ambiguity involved in its creation, the synchronicity between time within the Way, and within the Thistledown, was not exact. Thus, the Engineers spend two decades working to correct these faults using the Clavicles to manipulate the junction between Way and Thistledown. During this period, ambition led Korzenowski to use the Clavicle to open the first exploratory gate within the Way, leading to the universe of the Jarts. Though the gate to the Jart world was closed, the advanced Jarts neve

    Read more →
  • AI Overviews

    AI Overviews

    AI Overviews is an artificial intelligence (AI) feature integrated into Google Search that produces AI-generated summaries of search results. The feature has been criticized for its inaccuracy and for reducing website traffic. == History and development == AI Overviews were first introduced as part of Google's Search Generative Experience (SGE), which was unveiled at the Google I/O conference in May 2023. In May 2024 at Google I/O 2024, the feature was rebranded as AI Overviews and launched in the United States. The introduction of AI Overviews was seen as a strategic move to compete with other generative AI advancements, including OpenAI's ChatGPT. By August 2024, AI Overviews was rolled out to several other countries, including the United Kingdom, India, Japan, Brazil, Mexico, and Indonesia, with support for multiple languages. In October 2024, Google expanded the feature globally, making it available in over 100 countries. In December 2024, Botify x Demandsphere released findings stating that when AI Overviews and featured snippets appear together on the search engine results page, they take up approximately 67.1% of the screen on desktop and 75.7% on mobile. Even if content is ranking in the #1 position, it may not be visible to consumers if other visual elements on the results page are more prominent. In March 2025, Google started testing an "AI Mode", where the search results page is AI-generated. The company was also considering adding advertisements to the AI Mode, as they already exist in AI Overviews. As of May 2025, AI Overviews are available in over 200 countries and territories and in more than 40 languages. As of March 2026, Google AI Overviews appear on more than 48% of total Google Search queries, compared to just 6.49% in the previous year (58% year-over-year growth). == Functionality == The AI Overviews feature uses large language models to generate summaries from web content. 
The overviews are designed to be concise, providing a snapshot of relevant information about the queried topic. Google allows users to adjust the language complexity in summaries, offering both simplified and detailed options. The overviews also include links to sources. According to a June 2025 study by Semrush, the most cited source is Quora, followed by Reddit. == Reception == The feature has faced criticism for inaccuracies, including instances where erroneous or nonsensical content was generated. Depending on what is searched for, the overview may also consist of hallucinated content, such as when searching for idioms that do not exist. In May 2024, Google temporarily restricted the AI tool after it provided suggestions that were seen as nonsensical and harmful, such as telling users to eat rocks or apply glue on pizza. Concerns were also raised by content publishers, who feared a decline in web traffic as users relied on the summaries instead of visiting source websites. A Google patent from 2026 raised the concern of webmasters that Google could entirely replace the landing page of websites by an AI optimized copy of the website in its results. There is also apprehension about the ethical implications of AI-driven content aggregation, including its impact on intellectual property rights and the visibility of smaller content providers. The European Commission announced in December 2025 that they were investigating whether AI Overviews breached European competition law. In response, Google has stated its commitment to improve content validation and refine the algorithms used to filter unreliable information. Google implemented measures to prioritize link placement within AI Overviews, aiming to balance user convenience with the needs of content creators. In January 2026, Google restricted AI Overviews on certain health-related searches following an investigation by The Guardian. 
== Lawsuits == On February 24, 2025, Chegg sued Alphabet over the AI Overviews feature, claiming that it was leading to students preferring "low-quality, unverified AI summaries", thus violating antitrust law. Chegg also said it was considering either a sale or a take-private transaction. In September 2025, Penske Media Corporation, the publisher of Rolling Stone and The Hollywood Reporter, sued Google, claiming that AI Overviews illegally regurgitate content from their websites and drive off potential site visitors by always appearing on top of the search results while leaving little incentive to see the linked sources. The company stated that "the future of digital media and [...] its integrity [...] is threatened by Google's current actions", alleging that 20% of searches that link to Penske-owned websites show AI Overviews and that the figure is expected to rise. Google spokesperson José Castañeda called the claims "meritless" and stated that "AI Overviews send traffic to a greater diversity of sites." In 2026, Canadian musician Ashley MacIsaac filed a lawsuit against Google claiming that the AI Overview feature had wrongly stated that MacIsaac had been convicted of numerous criminal offences and was on the sex offender registry. He claims this incorrect information led to the cancellation of a December 2025 gig organized by the Sipekne'katik First Nation.

    Read more →
  • Transhuman Space

    Transhuman Space

    Transhuman Space (THS) is a role-playing game by David Pulver, published by Steve Jackson Games as part of the "Powered by GURPS" (Generic Universal Role-Playing System) line. Set in the year 2100, humanity has begun to colonize the Solar System. The pursuit of transhumanism is now in full swing, as more and more people reach fully posthuman states. In 2002, the Transhuman Space adventure "Orbital Decay" received an Origins Award nomination for Best Role-Playing Game Adventure. Transhuman Space won the 2003 Grog d'Or Award for Best Role-playing Game, Game Line or RPG Setting. == Setting == The game assumes that no cataclysm — natural or human-induced — swept Earth in the 21st century. Instead, constant developments in information technology, genetic engineering, nanotechnology and nuclear physics generally improved condition of the average human life. Plagues of the 20th century (like cancer or AIDS) have been suppressed, the ozone layer is being restored and Earth's ecosystems are recovering (although thermal emission by fusion power plants poses an environmental threat—albeit a much lesser one than previous sources of energy). Thanks to modern medicine humans live biblical timespans surrounded by various artificially intelligent helper applications and robots (cybershells), sensory experience broadcasts (future TV) and cyberspace telepresence. Thanks to cheap and clean fusion energy humanity has power to fuel all these wonders, restore and transform its home planet and finally settle on other heavenly bodies. Human genetic engineering has advanced to the point that anyone—single individuals, same-sex couples, groups of three or more—can reproduce. The embryos can be allowed to be developed naturally, or they can undergo three levels of tinkering: 1. Genefixing, which corrects defects; 2. Upgrades, which boost natural abilities (Ishtar Upgrades are slightly more attractive than usual, Metanoia Upgrades are more intelligent, etc.); and... 3. 
Full transition to parahuman status (Nyx Parahumans only need a few hours of sleep per week, Aquamorphs can live underwater, etc.) Another type of human genetic engineering, far more controversial, is the creation of bioroids, fully sentient slave races. People can "upload" by recording the simulation of their brains on computer disks. The emulated individual then becomes a ghost, an infomorph very easily confused with "sapient artificial intelligence". However, this technology has several problems as the solely available "brainpeeling" technique is fatal to the original biological lifeform being simulated, has a significant failure rate and the philosophical questions regarding personal identity remain equivocal. Any infomorph, regardless of its origin, can be plugged into a "cybershell" (robotic or cybernetic body), or a biological body, or "bioshell". Or, the individual can illegally make multiple "xoxes", or copies of themselves, and scatter them throughout the system, exponentially increasing the odds that at least one of them will live for centuries more, if not forever. This is also a time of space colonization. First, humanity (specifically China, followed by the United States and others) colonized Mars in a fashion resembling that outlined in the Mars Direct project. The Moon, Lagrangian points, inner planets and asteroids soon followed. In the late 21st century even some of Saturn's moons have been settled as a base for that planet's Helium-3 scooping operations. Transhuman Space's setting is neither utopia nor dystopia, however: several problems have arisen from these otherwise beneficial developments. The generation gap has become a chasm as lifespans increase. No longer do the elite fear death, and no longer can the young hope to replace them. While it seemed that outworld colonies would offer accommodation and work for those young ones, they are being replaced by genetically tailored bioroids and AI-powered cybershells. 
The concept of humanity is no longer clear in a world where even some animals speak of their rights and the dead haunt both cyberspace and reality (in form of infomorph-controlled bioshells or cybershells). And the wonders of high science are not universally shared — some countries merely struggle with informatization while others suffer from nanoplagues, defective drugs, implants and software tested on their populace. In some poor countries high-tech tyrants oppress their backward people. And in outer space all sort of modern crime thrives, barely suppressed by military forces. == Publication history == After the initial set of GURPS books that were published using the GURPS Lite, later publications such as Transhuman Space by David Pulver were labelled simply "Powered by GURPS" without using the name "GURPS" in the book title. Transhuman Space received a significant amount of supporting publications, and was the largest original background setting that Steve Jackson Games produced in 15 years. Shannon Appelcline noted that by its inclusion of posthuman characters, the book began to show the limits of the GURPS system as it was, which is something that Pulver would address soon thereafter. Steve Jackson Games has not updated the core book (GURPS Transhuman Space) to 4th edition, although the supplement Transhuman Space: Changing Times provides a path for migrating to 4th edition. It has produced several 4th edition supplements for the setting: Transhuman Space: Bioroid Bazaar, Transhuman Space: Cities on the Edge, Transhuman Space: Martial Arts 2100, Transhuman Space: Personnel Files 2-5, Transhuman Space: Shell-Tech, GURPS Spaceships 8: Transhuman Spacecraft, Transhuman Space: Transhuman Mysteries, and Transhuman Space: Wings of the Rising Sun. == Reception == In a review of Transhuman Space in Black Gate, William Stoddard said "Transhuman Space was a richly detailed setting; if it had imperfections, it had enough depth to make up for them. 
I think it has the potential to become a classic in its field. Perhaps a campaign set in its default start year of 2100 could leave the early twenty-first century blurry enough to avoid obvious incongruities." == Reviews == Review in Vol. 20, No. 1 of Prometheus, the journal of the Libertarian Futurist Society.

    Read more →
  • Model collapse

    Model collapse

    Model collapse, also known by other names such as "AI inbreeding", "AI cannibalism", "Habsburg AI", and "model autophagy disorder" or "MAD", is a phenomenon noted in artificial intelligence studies, where machine learning models gradually degrade due to errors coming from uncurated synthetic data, or due to training on the outputs of another model such as prior versions of itself. It is unclear to what extent the phenomenon threatens the long-term development of such models, and some techniques have been proposed to mitigate the effect. == Characteristics == Shumailov et al. coined the term to describe two specific stages in the degradation of machine learning models: early model collapse and late model collapse. In early model collapse, the model begins losing information about the tails of the distribution, mostly affecting minority data. Later work highlighted that early model collapse is hard to notice, since overall performance may appear to improve, while the model loses performance on minority data. In late model collapse, the model loses a significant proportion of its performance, confusing concepts and losing most of its variance. == Mechanism == Using synthetic data as training data can lead to issues with the quality and reliability of the trained model. Model collapse occurs for three main reasons: functional approximation errors, sampling errors, and learning errors. Importantly, it happens in even the simplest of models, where not all of the error sources are present. In more complex models the errors often compound, leading to faster collapse. == Disagreement over real-world impact == Some researchers and commentators on model collapse warn that the phenomenon could fundamentally threaten future generative AI development: as AI-generated data is shared on the Internet, it will inevitably end up in future training datasets, which are often crawled from the Internet. 
If training on "slop" (large quantities of unlabeled synthetic data) inevitably leads to model collapse, this could therefore pose a difficult problem. However, recently, other researchers have disagreed with this argument, showing that if synthetic data accumulates alongside human-generated data, model collapse is avoided. The researchers argue that data accumulating over time is a more realistic description of reality than deleting all existing data every year, and that the real-world impact of model collapse may not be as catastrophic as feared. An alternative branch of the literature investigates the use of machine learning detectors and watermarking to identify model generated data and filter it out. == Mathematical models of the phenomenon == === 1D Gaussian model === In 2024, a first attempt has been made at illustrating collapse for the simplest possible model — a single dimensional normal distribution fit using unbiased estimators of mean and variance, computed on samples from the previous generation. To make this more precise, we say that original data follows a normal distribution X 0 ∼ N ( μ , σ 2 ) {\displaystyle X^{0}\sim {\mathcal {N}}(\mu ,\sigma ^{2})} , and we possess M 0 {\displaystyle M_{0}} samples X j 0 {\displaystyle X_{j}^{0}} for j ∈ { 1 , … , M 0 } {\displaystyle j\in {\{\,1,\dots ,M_{0}\,{}\}}} . Denoting a general sample X j i {\displaystyle X_{j}^{i}} as sample j ∈ { 1 , … , M i } {\displaystyle j\in {\{\,1,\dots ,M_{i}\,{}\}}} at generation i {\displaystyle i} , then the next generation model is estimated using the sample mean and variance: μ i + 1 = 1 M i ∑ j X j i ; σ i + 1 2 = 1 M i − 1 ∑ j ( X j i − μ i + 1 ) 2 . 
{\displaystyle \mu _{i+1}={\frac {1}{M_{i}}}\sum _{j}X_{j}^{i};\quad \sigma _{i+1}^{2}={\frac {1}{M_{i}-1}}\sum _{j}(X_{j}^{i}-\mu _{i+1})^{2}.} Leading to a conditionally normal next generation model X j i + 1 | μ i + 1 , σ i + 1 ∼ N ( μ i + 1 , σ i + 1 2 ) {\displaystyle X_{j}^{i+1}|\mu _{i+1},\;\sigma _{i+1}\sim {\mathcal {N}}(\mu _{i+1},\sigma _{i+1}^{2})} . In theory, this is enough to calculate the full distribution of X j i {\displaystyle X_{j}^{i}} . However, even after the first generation, the full distribution is no longer normal: It follows a variance-gamma distribution. To continue the analysis, instead of writing the probability density function at each generation, it is possible to explicitly construct them in terms of independent random variables using Cochran's theorem. To be precise, μ 1 {\displaystyle \mu _{1}} and σ 1 {\displaystyle \sigma _{1}} are independent, with μ 1 ∼ N ( μ , σ 2 M 0 ) {\displaystyle \mu _{1}\sim {\mathcal {N}}\left(\mu ,{\frac {\sigma ^{2}}{M_{0}}}\right)} and ( M 0 − 1 ) σ 1 2 ∼ σ 2 Γ ( M 0 − 1 2 , 1 2 ) {\displaystyle (M_{0}-1)\,\sigma _{1}^{2}\sim \sigma ^{2}\,\Gamma \left({\frac {M_{0}-1}{2}},{\frac {1}{2}}\right)} , following a Gamma distribution. Denoting with Z {\displaystyle Z} Gaussian random variables distributed according to N ( 0 , 1 ) {\displaystyle {\mathcal {N}}(0,1)} and with S i {\displaystyle S^{i}} random variables distributed with 1 M i − 1 − 1 Γ ( M i − 1 − 1 2 , 1 2 ) {\displaystyle {\frac {1}{M_{i-1}-1}}\Gamma \left({\frac {M_{i-1}-1}{2}},{\frac {1}{2}}\right)} , it turns out to be possible to write samples at each generation as X j 0 = μ + σ Z j 0 , {\textstyle X_{j}^{0}=\mu +\sigma Z_{j}^{0},} X j 1 = μ + σ M 0 Z 1 + σ S 1 Z j 1 , {\textstyle X_{j}^{1}=\mu +{\frac {\sigma }{\sqrt {M_{0}}}}Z^{1}+\sigma {\sqrt {S^{1}}}Z_{j}^{1},} and more generally X j n = μ + σ M 0 Z 1 + σ M 1 S 1 Z 2 + ⋯ + σ M n − 1 S 1 × ⋯ × S n − 1 Z n + σ S 1 × ⋯ × S n Z j n . 
$$X_{j}^{n}=\mu+\frac{\sigma}{\sqrt{M_{0}}}Z^{1}+\frac{\sigma}{\sqrt{M_{1}}}\sqrt{S^{1}}\,Z^{2}+\dots+\frac{\sigma}{\sqrt{M_{n-1}}}\sqrt{S^{1}\times\dots\times S^{n-1}}\,Z^{n}+\sigma\sqrt{S^{1}\times\dots\times S^{n}}\,Z_{j}^{n}.$$

Note that these are not joint distributions, as $Z^{n}$ and $S^{n}$ depend directly on $Z_{j}^{n-1}$, but when considering $X_{j}^{n}$ on its own the formula above provides all the information about the full distribution.

To analyse the model collapse, we can first calculate the mean and variance of samples at generation $n$. This tells us what kind of distribution we expect to arrive at after $n$ generations. It is possible to find the exact values in closed form, but the mean and variance of the square root of a gamma-distributed variable are expressed in terms of gamma functions, making the result quite clunky. Instead, it is possible to expand all results to second order in each of $1/M_{i}$, assuming each sample size to be large. It is then possible to show that

$$\frac{1}{\sigma^{2}}\operatorname{Var}(X_{j}^{n})=\frac{1}{M_{0}}+\frac{1}{M_{1}}+\dots+\frac{1}{M_{n-1}}+1+\mathcal{O}\left(M_{i}^{-2}\right).$$

If all sample sizes are constant, $M_{i}=M$, this diverges linearly as $n\to\infty$:

$$\operatorname{Var}(X_{j}^{n})=\sigma^{2}\left(1+\frac{n}{M}\right);\quad \mathbb{E}(X_{j}^{n})=\mu.$$

This is the same scaling as for a one-dimensional Gaussian random walk.
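The linear growth of the variance for constant sample size can be checked numerically. The following is a minimal Monte Carlo sketch, not the procedure used in the cited work: it runs many independent chains of the refit-and-resample loop described above and compares the empirical variance of one draw $X_{j}^{n}$ per chain with $\sigma^{2}(1+n/M)$. The function names, default parameter values, number of chains and seed are all illustrative assumptions.

```python
import numpy as np


def simulate_final_samples(mu=0.0, sigma=1.0, M=50, n_generations=20,
                           n_chains=20_000, seed=0):
    """Return one draw X^n from each of n_chains independent refit chains.

    Hypothetical helper: every chain starts from N(mu, sigma^2) and is
    refitted n_generations times on M samples of its own previous model.
    """
    rng = np.random.default_rng(seed)
    mu_i = np.full(n_chains, mu, dtype=float)
    sigma_i = np.full(n_chains, sigma, dtype=float)
    for _ in range(n_generations):
        # Draw M samples per chain from the current generation's model.
        x = rng.normal(mu_i[:, None], sigma_i[:, None], size=(n_chains, M))
        # Refit with the sample mean and the unbiased variance (ddof=1).
        mu_i = x.mean(axis=1)
        sigma_i = x.std(axis=1, ddof=1)
    # One sample per chain from the generation-n model.
    return rng.normal(mu_i, sigma_i)


def predicted_variance(sigma, n_generations, M):
    """Constant-M prediction from the text: sigma^2 * (1 + n / M)."""
    return sigma**2 * (1.0 + n_generations / M)


if __name__ == "__main__":
    xs = simulate_final_samples()
    print("empirical mean:    ", xs.mean())
    print("empirical variance:", xs.var())
    print("predicted variance:", predicted_variance(1.0, 20, 50))
```

Running many chains in parallel is a deliberate choice: the formula describes the distribution of $X_{j}^{n}$ over all the randomness of the earlier generations, so the variance has to be taken across chains, not within a single chain.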
However, divergence of the variance of $X_{j}^{n}$ does not directly provide any information about the corresponding estimates of $\mu_{n+1}$ and $\sigma_{n+1}$, particularly how different they are from the original $\mu$ and $\sigma$. It turns out to be possible to calculate the distance between the true distribution and the approximated distribution at step $n+1$, using the Wasserstein-2 distance (which is also sometimes referred to as risk):

$$\mathbb{E}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu,\sigma^{2}),\mathcal{N}(\mu_{n+1},\sigma_{n+1}^{2})\right)\right]=\frac{3}{2}\sigma^{2}\left(\frac{1}{M_{0}}+\frac{1}{M_{1}}+\dots+\frac{1}{M_{n}}\right)+\mathcal{O}\left(M_{i}^{-2}\right),$$

$$\operatorname{Var}\left[\mathbb{W}_{2}^{2}\left(\mathcal{N}(\mu,\sigma^{2}),\mathcal{N}(\mu_{n+1},\sigma_{n+1}^{2})\right)\right]=\frac{1}{2}\sigma^{4}\left(\frac{3}{M_{0}^{2}}+\frac{3}{M_{1}^{2}}+\dots+\frac{3}{M_{n}^{2}}+\sum_{i\neq j}\frac{4}{M_{i}M_{j}}\right)+\mathcal{O}\left(M_{i}^{-3}\right).$$

This directly shows why model collapse occurs in this simple model. Due to errors from re-sampling the approximated distribution, each generation ends up corresponding to a
