Nearest centroid classifier

Nearest centroid classifier

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation. When applied to text classification using word vectors containing tfidf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback. An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors. == Algorithm == === Training === Given labeled training samples { ( x → 1 , y 1 ) , … , ( x → n , y n ) } {\displaystyle \textstyle \{({\vec {x}}_{1},y_{1}),\dots ,({\vec {x}}_{n},y_{n})\}} with class labels y i ∈ Y {\displaystyle y_{i}\in \mathbf {Y} } , compute the per-class centroids μ → ℓ = 1 | C ℓ | ∑ i ∈ C ℓ x → i {\displaystyle \textstyle {\vec {\mu }}_{\ell }={\frac {1}{|C_{\ell }|}}{\underset {i\in C_{\ell }}{\sum }}{\vec {x}}_{i}} where C ℓ {\displaystyle C_{\ell }} is the set of indices of samples belonging to class ℓ ∈ Y {\displaystyle \ell \in \mathbf {Y} } . === Prediction === The class assigned to an observation x → {\displaystyle {\vec {x}}} is y ^ = arg ⁡ min ℓ ∈ Y ‖ μ → ℓ − x → ‖ {\displaystyle {\hat {y}}={\arg \min }_{\ell \in \mathbf {Y} }\|{\vec {\mu }}_{\ell }-{\vec {x}}\|} .

Test data management

Test data management (TDM) is a process in software testing concerned with the creation, preparation, and control of data used for testing software systems. It involves supplying datasets required to execute test cases and verifying system behaviour under defined conditions. Test data management is an integral part of the software development lifecycle (SDLC) and is utilized in both manual and automated testing processes. It is applied in environments that use continuous integration and DevOps practices, where test execution requires consistent and repeatable data conditions. == Overview == Test data management includes the generation, selection, and preparation of data for testing purposes, as well as its distribution across test environments. It also involves controlling data versions and ensuring that datasets correspond to specific test scenarios. In many cases, production data is adapted for testing through techniques such as masking or subsetting to reduce size and remove sensitive content. Test data management ensures that test cases are executed with relevant, consistent, and readily available data. This reduces variability in test results and supports reproducibility across test cycles. == Importance == The role of test data management has expanded with the growth of complex, data-driven systems and regulatory requirements governing data usage. Testing often depends on data that reflects real-world conditions, but direct use of production data may introduce security and privacy risks. As a result, organizations apply methods such as data masking and anonymization to meet compliance requirements, including those set by the California Privacy Rights Act (CPRA) and Europe’s General Data Protection Regulation (GDPR). Inadequate control of test data can lead to incomplete test coverage, unreliable test results, or delays in testing processes due to unavailable or inconsistent datasets. == Techniques and tools == Test data management leverages various techniques for preparing and controlling data used in testing. These include the generation of synthetic data, the extraction of subsets from production datasets, and the modification of data to remove or obscure sensitive information. A key technical requirement in these processes is maintaining referential integrity, or ensuring that relationships between data entities remain consistent across different tables and systems after masking or subsetting. Data virtualization is also used to provide access to datasets without full replication. These methods may be implemented using software tools that automate data preparation, masking, and distribution.

SmartAction

SmartAction Company LLC is a U.S.-based software company that develops artificial intelligence–driven virtual agents for customer service applications, including voice-based interactive voice response (IVR) systems, chat, and SMS. The company was founded in 2009 by inventor and entrepreneur Peter Voss and is headquartered in Fort Worth, Texas. == History == In 2001, Peter Voss founded Adaptive AI, Inc., a research and development company focused on artificial intelligence concepts. In 2009, Voss founded SmartAction Company, LLC to commercialize customer-service automation software derived from this work. The company’s initial products focused on automating inbound and outbound calls for contact center environments. In November 2022, Kyle Johnson was appointed chief executive officer, succeeding Gary Davis, who had served as CEO since 2020. In 2024, SmartAction was acquired by Capacity, an AI-powered customer support automation company based in St. Louis, Missouri. == Technology == SmartAction develops cloud-based voice automation software that integrates speech recognition and natural language processing to support automated customer interactions in contact center environments. The platform supports automated handling of common customer service tasks and is designed to integrate with enterprise systems.

Vidby

Vidby AG (stylized in lower-case) is a start-up based in Rotkreuz, Switzerland specializing in AI language translation for videos. Founded by Alexander Konovalov (uk:Олександр Коновалов) and Eugen von Rubinberg in September 2021, the company has especially garnered attention for its use in translating speeches given by President Volodymyr Zelenskyy during the Russian invasion of Ukraine. == History == Vidby AG was founded by Alexander Konovalov and Eugen von Rubinberg. Konovalov is a native of Ukraine and retains Ukrainian citizenship; Rubinberg came to Switzerland from Germany and holds German citizenship. Both are residents of Switzerland. The latter founded his first business, a trading company, at age 16. In 2013, the business partners launched a consumer-oriented video-call translation service called DROTR (Droid Translator) AG, utilizing a Konovalov-created AI-powered language translation technology enabling simultaneous translation of messages, voice and video calls in 104 languages (written), with 44 available in spoken form. This was the world's first video calling app with translation. The technology was pronounced a competitor of Skype and Viber by Forbes and claimed first prize at the "Innovative Breakthrough 2013" Competition. In 2021, with a new business-oriented focus, DROTR became Vidby, with the former Google technology partners Konovalov and Rubinberg remaining at the helm, each with the title Co-CEO. While headquartered in Switzerland, Vidby's development team is, according to the company's founders, based in Ukraine. The technology behind Vidby has an accuracy level variously reported as up to 99 percent or 99 to 100 percent, equalling the highest level of human translation. Additionally, the technology is capable of removing the original language while maintaining ambient sounds. Currently, some 70 languages plus 60 dialects are possible with the algorithm-based technology. == Notable use == In addition to its use with speeches delivered by Pope Francis, the technology has been provided to Ukrainian authorities and embassies during the ongoing military conflict with Russia free of remuneration. By July, 2022, some 70 speeches given by President Zelenskyy totalling 650 minutes had been translated into 30 languages, for a total of over 10,000 minutes of video material. Of its use in translating Zelenskyy's wartime speeches, Konovalov has said, "Like any citizen, I want to help defend my country." Notable corporate clients of Vidby include Samsung, Siemens, Cisco, Kärcher, Generali and McDonald's Corporation; an academic client is Harvard University. Google Cloud Technology Partner status of Vidby was confirmed officially after a six-month audit in December 2022. Denys Krasnikov, a Vidby co-founder, is responsible for cooperation with Google, YouTube, Microsoft, and other key partners. After the launch of multilingual YouTube channels, Vidby started AI translating and dubbing creators' videos for this new type of channel at the end of February 2023. == Accolades == Vidby headed a list of the five best video translation services as named by TechRadar Deutschland in September, 2022. In the same month, Tech Times named Vidby #1 in their list of the five best such services. It similarly topped a list of the five best content translation technologies as judged by European Business Review in October, 2022. Prior to these lead-position rankings (August, 2022), it was featured as Business Insider's special start-up recommendation (German: "Unser Lesetipp auf Gründerszene"). In 2023, YouTube recognized Vidby as its recommended vendor.

The Machine That Won the War (short story)

"The Machine That Won the War" is a science fiction short story by American writer Isaac Asimov. The story first appeared in the October 1961 issue of The Magazine of Fantasy & Science Fiction, and was reprinted in the collections Nightfall and Other Stories (1969) and Robot Dreams (1986). It was also printed in a contemporary edition of Reader's Digest, illustrated. It is one of a loosely connected series of such stories concerning a fictional supercomputer called Multivac. == Plot summary == Three influential leaders of the human race meet in the aftermath of a successful war against the Denebians. Discussing how the vast and powerful Multivac computer was a decisive factor in the war, each of the men admits that in fact, he falsified his part of the decision process because he felt that the situation was too complex to follow normal procedures. John Henderson, Multivac's Chief Programmer, admits that he altered the data being fed to Multivac, since the populace could not be trusted to report accurate information in the current situation. Max Jablonski then admits that he altered the data that Multivac produced, since he knew that Multivac was not in good working order due to manpower and spare parts shortage. Finally, Lamar Swift, executive director of the Solar Federation, reveals that he had not trusted the reports produced by Multivac, and had made the final decisions purely on the toss of a coin.

Image

An image or picture is a visual representation. An image can be two-dimensional, such as a drawing, painting, or photograph, or three-dimensional, such as a carving or sculpture. Images may be displayed through other media, including a projection on a surface, activation of electronic signals, or digital displays; they can also be reproduced through mechanical means, such as photography, printmaking, or photocopying. Images can also be animated through digital or physical processes. In the context of signal processing, an image is a distributed amplitude of color(s). In optics, the term image (or optical image) refers specifically to the reproduction of an object formed by light waves coming from the object. A volatile image exists or is perceived only for a short period. This may be a reflection of an object by a mirror, a projection of a camera obscura, or a scene displayed on a cathode-ray tube. A fixed image, also called a hard copy, is one that has been recorded on a material object, such as paper or textile. A mental image exists in an individual's mind as something one remembers or imagines. The subject of an image does not need to be real; it may be an abstract concept such as a graph or function or an imaginary entity. For a mental image to be understood outside of an individual's mind, however, there must be a way of conveying that mental image through the words or visual productions of the subject. == Characteristics == === Two-dimensional images === The broader sense of the word 'image' also encompasses any two-dimensional figure, such as a map, graph, pie chart, painting, or banner. In this wider sense, images can also be rendered manually, such as by drawing, the art of painting, or the graphic arts (such as lithography or etching). Additionally, images can be rendered automatically through printing, computer graphics technology, or a combination of both methods. A two-dimensional image does not need to use the entire visual system to be a visual representation. An example of this is a grayscale ("black and white") image, which uses the visual system's sensitivity to brightness across all wavelengths without taking into account different colors. A black-and-white visual representation of something is still an image, even though it does not fully use the visual system's capabilities. On the other hand, some processes can be used to create visual representations of objects that are otherwise inaccessible to the human visual system. These include microscopy for the magnification of minute objects, telescopes that can observe objects at great distances, X-rays that can visually represent the interior structures of the human body (among other objects), magnetic resonance imaging (MRI), positron emission tomography (PET scans), and others. Such processes often rely on detecting electromagnetic radiation that occurs beyond the light spectrum visible to the human eye and converting such signals into recognizable images. === Three-dimensional images === Aside from sculpture and other physical activities that can create three-dimensional images from solid material, some modern techniques, such as holography, can create three-dimensional images that are reproducible but intangible to human touch. Some photographic processes can now render the illusion of depth in an otherwise "flat" image, but "3-D photography" (stereoscopy) or "3-D film" are optical illusions that require special devices such as eyeglasses to create the illusion of depth. === Moving images === "Moving" two-dimensional images are actually illusions of movement perceived when still images are displayed in sequence, each image lasting less, and sometimes much less, than a fraction of a second. The traditional standard for the display of individual frames by a motion picture projector has been 24 frames per second (FPS) since at least the commercial introduction of "talking pictures" in the late 1920s, which necessitated a standard for synchronizing images and sounds. Even in electronic formats such as television and digital image displays, the apparent "motion" is actually the result of many individual lines giving the impression of continuous movement. This phenomenon has often been described as "persistence of vision": a physiological effect of light impressions remaining on the retina of the eye for very brief periods. Even though the term is still sometimes used in popular discussions of movies, it is not a scientifically valid explanation. Other terms emphasize the complex cognitive operations of the brain and the human visual system. "Flicker fusion", the "phi phenomenon", and "beta movement" are among the terms that have replaced "persistence of vision", though no one term seems adequate to describe the process. == Cultural and other uses == Image-making seems to have been common to virtually all human cultures since at least the Paleolithic era. Prehistoric examples of rock art—including cave paintings, petroglyphs, rock reliefs, and geoglyphs—have been found on every inhabited continent. Many of these images seem to have served various purposes: as a form of record-keeping; as an element of spiritual, religious, or magical practice; or even as a form of communication. Early writing systems, including hieroglyphics, ideographic writing, and even the Roman alphabet, owe their origins in some respects to pictorial representations. === Meaning and signification === Images of any type may convey different meanings and sensations for individual viewers, regardless of whether the image's creator intended them. An image may be taken simply as a more or less "accurate" copy of a person, place, thing, or event. It may represent an abstract concept, such as the political power of a ruler or ruling class, a practical or moral lesson, an object for spiritual or religious veneration, or an object—human or otherwise—to be desired. It may also be regarded for its purely aesthetic qualities, rarity, or monetary value. Such reactions can depend on the viewer's context. A religious image in a church may be regarded differently than the same image mounted in a museum. Some might view it simply as an object to be bought or sold. Viewers' reactions will also be guided or shaped by their education, class, race, and other contexts. The study of emotional sensations and their relationship to any given image falls into the categories of aesthetics and the philosophy of art. While such studies inevitably deal with issues of meaning, another approach to signification was suggested by the American philosopher, logician, and semiotician Charles Sanders Peirce. "Images" are one type of the broad category of "signs" proposed by Peirce. Although his ideas are complex and have changed over time, the three categories of signs that he distinguished stand out: The "icon," which relates to an object by resemblance to some quality of the object. A painted or photographed portrait is an icon by virtue of its resemblance to the painting's or photograph's subject. A more abstract representation, such as a map or diagram, can also be an icon. The "index," which relates to an object by some real connection. For example, smoke may be an index of fire, or the temperature recorded on a thermometer may be an index of a patient's illness or health. The "symbol," which lacks direct resemblance or connection to an object but whose association is arbitrarily assigned by the creator or dictated by cultural and historical habit, convention, etc. The color red, for example, may connote rage, beauty, prosperity, political affiliation, or other meanings within a given culture or context; the Swedish film director Ingmar Bergman claimed that his use of the color in his 1972 film Cries and Whispers came from his personal visualization of the human soul. A single image may exist in all three categories at the same time. The Statue of Liberty provides an example. While there have been countless two-dimensional and three-dimensional "reproductions" of the statue (i.e., "icons" themselves), the statue itself exists as an "icon" by virtue of its resemblance to a human woman (or, more specifically, previous representations of the Roman goddess Libertas or the female model used by the artist Frederic-Auguste Bartholdi). an "index" representing New York City or the United States of America in general due to its placement in New York Harbor, or with "immigration" from its proximity to the immigration center at Ellis Island. a "symbol" as a visualization of the abstract concept of "liberty" or "freedom" or even "opportunity" or "diversity". === Critiques of imagery === The nature of images, whether three-dimensional or two-dimensional, created for a specific purpose or only for aesthetic pleasure, has continued to provoke questions and even condemnation at different times and places. In his dialogue, The Republic, the Greek philosopher Plato described our apparent reality as a copy of a higher order of universal forms.

Midjourney

Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco–based "independent research lab" Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion. It is one of the technologies of the AI boom. The tool was launched into open beta on July 12, 2022. The Midjourney team is led by David Holz, who co-founded Leap Motion. Holz told The Register in August 2022 that the company was already profitable. Users generate images with Midjourney using Discord bot commands or the official website. == History == Midjourney, Inc. was founded in San Francisco, California, by David Holz, previously a co-founder of Leap Motion. The Midjourney image generation platform entered open beta on July 12, 2022. On March 14, 2022, the Midjourney Discord server launched with a request to post high-quality photographs to Twitter and Reddit for systems training. === Model versions === The company has been working on improving its algorithms, releasing new model versions every few months. Version 2 of their algorithm was launched in April 2022, and version 3 on July 25. On November 5, 2022, the alpha iteration of version 4 was released to users. Starting from the 4th version, MJ models were trained on Google TPUs. On March 15, 2023, the alpha iteration of version 5 was released. The 5.1 model is more opinionated than version 5, applying more of its own stylization to images, while the 5.1 RAW model adds improvements while working better with more literal prompts. The version 5.2 included a new "aesthetics system", and the ability to "zoom out" by generating surroundings to an existing image. On December 21, 2023, the alpha iteration of version 6 was released. The model was trained from scratch over a nine month period. Support was added for better text rendition and a more literal interpretation of prompts. == Functionality == Midjourney is accessible through a Discord bot or by accessing their website. Users can use Midjourney through Discord either through their official Discord server, by directly messaging the bot, or by inviting the bot to a third-party server. To generate images, users use the /imagine command and type in a prompt; the bot then returns a set of four images, which users are given the option to upscale. To generate images on the website, users initially needed to have generated at least 1,000 images through the bot; this limitation has since been removed. === Vary (Region) + remix feature === Midjourney released a Vary (Region) feature on September 5, 2023, as part of MidJourney V5.2. This feature allows users to select a specific area of an image and apply variations only to that region while keeping the rest of the image unchanged. === Midjourney web interface === Midjourney introduced its web interface to make its tools more accessible, moving beyond its initial reliance on Discord. This web-based platform was launched in August 2024 alongside the release of Midjourney version 6.1. The web editor consolidates tools such as image editing, panning, zooming, region variation, and inpainting into a single interface. The introduction of the web interface also syncs conversations between Midjourney's Discord channels and web rooms, further enhancing collaboration across both platforms. This shift was in response to growing competition from other AI image generation platforms like Adobe Firefly and Google’s Imagen, which had already launched as native web apps with integration into popular design tools. === Image Weight === This feature lets users control how much influence an uploaded image has on the final output. By adjusting the "image weight" parameter, users can prioritize either the content of the prompt or the characteristics of the image. For instance, setting a higher weight will ensure that the generated result closely follows the image's structure and details, while a lower weight allows the text prompt to have more influence over the final output. === Style Reference === With Style Reference, users can upload an image to use as a stylistic guide for their creation. This tool enables MidJourney to extract the style—whether it is the color palette, texture, or overall atmosphere—from the reference image and apply it to a newly generated image. The feature allows users to fine-tune the aesthetics of their creations by integrating specific artistic styles or moods. === Character Reference === The Character Reference feature allows for a more targeted approach in defining characters. Users can upload an image of a character, and the system uses that image as a reference to generate similar characters in the output. This feature is particularly useful in maintaining consistency in appearance for characters across different images. == Uses == Midjourney's founder, David Holz, told The Register that artists use Midjourney for rapid prototyping of artistic concepts to show to clients before starting work themselves. The advertising industry quickly adopted AI tools such as Midjourney, DALL-E, and Stable Diffusion to create original content and brainstorm ideas. Architects have described using the software to generate mood boards for the early stages of projects, as an alternative to searching Google Images. === Notable usage and controversy === The program was used by the British magazine The Economist to create the front cover for an issue in June 2022. In Italy, the leading newspaper Corriere della Sera published a comic created with Midjourney by writer Vanni Santoni in August 2022. Charlie Warzel used Midjourney to generate two images of Alex Jones for Warzel's newsletter in The Atlantic. The use of an AI-generated cover was criticised by people who felt it was taking jobs from artists. Warzel called his action a mistake in an article about his decision to use generated images. Last Week Tonight with John Oliver included a 10-minute segment on Midjourney in an episode broadcast in August 2022. A Midjourney image called Théâtre D'opéra Spatial won first place in the digital art competition at the 2022 Colorado State Fair. Jason Allen, who wrote the prompt that led Midjourney to generate the image, printed the image onto a canvas and entered it into the competition using the name Jason M. Allen via Midjourney. Other digital artists were upset by the news. Allen was unapologetic, insisting that he followed the competition's rules. The two category judges were unaware that Midjourney used AI to generate images, although they later said that had they known this, they would have awarded Allen the top prize anyway. In December 2022, Midjourney was used to generate the images for an AI-generated children's book that was created over a weekend. Titled Alice and Sparkle, the book features a young girl who builds a robot that becomes self-aware. The creator, Ammaar Reeshi, used Midjourney to generate a large number of images, from which he chose 13 for the book. Both the product and process drew criticism. One artist wrote that "the main problem... is that it was trained off of artists' work. It's our creations, our distinct styles that we created, that we did not consent to being used." In 2023, the realism of AI-based text-to-image generators, such as Midjourney, DALL-E, or Stable Diffusion, reached such a high level that it led to a significant wave of viral AI-generated photos. Widespread attention was gained by a Midjourney-generated photo of Pope Francis wearing a white puffer coat, the fictional arrest of Donald Trump, and a hoax of an attack on the Pentagon, as well as the usage in professional creative arts. Research has suggested that the images Midjourney generates can be biased. For example, even neutral prompts in one study returned unequal results on the aspects of gender, skin color, and location. A study by researchers at the nonprofit group Center for Countering Digital Hate found the tool to be easy to use to generate racist and conspiratorial images. In October 2023, Rest of World reported that Midjourney tends to generate images based on national stereotypes. In 2024, a Frontiers journal published a paper which contained gibberish figures generated with Midjourney, one of which was a diagram of a rat with large testicles and a large penis towering over himself. The paper was retracted a day after the images went viral on Twitter. ==== Content moderation and censorship in Midjourney ==== Prior to May 2023, Midjourney implemented a moderation mechanism predicated on a banned word system. This method prohibited the use of language associated with explicit content, such as sexual or pornographic themes, as well as extreme violence. Moreover, the system also banned certain individual words, including those of religious and political figures, such as Allah or General Secretary of the Chinese Communist Party Xi Jinping. This practice occasionally stirred controversy due to perceiv