AI Email Management

AI Email Management — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Color quantization

    Color quantization

    In computer graphics, color quantization or color image quantization is quantization applied to color spaces; it is a process that reduces the number of distinct colors used in an image, usually with the intention that the new image should be as visually similar as possible to the original image. Computer algorithms to perform color quantization on bitmaps have been studied since the 1970s. Color quantization is critical for displaying images with many colors on devices that can only display a limited number of colors, usually due to memory limitations, and enables efficient compression of certain types of images. The name "color quantization" is primarily used in computer graphics research literature; in applications, terms such as optimized palette generation, optimal palette generation, or decreasing color depth are used. Some of these are misleading, as the palettes generated by standard algorithms are not necessarily the best possible. == Algorithms == Most standard techniques treat color quantization as a problem of clustering points in three-dimensional space, where the points represent colors found in the original image and the three axes represent the three color channels. Almost any three-dimensional clustering algorithm can be applied to color quantization, and vice versa. After the clusters are located, typically the points in each cluster are averaged to obtain the representative color that all colors in that cluster are mapped to. The three color channels are usually red, green, and blue, but another popular choice is the Lab color space, in which Euclidean distance is more consistent with perceptual difference. The most popular algorithm by far for color quantization, invented by Paul Heckbert in 1979, is the median cut algorithm. Many variations on this scheme are in use. Before this time, most color quantization was done using the population algorithm or population method, which essentially constructs a histogram of equal-sized ranges and assigns colors to the ranges containing the most points. A more modern popular method is clustering using octrees, first conceived by Gervautz and Purgathofer and improved by Xerox PARC researcher Dan Bloomberg. If the palette is fixed, as is often the case in real-time color quantization systems such as those used in operating systems, color quantization is usually done using the "straight-line distance" or "nearest color" algorithm, which simply takes each color in the original image and finds the closest palette entry, where distance is determined by the distance between the two corresponding points in three-dimensional space. In other words, if the colors are ( r 1 , g 1 , b 1 ) {\displaystyle (r_{1},g_{1},b_{1})} and ( r 2 , g 2 , b 2 ) {\displaystyle (r_{2},g_{2},b_{2})} , we want to minimize the Euclidean distance: ( r 1 − r 2 ) 2 + ( g 1 − g 2 ) 2 + ( b 1 − b 2 ) 2 . {\displaystyle {\sqrt {(r_{1}-r_{2})^{2}+(g_{1}-g_{2})^{2}+(b_{1}-b_{2})^{2}}}.} This effectively decomposes the color cube into a Voronoi diagram, where the palette entries are the points and a cell contains all colors mapping to a single palette entry. There are efficient algorithms from computational geometry for computing Voronoi diagrams and determining which region a given point falls in; in practice, indexed palettes are so small that these are usually overkill. Color quantization is frequently combined with dithering, which can eliminate unpleasant artifacts such as banding that appear when quantizing smooth gradients and give the appearance of a larger number of colors. Some modern schemes for color quantization attempt to combine palette selection with dithering in one stage, rather than perform them independently. A number of other much less frequently used methods have been invented that use entirely different approaches. The Local K-means algorithm, conceived by Oleg Verevka in 1995, is designed for use in windowing systems where a core set of "reserved colors" is fixed for use by the system and many images with different color schemes might be displayed simultaneously. It is a post-clustering scheme that makes an initial guess at the palette and then iteratively refines it. In the early days of color quantization, the k-means clustering algorithm was deemed unsuitable because of its high computational requirements and sensitivity to initialization. In 2011, M. Emre Celebi reinvestigated the performance of k-means as a color quantizer. He demonstrated that an efficient implementation of k-means outperforms a large number of color quantization methods. The high-quality but slow NeuQuant algorithm reduces images to 256 colors by training a Kohonen neural network "which self-organises through learning to match the distribution of colours in an input image. Taking the position in RGB-space of each neuron gives a high-quality colour map in which adjacent colours are similar." It is particularly advantageous for images with gradients. Finally, one of the newer methods is spatial color quantization, conceived by Puzicha, Held, Ketterer, Buhmann, and Fellner of the University of Bonn, which combines dithering with palette generation and a simplified model of human perception to produce visually impressive results even for very small numbers of colors. It does not treat palette selection strictly as a clustering problem, in that the colors of nearby pixels in the original image also affect the color of a pixel. See sample images. == History and applications == In the early days of PCs, it was common for video adapters to support only 2, 4, 16, or (eventually) 256 colors due to video memory limitations; they preferred to dedicate the video memory to having more pixels (higher resolution) rather than more colors. Color quantization helped to justify this tradeoff by making it possible to display many high color images in 16- and 256-color modes with limited visual degradation. Many operating systems automatically perform quantization and dithering when viewing high color images in a 256 color video mode, which was important when video devices limited to 256 color modes were dominant. Modern computers can now display millions of colors at once, far more than can be distinguished by the human eye, limiting this application primarily to mobile devices and legacy hardware. Nowadays, color quantization is mainly used in GIF and PNG images. GIF, for a long time the most popular lossless and animated bitmap format on the World Wide Web, only supports up to 256 colors, necessitating quantization for many images. Some early web browsers constrained images to use a specific palette known as the web colors, leading to severe degradation in quality compared to optimized palettes. PNG images support 24-bit color, but can often be made much smaller in filesize without much visual degradation by application of color quantization, since PNG files use fewer bits per pixel for palettized images. The infinite number of colors available through the lens of a camera is impossible to display on a computer screen; thus converting any photograph to a digital representation necessarily involves some quantization. Practically speaking, 24-bit color is sufficiently rich to represent almost all colors perceivable by humans with sufficiently small error as to be visually identical (if presented faithfully), within the available color space. However, the digitization of color, either in a camera detector or on a screen, necessarily limits the available color space. Consequently there are many colors that may be impossible to reproduce, regardless of how many bits are used to represent the color. For example, it is impossible in typical RGB color spaces (common on computer monitors) to reproduce the full range of green colors that the human eye is capable of perceiving. With the few colors available on early computers, different quantization algorithms produced very different-looking output images. As a result, a lot of time was spent on writing sophisticated algorithms to be more lifelike. === Quantization for image compression === Many image file formats support indexed color. A whole-image palette typically selects 256 "representative" colors for the entire image, where each pixel references any one of the colors in the palette, as in the GIF and PNG file formats. A block palette typically selects 2 or 4 colors for each block of 4x4 pixels, used in BTC, CCC, S2TC, and S3TC. === Editor support === Many bitmap graphics editors contain built-in support for color quantization, and will automatically perform it when converting an image with many colors to an image format with fewer colors. Most of these implementations allow the user to set exactly the number of desired colors. Examples of such support include: Photoshop's Mode→Indexed Color function supplies a number of quantization algorithms ranging from the fixed Windows system and Web palettes to the proprietary Local and Global algorithms for generating palettes suited to a particu

    Read more →
  • Language Weaver

    Language Weaver

    Language Weaver is the machine translation (MT) technology and brand of RWS. The brand name was revived in 2021 following the acquisition of SDL and Iconic Translation Machines Ltd. and the merging of the respective teams and technologies. Language Weaver was formerly a standalone company that was acquired by SDL in 2010. == History == Language Weaver was a Los Angeles, California–based company founded in 2002 as a spin-out company from the University of Southern California. The company was founded to commercialise a statistical approach to automatic language translation and natural language processing known as statistical machine translation (SMT). The company's name is a reference to one of the pioneers of machine translation — Warren Weaver — who first proposed the idea of using computers to ‘decode’ or ‘decrypt’ language in a memorandum back in 1947. Language Weaver’s statistical approach to machine translation was cutting-edge at the time, and a significant improvement over previous approaches such as Rule-Based MT. Language Weaver grew steadily over an 8 year period, with staff numbers totalling 96 across offices in US, Europe, and Japan. The company had significant business with Government organisations where its name continues to hold strong recognition to this day. In July 2010, Language Weaver was acquired by SDL plc for $42.5 million and the company was renamed SDL Language Weaver. == SDL Language Weaver == SDL Language Weaver was the primary machine translation technology at SDL where, over time, it evolved from SMT to syntax-based MT, to Neural Machine Translation. The Language Weaver brand was retired in 2015 in favour of SDL BeGlobal for the cloud-based solution, and SDL Enterprise Translation Server for the on-premise solution. Later, these products were rebranded again as SDL Machine Translation Cloud and SDL Machine Translation Edge respectively. == 2021 Relaunch == The Language Weaver brand was revived in 2021 following the acquisition of SDL by RWS, and the merger of the SDL MT and Iconic Translation Machines teams and technologies. The combined technologies of both companies, based on state-of-the-art Transformer-based Neural Machine Translation, are now sold as "Language Weaver" for cloud-based MT, and "Language Weaver Edge" for on-premise MT. == Supported languages == As of September 2021, Language Weaver supports the following languages and language varieties:

    Read more →
  • Is an AI Text-to-video Tool Worth It in 2026?

    Is an AI Text-to-video Tool Worth It in 2026?

    In search of the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Abeba Birhane

    Abeba Birhane

    Abeba Birhane is an Ethiopian-born cognitive scientist who works at the intersection of complex adaptive systems, machine learning, algorithmic bias, and critical race studies. Birhane's work with Vinay Prabhu uncovered that large-scale image datasets commonly used to develop AI systems, including ImageNet and 80 Million Tiny Images, carried racist and misogynistic labels and offensive images. She has been recognized by VentureBeat as a top innovator in computer vision and named as one of the 100 most influential persons in AI 2023 by TIME magazine. == Early life and education == Birhane was born in Ethiopia. She received her Bachelors of Science in Psychology and a Bachelors of Arts in Philosophy from The Open University. In 2015, she completed her Master of Science in Cognitive Science and, in 2021, her Ph.D. at the Complex Software Lab in the School of Computer Science at University College Dublin. == Career and research == Birhane studied the impacts of emerging AI technologies and how they shape individuals and local communities. She found that AI algorithms tend to disproportionately impact vulnerable groups such as older workers, trans people, immigrants, and children. Her research on relational ethics won the best paper award at NeurIPS’s Black in AI workshop in 2019. She has also studied and written about algorithmic colonization driven by corporate agendas. Her work in decolonizing computational sciences addressed the inherited oppressions in current systems especially towards women of color. In 2020, Birhane and Vinay Prabhu, principal machine learning scientist at UnifyID, published a paper examining the problematic data collection, labelling, classification, and consequences of large image datasets. These datasets, including ImageNet and MIT's 80 Million Tiny Images, have been used to develop thousands of AI algorithms and systems. Birhane and Prabhu found that they contained many racist and misogynistic labels and slurs as well as offensive images. This resulted in MIT voluntarily and formally taking down the 80 Million Tiny Images dataset. More recently, Birhane has worked with Rediet Abebe, George Obaido, and Sekou Remy on researching the barriers to data sharing in Africa. They found that power imbalances are significant in the data sharing process, even when the data comes from Africa. Their research was published at the ACM Conference on Fairness, Accountability, and Transparency. In 2024, Birhane established the AI Accountability Lab research group at Trinity College Dublin. == Selected awards == 2019 NeurIPS Black in AI Workshop Best Paper Award 2020 Venture Beat AI Innovations Award in the category Computer Vision Innovation (received with Vinay Prabhu) 2021 100 Brilliant Women in AI Ethics Hall of Fame Honoree 2022 Lero Director’s Prize for PhD/PostDoctoral Contribution. 2023 100 Most Influential People in AI by TIME magazine

    Read more →
  • Alerts.in.ua

    Alerts.in.ua

    alerts.in.ua is an online service that visualizes information about air alerts and other threats on the map of Ukraine. == History == The idea of the site appeared in the first weeks of the 2022 Russian invasion of Ukraine, during the development of other projects related to alerting the population about alarms. So, on March 2, 2022, the "Lviv Siren" bot was created, which reported on air alarms in Lviv on Twitter. Later, the idea arose to monitor alarms all over Ukraine and display them on a map. However, the lack of a single official source reporting alarms made this task much more difficult. On March 15, 2022, the Ajax Systems company announced the creation of the official Telegram channel "Air Alarm". This channel receives signals from the "Air Alarm" application and instantly publishes messages about the start and end of alarms in different regions of Ukraine. This immediately solved the problem with the source of information and gave impetus to the further implementation of the project. On March 22, 2022, the first version of the "Air Alarm Map" website was published, located on the war.ukrzen.in.ua domain. The map quickly gained popularity in social networks. It, like several other similar projects, began to be widely distributed by the mass media: Suspilne, Novyi Kanal, UNIAN, DW, Fakty ICTV, Vikna TV, Ukrainian Radio, STB, Espresso, dev.ua, itc.ua and state bodies: Center for Countering Disinformation at the National Security and Defense Council of Ukraine, Verkhovna Rada of Ukraine, Khmelnytska OVA, etc. On April 8, 2022, the site moved to the alerts.in.ua domain, where it is still available today. On August 25, 2022, the service began monitoring local official channels in addition to the main "Air Alarm". On September 11, 2022, the English version of the site was published. On March 22, 2023, its own Android application was published. The project is actively developing and has its own community. == Description == The main part of the site is a map of Ukraine, on which the regions where an air alert or other threats have been declared are highlighted in real time. As of October 16, 2022, 5 types of threats are supported: Air alarm. The threat of artillery fire. The threat of street fighting. Chemical threat. Nuclear threat. Additionally, based on media reports, information is published about other dangerous events, such as explosions, demining, etc. On the site, you can view the history of announced alarms with links to sources. Alarm statistics for different time periods are also available. For developers, there is an API that allows you to develop your own services based on information about declared alarms. The site is available in Ukrainian, English, Polish and Japanese. == Use == The map is used by: To monitor the situation in the country and the region. To illustrate the alarms announced in the mass media: TSN, Ukrainian truth, Channel 24, Suspilne, RBC Ukraine, Gromadske, Glavkom. As a map of alarms in mobile applications, there is Alarm and AirAlert. As an API for its services, including alternative alarm maps, Telegram, Viber channels, Discord bots, IoT projects, etc. == Statistics == 89.5% of users use the map from a mobile phone, 10% from a PC and 1% from a tablet. Top 6 countries by visit: Ukraine, United States, Poland, Germany, Great Britain and Japan . == Alternative projects == eMap was created by the developer Vadym Klymenko. AlarmMap is an online from the Ukrainian office of Agroprep. The official map of air alarms was developed by Ajax Systems together with the developer Artem Lemeshev, Stfalcon with the support of the Ministry of Statistics.

    Read more →
  • General Regionally Annotated Corpus of Ukrainian

    General Regionally Annotated Corpus of Ukrainian

    General Regionally Annotated Corpus of the Ukrainian Language (GRAC, Ukrainian: Генеральний регіонально анотований корпус української мови, romanized: Heneralnyi rehionalno anotovanyi korpus ukrainskoi movy, ГРАК, Ukrainian грак for rook) is a text corpus of the Ukrainian language comprising more than 2 billion tokens, intended for linguistic research in grammar, vocabulary, and the history of the Ukrainian literary language, as well as for use in compiling dictionaries and grammars. The corpus can be used for language study and also for preparing teaching materials, textbooks, learner’s dictionaries, and exercises using examples from real texts, taking into account frequency and collocational patterns, and so on. The corpus is not a model of standard Ukrainian: it may contain words and combinations that do not match current norms of the literary language. The corpus covers the period from 1816 to 2025, and as of 29 November 2025 it contains more than 812,000 texts by about 35,000 authors. == Composition of the corpus == In the 10th version of the corpus, available for searching from 20 October 2020, 35% consists of fiction. Some fiction genres are highlighted separately: children’s literature, folklore, dramatic works, and scripts. Among non-fiction texts: journalistic writing, including newspaper collections from 1888–1893, 1905, 1913–1918, 1919–1943, modern newspapers from different regions, and texts from online news/information sites; memoirs, letters, and diaries, including a sizeable corpus of Facebook texts representing blogs by people from all regions of Ukraine and the diaspora; scholarly and educational texts: monographs, dissertations, academic articles, textbooks; large subcorpora of academic literature in history, ethnography, philosophy, and law are singled out separately; religious texts, including two Ukrainian translations of the Bible; speeches and interviews. Some dictionaries that include phrasal examples and phraseology have also been incorporated, including the Ukrainian dictionary by Borys Hrinchenko and the Russian-Ukrainian idiomatic dictionary by I. Vyrhan and M. Pylynska. Using the corpus tools, these dictionaries can be searched not only for words, but also for lexico-grammatical patterns within examples and phraseological expressions. About 20% of the texts in the corpus are translations. The corpus includes translations from more than 80 languages, most of all from English and Russian. == Dating == Texts in the corpus are dated by the year of writing, or by the latest year in which a work could have been written; translated texts are dated by the year the translation was produced. A publication year may also be indicated, corresponding to the edition from which the text is taken. == Regional annotation == The corpus’s regional annotation is based on the modern administrative division of Ukraine. The corpus includes texts from all oblasts of Ukraine and from Crimea. A single text may belong to several regional subcorpora (if the author or translator was born, studied, or lived for a long time in different regions). In addition to regional subcorpora, there are subcorpora of works by authors of the Ukrainian diaspora (USA, Canada, Poland, Germany, the United Kingdom, France, etc.). These are mostly texts by emigrants of the 1940s, and to a lesser extent of 1917–1920s. == Morphological annotation == GRAC is based on the morphological analysis system nlp_uk, developed by specialists from the r2u group. The program analyzes the text and, for each word form, determines the lemma (lexeme) and tags (grammatical features). == Research based on the corpus == Research on the Ukrainian language has been carried out using the corpus, including studies of the historical dynamics of language norms, and letter and letter-combination frequencies for font development.

    Read more →
  • AI Humanizers Reviews: What Actually Works in 2026

    AI Humanizers Reviews: What Actually Works in 2026

    Curious about the best AI humanizer? An AI humanizer is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI humanizer slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Anna Korhonen

    Anna Korhonen

    Anna-Leena Korhonen is a Finnish computer scientist who works in England as professor of natural language processing at the University of Cambridge, where she is co-director of the Language Technology Lab and the Institute for Technology and Humanity, fellow of the Alan Turing Institute, director of the Centre for Human Inspired Artificial Intelligence, fellow of the European Laboratory for Learning and Intelligent Systems, and a senior research fellow of Churchill College, Cambridge. Her research interests include natural language processing, the applications of natural language processing in health, and the social consequences of AI-based language tools. == Education and career == Korhonen studied linguistics as an undergraduate at the University of Helsinki. After a master's degree in linguistics at the University of Reading, she completed a Ph.D. in computer science at the University of Cambridge. Her 2002 doctoral dissertation, Subcategorization acquisition, was supervised by Ted Briscoe. After postdoctoral research at the University of Pennsylvania and at the National Institute of Informatics in Japan, she returned to Cambridge in 2005 as a senior research associate and Royal Society University Research Fellow. She became a reader in computational linguistics in 2014, professor of natural language processing in 2017, director of the Centre for Human Inspired Artificial Intelligence in 2022, and co-director of the Institute for Technology and Humanity in 2024. == Recognition == Korhonen was named as a Fellow of the Association for Computational Linguistics in 2023, "for significant contributions to lexical acquisition, multilingual and low resource NLP, socially beneficial language applications, and services to the ACL community". She was elected to the Academia Europaea in 2025.

    Read more →
  • Jaggies

    Jaggies

    Jaggies are visual artifacts in raster images, most frequently from aliasing, which in turn is often caused by non-linear mixing effects producing high-frequency components, or missing or poor anti-aliasing filtering prior to sampling. Jaggies are stair-like lines that appear where there should be "smooth" straight lines or curves. For example, when a nominally straight, un-aliased line steps across one pixel either horizontally or vertically, a "dogleg" occurs halfway through the line, where it crosses the threshold from one pixel to the other. Jaggies should not be confused with most compression artifacts, which are a different phenomenon. == Causes == Jaggies occur due to the "staircase effect". This is because a line represented in raster mode is approximated by a sequence of pixels. Jaggies can occur for a variety of reasons, the most common being that the output device (display monitor or printer) does not have sufficient resolution to portray a smooth line. In addition, jaggies often occur when a bit-mapped image is scaled to a higher resolution. This is one of the advantages that vector graphics have over bitmapped graphics – a vector image can be losslessly scaled to any arbitrary resolution or stretched infinitely in either axis without introducing jaggies. == Solutions == The effect of jaggies can be reduced by a graphics technique known as spatial anti-aliasing. Anti-aliasing smooths out jagged lines by surrounding them with transparent pixels to simulate the appearance of fractionally-filled pixels when viewed at a distance. The downside of anti-aliasing is that it reduces contrast – rather than sharp black/white transitions, there are shades of gray – and the resulting image can appear fuzzy. This is an inescapable trade-off: if the resolution is insufficient to display the desired detail, the output will either be jagged, fuzzy, or some combination thereof. While machine learning-based upscaling techniques such as DLSS can be used to infer this missing information, other types of artifacts may be introduced in the process. In real-time 3D rendering such as in video games, various anti-aliasing techniques are used to remove jaggies created by the edges of polygons and other contrasting lines. Since anti-aliasing can impose a significant performance overhead, games for home computers often allow users to choose the level and type of anti-aliasing in use in order to optimize their experience, whereas on consoles this setting is typically fixed for each title to ensure a consistent experience. While anti-aliasing is generally implemented through graphics APIs like DirectX and Vulkan, some consoles such as the Xbox 360 and PlayStation 3 are also capable of anti-aliasing to little direct performance cost by way of dedicated hardware which performs anti-aliasing on the contents of the framebuffer once it has been rendered by the GPU. Jaggies in bitmaps, such as sprites and surface materials, are most often dealt with by separate texture filtering routines, which are far easier to perform than anti-aliasing filtering. Texture filtering became ubiquitous on PCs after the introduction of 3Dfx's Voodoo GPU. == Notable uses of the term == In the 1985 game Rescue on Fractalus! for the Atari 8-bit computers, the graphics depicting the cockpit of the player's spacecraft contains two window struts, which are not anti-aliased and are therefore very "jagged". The developers made fun of this and named the in-game enemies "Jaggi", and also initially titled the game Behind Jaggi Lines!. The latter idea was scrapped by the marketing department before release.

    Read more →
  • Vera Demberg

    Vera Demberg

    Vera Demberg (born 1981) is a German computational linguist and professor of computer science and computational linguistics at Saarland University. Her research interests include cognitive models of human language comprehension, natural language generation, experimental psycholinguistics, multimodal language processing in a dual-task setting, and experimental and computational discourse research and pragmatics. == Career and research == Vera Demberg studied computational linguistics at the Institute for Machine Language Processing at the University of Stuttgart from 2001 to 2006. She then completed a Master's degree in Artificial Intelligence at the University of Edinburgh from 2004 to 2005. She received her Ph.D. from the Department of Computer Science there from 2006 to 2010. Her dissertation paper, titled “Broad-Coverage Model of Prediction in Human Sentence Processing”, was awarded the Cognitive Science Society's “Glushko Dissertation Prize in Cognitive Science” in 2011. In her work, she designed a model of human sentence processing that can be used to predict difficulties in processing at the syntactic level. From 2010 to 2016, Vera Demberg led an independent research group on cognitive models of human language processing and their application to speech dialog systems in the Cluster of Excellence “Multimodal Computing and Interaction” at the University of Saarland. In 2016, she was appointed there to a professorship in computer science and computational linguistics. Demberg's professorship is in the Department of Computer Science (Faculty of Mathematics and Computer Science). She is also a co-opted professor in the Department of Linguistics and Language Technology (Faculty of Philosophy). Since 2020, she has led the ERC Starting Grant “Individualized Interaction in Discourse”. The project conducts research on how to make linguistic interaction with computer systems more natural. She has authored and co-authored numerous papers on the study of computational linguistics and natural language processing. According to Google Scholar, Vera Demberg has an H-index of 30. == Publications == Vera Demberg has authored more than 200 papers; please refer to her scholar page at https://scholar.google.com/citations?user=l2CFSAMAAAAJ == Awards == 2011: Cognitive Science Society Glushko Dissertation Prize in Cognitive Science 2020: ERC Starting Grant “Individualized Interaction in Discourse” 2024: Member of the Academy of Sciences and Literature

    Read more →
  • Dissociated press

    Dissociated press

    Dissociated press is a parody generator (a computer program that generates nonsensical text). The generated text is based on another text using the Markov chain technique. The name is a play on "Associated Press" and the psychological term dissociation (although word salad is more typical of conditions like aphasia and schizophrenia – which is, however, frequently confused with dissociative identity disorder by laypeople). An implementation of the algorithm is available in Emacs. Another implementation is available as a Perl module in CPAN, Games::Dissociate. == The algorithm == The algorithm starts by printing a number of consecutive words (or letters) from the source text. Then it searches the source text for an occurrence of the few last words or letters printed out so far. If multiple occurrences are found, it picks a random one, and proceeds with printing the text following the chosen occurrence. After a predetermined length of text is printed out, the search procedure is repeated for the newly printed ending. Considering that words and phrases tend to appear in specific grammatical contexts, the resulting text usually seems correct grammatically, and if the source text is uniform in style, the result appears to be of similar style and subject, and takes some effort on the reader's side to recognize as not genuine. Still, the randomness of the assembly process deprives it of any logical flow - the loosely related parts are connected in a nonsensical way, creating a humorously abstract, random result. == Examples == Here is a short example of word-based Dissociated Press applied to the Jargon File: wart: n. A small, crocky feature that sticks out of an array (C has no checks for this). This is relatively benign and easy to spot if the phrase is bent so as to be not worth paying attention to the medium in question. Here is a short example of letter-based Dissociated Press applied to the same source: window sysIWYG: n. A bit was named aften /bee´t@/ prefer to use the other guy's re, especially in every cast a chuckle on neithout getting into useful informash speech makes removing a featuring a move or usage actual abstractionsidered interj. Indeed spectace logic or problem! == History == The dissociated press algorithm is described in HAKMEM (1972) Item #176. The name "dissociated press" is first known to have been associated with the Emacs implementation. Brian Hayes discussed a Travesty algorithm in Scientific American in November 1983. The article provided a garbled William Faulkner passage: When he got on the table, he come in. He never come out of my own pocket as a measure of protecting the company against riot and bloodshed. And when he said. "You tell me a bus ticket, let alone write out no case histories. Then the law come back with a knife!" Hugh Kenner and Joseph O'Rourke of Johns Hopkins University discussed their frequency table-based Travesty generator for microcomputers in BYTE in November 1984. The article included the Turbo Pascal source for two versions of the generator, one using Hayes' algorithm and another using Claude Shannon's Hellbat algorithm. Murray Lesser offered a compiled BASIC version in the magazine in July 1985, in September 1985 Peter Wayner offered a version that used tree data structures instead of frequency tables, and in December 1985 Neil J. Rubenking offered a version written in Turbo Pascal that stored frequency information in a B-tree.

    Read more →
  • F-score

    F-score

    In statistical analysis of binary classification and information retrieval systems, the F-score or F-measure is a measure of predictive performance. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all samples predicted to be positive, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value, and recall is also known as sensitivity in diagnostic binary classification. The F1 score is the harmonic mean of the precision and recall. It thus symmetrically represents both precision and recall in one metric. The more generic F β {\displaystyle F_{\beta }} score applies additional weights, valuing one of precision or recall more than the other. The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, if the precision or the recall is zero. == Etymology == The name F-measure is believed to be named after a different F function in Van Rijsbergen's book, when introduced to the Fourth Message Understanding Conference (MUC-4, 1992). == Definition == The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall: F 1 = 2 r e c a l l − 1 + p r e c i s i o n − 1 = 2 p r e c i s i o n ⋅ r e c a l l p r e c i s i o n + r e c a l l = 2 T P 2 T P + F P + F N {\displaystyle F_{1}={\frac {2}{\mathrm {recall} ^{-1}+\mathrm {precision} ^{-1}}}=2{\frac {\mathrm {precision} \cdot \mathrm {recall} }{\mathrm {precision} +\mathrm {recall} }}={\frac {2\mathrm {TP} }{2\mathrm {TP} +\mathrm {FP} +\mathrm {FN} }}} With precision = TP / (TP + FP) and recall = TP / (TP + FN), it follows that the numerator of F1 is the sum of their numerators and the denominator of F1 is the sum of their denominators. If FP=FN F 1 = 2 T P 2 T P + 2 F P = T P T P + F P {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FP} }}={\frac {\mathrm {TP} }{\mathrm {TP} +\mathrm {FP} }}} or F 1 = 2 T P 2 T P + 2 F N = T P T P + F N {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FN} }}={\frac {\mathrm {TP} }{\mathrm {TP} +\mathrm {FN} }}} So, F1 = precision = recall If TP=FP=FN F 1 = 2 T P 2 T P + 2 F P = 2 T P 4 T P = 1 2 = 0.5 {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FP} }}={\frac {2\mathrm {TP} }{4\mathrm {TP} }}={\frac {1}{2}}=0.5} or F 1 = 2 T P 2 T P + 2 F N = 2 T P 4 T P = 1 2 = 0.5 {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FN} }}={\frac {2\mathrm {TP} }{4\mathrm {TP} }}={\frac {1}{2}}=0.5} To see it as a harmonic mean, note that F 1 − 1 = 1 2 ( r e c a l l − 1 + p r e c i s i o n − 1 ) {\displaystyle F_{1}^{-1}={\frac {1}{2}}(\mathrm {recall} ^{-1}+\mathrm {precision} ^{-1})} . === Fβ score === A more general F score, F β {\displaystyle F_{\beta }} , that uses a positive real factor β {\displaystyle \beta } , where β {\displaystyle \beta } is chosen such that recall is considered β {\displaystyle \beta } times as important as precision, is: F β = β 2 + 1 ( β 2 ⋅ r e c a l l − 1 ) + p r e c i s i o n − 1 = ( 1 + β 2 ) ⋅ p r e c i s i o n ⋅ r e c a l l ( β 2 ⋅ p r e c i s i o n ) + r e c a l l {\displaystyle F_{\beta }={\frac {\beta ^{2}+1}{(\beta ^{2}\cdot \mathrm {recall} ^{-1})+\mathrm {precision} ^{-1}}}={\frac {(1+\beta ^{2})\cdot \mathrm {precision} \cdot \mathrm {recall} }{(\beta ^{2}\cdot \mathrm {precision} )+\mathrm {recall} }}} To see that as a weighted harmonic mean, note that F β − 1 = 1 β + β − 1 ( β ⋅ r e c a l l − 1 + β − 1 ⋅ p r e c i s i o n − 1 ) {\displaystyle F_{\beta }^{-1}={\frac {1}{\beta +\beta ^{-1}}}(\beta \cdot \mathrm {recall} ^{-1}+\beta ^{-1}\cdot \mathrm {precision} ^{-1})} . In terms of Type I and type II errors this becomes: F β = ( 1 + β 2 ) ⋅ T P ( 1 + β 2 ) ⋅ T P + β 2 ⋅ F N + F P = ( 1 + β 2 ) ⋅ T P ( T P + F N ) ⋅ β 2 + ( T P + F P ) {\displaystyle F_{\beta }={\frac {(1+\beta ^{2})\cdot \mathrm {TP} }{(1+\beta ^{2})\cdot \mathrm {TP} +\beta ^{2}\cdot \mathrm {FN} +\mathrm {FP} }}\,={\frac {(1+\beta ^{2})\cdot \mathrm {TP} }{(\mathrm {TP} +\mathrm {FN} )\cdot \beta ^{2}+(\mathrm {TP} +\mathrm {FP} )}}\,} Two commonly used values for β {\displaystyle \beta } are 2, which weighs recall higher than precision, and 1/2, which weighs recall lower than precision. The F-measure was derived so that F β {\displaystyle F_{\beta }} "measures the effectiveness of retrieval with respect to a user who attaches β {\displaystyle \beta } times as much importance to recall as precision". It is based on Van Rijsbergen's effectiveness measure E = 1 − ( α p + 1 − α r ) − 1 {\displaystyle E=1-\left({\frac {\alpha }{p}}+{\frac {1-\alpha }{r}}\right)^{-1}} Their relationship is: F β = 1 − E {\displaystyle F_{\beta }=1-E} where α = 1 1 + β 2 {\displaystyle \alpha ={\frac {1}{1+\beta ^{2}}}} == Diagnostic testing == This is related to the field of binary classification where recall is often termed "sensitivity". == Dependence of the F-score on class imbalance == Precision-recall curve, and thus the F β {\displaystyle F_{\beta }} score, explicitly depends on the ratio r {\displaystyle r} of positive to negative test cases. This means that comparison of the F-score across different problems with differing class ratios is problematic. One way to address this issue (see e.g., Siblini et al., 2020) is to use a standard class ratio r 0 {\displaystyle r_{0}} when making such comparisons. == Applications == The F-score is often used in the field of information retrieval for measuring search, document classification, and query classification performance. It is particularly relevant in applications which are primarily concerned with the positive class and where the positive class is rare relative to the negative class. Earlier works focused primarily on the F1 score, but with the proliferation of large scale search engines, performance goals changed to place more emphasis on either precision or recall and so F β {\displaystyle F_{\beta }} is seen in wide application. The F-score is also used in machine learning. However, the F-measures do not take true negatives into account, hence measures such as the Matthews correlation coefficient, Informedness or Cohen's kappa may be preferred to assess the performance of a binary classifier. The F-score has been widely used in the natural language processing literature, such as in the evaluation of named entity recognition and word segmentation. == Properties == The F1 score is the Dice coefficient of the set of retrieved items and the set of relevant items. The F1-score of a classifier which always predicts the positive class converges to 1 as the probability of the positive class increases. The F1-score of a classifier which always predicts the positive class is equal to 2 proportion_of_positive_class / ( 1 + proportion_of_positive_class ), since the recall is 1, and the precision is equal to the proportion of the positive class. If the scoring model is uninformative (cannot distinguish between the positive and negative class) then the optimal threshold is 0 so that the positive class is always predicted. F1 score is concave in the true positive rate. == Criticism == David Hand and others criticize the widespread use of the F1 score since it gives equal importance to precision and recall. In practice, different types of mis-classifications incur different costs. In other words, the relative importance of precision and recall is an aspect of the problem. According to Davide Chicco and Giuseppe Jurman, the F1 score is less truthful and informative than the Matthews correlation coefficient (MCC) in binary evaluation classification. David M W Powers has pointed out that F1 ignores the True Negatives and thus is misleading for unbalanced classes, while kappa and correlation measures are symmetric and assess both directions of predictability - the classifier predicting the true class and the true class predicting the classifier prediction, proposing separate multiclass measures Informedness and Markedness for the two directions, noting that their geometric mean is correlation. Another source of critique of F1 is its lack of symmetry. It means it may change its value when dataset labeling is changed - the "positive" samples are named "negative" and vice versa. This criticism is met by the P4 metric definition, which is sometimes indicated as a symmetrical extension of F1. Finally, Ferrer and Dyrland et al. argue that the expected cost (or its counterpart, the expected utility) is the only principled metric for evaluation of classification decisions, having various advantages over the F-score and the MCC. Both works show that the F-score can result in wrong conclusions about the absolute and relative quality of systems. == Difference from Fowlkes–Mallows index == While the F-measur

    Read more →
  • Trigger list

    Trigger list

    Trigger list in its most general meaning refers to a list whose items are used to initiate ("trigger") certain actions. == United States: Private financial information == In the United States, when a person applies for a mortgage loan, the lender makes a credit inquiry about the potential borrower from the national credit bureaus, Equifax, Experian and TransUnion. Unless the borrower is opted out, the credit bureaus put the applicants onto a "trigger list" of "leads" about persons who are interested in new loans. These lists are sold to numerous lenders all over the United States, and soon after the application the applicant starts receiving offers from all parts of the country. The trigger lists contain a significant amount of personal financial information. Among the buyers of trigger lists are "lead generators" which resell filtered information to borrowers, e.g., of people who live in a certain area and have a certain credit score. While the Federal Trade Commission considers the market of "trigger lists" to be a legal business, many people and organizations (such as the National Association of Mortgage Brokers) consider this a serious breach of privacy and lobby for putting this practice under regulatory controls. As of now, American consumers may opt-out from "trigger lists" by calling 1-888-5-OPTOUT (1-888-567-8688). == Nuclear non-proliferation == The Zangger Committee and the Nuclear Suppliers Group maintain lists of items that may contribute to nuclear proliferation; The nuclear non-proliferation treaty forbids its members to export such items to non-treaty members. these items are said to trigger the countries' responsibilities under the NPT, hence the name.

    Read more →
  • Sequential minimal optimization

    Sequential minimal optimization

    Sequential minimal optimization (SMO) is an algorithm for solving the quadratic programming (QP) problem that arises during the training of support-vector machines (SVM). It was invented by John Platt in 1998 at Microsoft Research. SMO is widely used for training support vector machines and is implemented by the popular LIBSVM tool. The publication of the SMO algorithm in 1998 has generated a lot of excitement in the SVM community, as previously available methods for SVM training were much more complex and required expensive third-party QP solvers. == Optimization problem == Consider a binary classification problem with a dataset (x1, y1), ..., (xn, yn), where xi is an input vector and yi ∈ {-1, +1} is a binary label corresponding to it. A soft-margin support vector machine is trained by solving a quadratic programming problem, which is expressed in the dual form as follows: max α ∑ i = 1 n α i − 1 2 ∑ i = 1 n ∑ j = 1 n y i y j K ( x i , x j ) α i α j , {\displaystyle \max _{\alpha }\sum _{i=1}^{n}\alpha _{i}-{\frac {1}{2}}\sum _{i=1}^{n}\sum _{j=1}^{n}y_{i}y_{j}K(x_{i},x_{j})\alpha _{i}\alpha _{j},} subject to: 0 ≤ α i ≤ C , for i = 1 , 2 , … , n , {\displaystyle 0\leq \alpha _{i}\leq C,\quad {\mbox{ for }}i=1,2,\ldots ,n,} ∑ i = 1 n y i α i = 0 {\displaystyle \sum _{i=1}^{n}y_{i}\alpha _{i}=0} where C is an SVM hyperparameter and K(xi, xj) is the kernel function, both supplied by the user; and the variables α i {\displaystyle \alpha _{i}} are Lagrange multipliers. == Algorithm == SMO is an iterative algorithm for solving the optimization problem described above. SMO breaks this problem into a series of smallest possible sub-problems, which are then solved analytically. Because of the linear equality constraint involving the Lagrange multipliers α i {\displaystyle \alpha _{i}} , the smallest possible problem involves two such multipliers. Then, for any two multipliers α 1 {\displaystyle \alpha _{1}} and α 2 {\displaystyle \alpha _{2}} , the constraints are reduced to: 0 ≤ α 1 , α 2 ≤ C , {\displaystyle 0\leq \alpha _{1},\alpha _{2}\leq C,} y 1 α 1 + y 2 α 2 = k , {\displaystyle y_{1}\alpha _{1}+y_{2}\alpha _{2}=k,} and this reduced problem can be solved analytically: one needs to find a minimum of a one-dimensional quadratic function. k {\displaystyle k} is the negative of the sum over the rest of terms in the equality constraint, which is fixed in each iteration. The algorithm proceeds as follows: Find a Lagrange multiplier α 1 {\displaystyle \alpha _{1}} that violates the Karush–Kuhn–Tucker (KKT) conditions for the optimization problem. Pick a second multiplier α 2 {\displaystyle \alpha _{2}} and optimize the pair ( α 1 , α 2 ) {\displaystyle (\alpha _{1},\alpha _{2})} . Repeat steps 1 and 2 until convergence. When all the Lagrange multipliers satisfy the KKT conditions (within a user-defined tolerance), the problem has been solved. Although this algorithm is guaranteed to converge, heuristics are used to choose the pair of multipliers so as to accelerate the rate of convergence. This is critical for large data sets since there are n ( n − 1 ) / 2 {\displaystyle n(n-1)/2} possible choices for α i {\displaystyle \alpha _{i}} and α j {\displaystyle \alpha _{j}} . == Related work == The first approach to splitting large SVM learning problems into a series of smaller optimization tasks was proposed by Bernhard Boser, Isabelle Guyon, and Vladimir Vapnik. It is known as the "chunking algorithm". The algorithm starts with a random subset of the data, solves this problem, and iteratively adds examples which violate the optimality conditions. One disadvantage of this algorithm is that it is necessary to solve QP-problems scaling with the number of SVs. On real world sparse data sets, SMO can be more than 1000 times faster than the chunking algorithm. In 1997, E. Osuna, R. Freund, and F. Girosi proved a theorem which suggests a whole new set of QP algorithms for SVMs. By the virtue of this theorem a large QP problem can be broken down into a series of smaller QP sub-problems. A sequence of QP sub-problems that always add at least one violator of the Karush–Kuhn–Tucker (KKT) conditions is guaranteed to converge. The chunking algorithm obeys the conditions of the theorem, and hence will converge. The SMO algorithm can be considered a special case of the Osuna algorithm, where the size of the optimization is two and both Lagrange multipliers are replaced at every step with new multipliers that are chosen via good heuristics. The SMO algorithm is closely related to a family of optimization algorithms called Bregman methods or row-action methods. These methods solve convex programming problems with linear constraints. They are iterative methods where each step projects the current primal point onto each constraint.

    Read more →
  • How to Choose an AI Text-to-video Tool

    How to Choose an AI Text-to-video Tool

    Comparing the best AI text-to-video tool? An AI text-to-video tool is software that uses machine learning to help you get more done — it lowers the barrier so anyone can produce professional output. Privacy matters too: check whether your data trains the model and whether a no-log or enterprise tier is available. Whether you are a beginner or a pro, the right AI text-to-video tool slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →