AI Code Ui

AI Code Ui — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Fantavision

    Fantavision

    Fantavision is an animation program by Scott Anderson for the Apple II and published by Broderbund in 1985. Versions were released for the Apple IIGS (1987), Amiga (1988), and MS-DOS (1988). Fantavision allows the creation of vector graphics animations using the mouse and keyboard. The user creates frames, and the software generates the frames between them. Because this is done in real-time, it allows for creative exploration and quick changes. The program uses a graphical user interface in the style of the Macintosh with pull-down menus and black text on a white background. Advertisements claimed Fantavision a revolutionary breakthrough that brings the animation features of "tweening" and "transforming" to home computers. == Reception == Compute! in 1989 called Fantavision the best animation program for the IBM PC, although it noted the inability to draw curves. == Reviews == Games #70

    Read more →
  • PROMT

    PROMT

    ProMT is a lead Russian developer of language translation software for businesses and private users since 1991. The company provides on-premises software based on neural technologies. == History == On March 6, 1998, ProMT launched a free online translation services, which is now known as PROMT.One. In 1997, ProMT and the French company Softissimo developed a line of products for the European company Reverso. == Technology == Historically, ProMT systems used rule-based machine translation (RBMT) technology. In 2011 a hybrid approach which combined rule-based and statistical MT was implemented. In 2019, ProMT introduced its new neural technology and flagship solution - PROMT Neural Translation Server. Since then all MT systems developed by ProMT are based on neural machine translation. The software can run on Microsoft Windows, Linux, MacOS, iOS and Android and works in offline mode providing secure machine translation. As of 2025, it translates 62 languages from and to English, German, and Russian.

    Read more →
  • Mehryar Mohri

    Mehryar Mohri

    Mehryar Mohri is a professor and theoretical computer scientist at the Courant Institute of Mathematical Sciences. He is also heading the Machine Learning Theory (ML Theory) team at Google Research. == Career == Prior to joining the Courant Institute, Mohri was a research department head and later technology leader at AT&T Bell Labs, where he was a member of the technical staff for about ten years. Mohri has also taught as an assistant professor at the University of Paris 7 (1992-1993) and Ecole Polytechnique (1992-1994). == Research == Mohri's main area of research is machine learning, in particular learning theory. He is also an expert in automata theory and algorithms. He is the author of several core algorithms that have served as the foundation for the design of many deployed speech recognition and natural language processing systems. == Publications == Mohri is the author of the reference book Foundations of Machine Learning used as a textbook in many graduate-level machine learning courses. Mohri is also a member of the Lothaire group of mathematicians with the pseudonym M. Lothaire and contributed to the book on Applied Combinatorics on Words. He is the author of more than 250 conference and journal publications. == Organizational affiliations == Mohri is currently the President of the Association for Algorithmic Learning Theory (AALT) and the Steering Committee Chair for the ALT conference. He is also Editorial Board member of Machine Learning and TheoretiCS, Action Editor of the Journal of Machine Learning Research (JMLR) and a member of the advisory board for the Journal of Automata, Languages and Combinatorics.

    Read more →
  • Lior Ron (business executive)

    Lior Ron (business executive)

    Lior Ron (born March 16, 1977) is an Israeli businessman. He is the founder, chairman and former CEO of logistics technology company Uber Freight, co-founder of self-driving truck company Otto, and COO of self-driving technology company Waabi. == Early life and education == Ron grew up in Israel near Haifa. He attended the Technion – Israel Institute of Technology in Haifa, where he earned a bachelor's degree in computer science in 1997. He then joined Israeli Army Intelligence, where he served until 2004. After the Army, he earned a master's degree in computer science at Technion, incorporating artificial intelligence as he developed a biomedical device to assist patients suffering with Parkinson's disease. He then moved to California and earned an MBA from The Stanford Graduate School of Business. His undergraduate work and master's thesis were centered around AI when it was still in its early stages. == Career == === Google === In 2007, Ron joined Google as the Product Lead for Google Maps. He then worked at Motorola Mobility after it was acquired by Google, and in Google's robotics research effort. === Otto === In 2016, Ron left Google to found Otto, a company that makes self-driving kits to retrofit big rig trucks. Quoted in Wired, Ron said he left Google because he “felt an obligation to bring this technology to society sooner rather than later.” Otto launched in May 2016, and was acquired by Uber in late July of the same year. The Uber partnership allowed Ron and Otto the opportunity to develop a freight marketplace for truck drivers. === Uber Freight === On May 18, 2017, Ron and Uber launched Uber Freight, a unit of Uber initially designed as an app connecting long-haul truck drivers with companies in need of cargo shipping, with Ron as CEO. In August 2018, Uber Freight launched a new digital platform focused on shippers, to help them find the right driver for their needs. In 2021, Uber Freight acquired Transplace for $2.25 billion, expanding its services to include managed transportation, logistics software, and consulting. With Ron as CEO, Uber Freight has evolved into a full-scale logistics technology company for shippers and drivers, as Ron introduced more advanced generative AI capabilities to Uber Freight's software and Insights AI logistics platform. In September 2024, the company announced it manages nearly $20 billion in freight, and serves one in three Fortune 500 companies. In May 2025, the company launched the transportation industry's first large-scale AI-powered logistics network, with its large language model embedded directly into its transportation management system. === Waabi === On August 12, 2025, it was reported that Ron had been named chief operating officer of Waabi, a company developing autonomous driving technology using artificial intelligence. He remains as chairman of Uber Freight, with Rebecca Tinucci taking over as CEO. == Controversy == Ron co-founded Otto with Anthony Levandowski, who faces a lawsuit brought in 2017 from Google's parent company Alphabet that alleges Levandowski stole trade secrets while working for Alphabet's self-driving car division before he and Ron co-founded Otto.

    Read more →
  • ZygoteBody

    ZygoteBody

    ZygoteBody, formerly Google Body, is a web application by Zygote Media Group that renders manipulable 3D anatomical models of the human body. Several layers, from muscle tissues down to blood vessels, can be removed or made transparent to allow better study of individual body parts. Most of the body parts are labelled and are searchable. == Technology == The human models are based on data from the Zygote Media Group. The website uses JavaScript and WebGL technology to display 3D images inside the web browser without requiring the installation of external browser plug-ins. == History == ZygoteBody was launched as Google Body on December 15, 2010. On April Fools' Day 2011, users were greeted with the anatomy of a cow on the home page. The cow model is still available as part of the open-3d-viewer open source project. As part of the wind down on Google Labs, it was announced that Google Body will be shut down but will continue to be maintained by Zygote as ZygoteBody. On October 13, 2011, the Google Body site was shut down. Then, on January 9, 2012, ZygoteBody was launched and core code base (with the Google Cow model as a demo) was made available as an open source project called open-3d-viewer.

    Read more →
  • F-score

    F-score

    In statistical analysis of binary classification and information retrieval systems, the F-score or F-measure is a measure of predictive performance. It is calculated from the precision and recall of the test, where the precision is the number of true positive results divided by the number of all samples predicted to be positive, including those not identified correctly, and the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value, and recall is also known as sensitivity in diagnostic binary classification. The F1 score is the harmonic mean of the precision and recall. It thus symmetrically represents both precision and recall in one metric. The more generic F β {\displaystyle F_{\beta }} score applies additional weights, valuing one of precision or recall more than the other. The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, if the precision or the recall is zero. == Etymology == The name F-measure is believed to be named after a different F function in Van Rijsbergen's book, when introduced to the Fourth Message Understanding Conference (MUC-4, 1992). == Definition == The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall: F 1 = 2 r e c a l l − 1 + p r e c i s i o n − 1 = 2 p r e c i s i o n ⋅ r e c a l l p r e c i s i o n + r e c a l l = 2 T P 2 T P + F P + F N {\displaystyle F_{1}={\frac {2}{\mathrm {recall} ^{-1}+\mathrm {precision} ^{-1}}}=2{\frac {\mathrm {precision} \cdot \mathrm {recall} }{\mathrm {precision} +\mathrm {recall} }}={\frac {2\mathrm {TP} }{2\mathrm {TP} +\mathrm {FP} +\mathrm {FN} }}} With precision = TP / (TP + FP) and recall = TP / (TP + FN), it follows that the numerator of F1 is the sum of their numerators and the denominator of F1 is the sum of their denominators. If FP=FN F 1 = 2 T P 2 T P + 2 F P = T P T P + F P {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FP} }}={\frac {\mathrm {TP} }{\mathrm {TP} +\mathrm {FP} }}} or F 1 = 2 T P 2 T P + 2 F N = T P T P + F N {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FN} }}={\frac {\mathrm {TP} }{\mathrm {TP} +\mathrm {FN} }}} So, F1 = precision = recall If TP=FP=FN F 1 = 2 T P 2 T P + 2 F P = 2 T P 4 T P = 1 2 = 0.5 {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FP} }}={\frac {2\mathrm {TP} }{4\mathrm {TP} }}={\frac {1}{2}}=0.5} or F 1 = 2 T P 2 T P + 2 F N = 2 T P 4 T P = 1 2 = 0.5 {\displaystyle F_{1}={\frac {2\mathrm {TP} }{2\mathrm {TP} +2\mathrm {FN} }}={\frac {2\mathrm {TP} }{4\mathrm {TP} }}={\frac {1}{2}}=0.5} To see it as a harmonic mean, note that F 1 − 1 = 1 2 ( r e c a l l − 1 + p r e c i s i o n − 1 ) {\displaystyle F_{1}^{-1}={\frac {1}{2}}(\mathrm {recall} ^{-1}+\mathrm {precision} ^{-1})} . === Fβ score === A more general F score, F β {\displaystyle F_{\beta }} , that uses a positive real factor β {\displaystyle \beta } , where β {\displaystyle \beta } is chosen such that recall is considered β {\displaystyle \beta } times as important as precision, is: F β = β 2 + 1 ( β 2 ⋅ r e c a l l − 1 ) + p r e c i s i o n − 1 = ( 1 + β 2 ) ⋅ p r e c i s i o n ⋅ r e c a l l ( β 2 ⋅ p r e c i s i o n ) + r e c a l l {\displaystyle F_{\beta }={\frac {\beta ^{2}+1}{(\beta ^{2}\cdot \mathrm {recall} ^{-1})+\mathrm {precision} ^{-1}}}={\frac {(1+\beta ^{2})\cdot \mathrm {precision} \cdot \mathrm {recall} }{(\beta ^{2}\cdot \mathrm {precision} )+\mathrm {recall} }}} To see that as a weighted harmonic mean, note that F β − 1 = 1 β + β − 1 ( β ⋅ r e c a l l − 1 + β − 1 ⋅ p r e c i s i o n − 1 ) {\displaystyle F_{\beta }^{-1}={\frac {1}{\beta +\beta ^{-1}}}(\beta \cdot \mathrm {recall} ^{-1}+\beta ^{-1}\cdot \mathrm {precision} ^{-1})} . In terms of Type I and type II errors this becomes: F β = ( 1 + β 2 ) ⋅ T P ( 1 + β 2 ) ⋅ T P + β 2 ⋅ F N + F P = ( 1 + β 2 ) ⋅ T P ( T P + F N ) ⋅ β 2 + ( T P + F P ) {\displaystyle F_{\beta }={\frac {(1+\beta ^{2})\cdot \mathrm {TP} }{(1+\beta ^{2})\cdot \mathrm {TP} +\beta ^{2}\cdot \mathrm {FN} +\mathrm {FP} }}\,={\frac {(1+\beta ^{2})\cdot \mathrm {TP} }{(\mathrm {TP} +\mathrm {FN} )\cdot \beta ^{2}+(\mathrm {TP} +\mathrm {FP} )}}\,} Two commonly used values for β {\displaystyle \beta } are 2, which weighs recall higher than precision, and 1/2, which weighs recall lower than precision. The F-measure was derived so that F β {\displaystyle F_{\beta }} "measures the effectiveness of retrieval with respect to a user who attaches β {\displaystyle \beta } times as much importance to recall as precision". It is based on Van Rijsbergen's effectiveness measure E = 1 − ( α p + 1 − α r ) − 1 {\displaystyle E=1-\left({\frac {\alpha }{p}}+{\frac {1-\alpha }{r}}\right)^{-1}} Their relationship is: F β = 1 − E {\displaystyle F_{\beta }=1-E} where α = 1 1 + β 2 {\displaystyle \alpha ={\frac {1}{1+\beta ^{2}}}} == Diagnostic testing == This is related to the field of binary classification where recall is often termed "sensitivity". == Dependence of the F-score on class imbalance == Precision-recall curve, and thus the F β {\displaystyle F_{\beta }} score, explicitly depends on the ratio r {\displaystyle r} of positive to negative test cases. This means that comparison of the F-score across different problems with differing class ratios is problematic. One way to address this issue (see e.g., Siblini et al., 2020) is to use a standard class ratio r 0 {\displaystyle r_{0}} when making such comparisons. == Applications == The F-score is often used in the field of information retrieval for measuring search, document classification, and query classification performance. It is particularly relevant in applications which are primarily concerned with the positive class and where the positive class is rare relative to the negative class. Earlier works focused primarily on the F1 score, but with the proliferation of large scale search engines, performance goals changed to place more emphasis on either precision or recall and so F β {\displaystyle F_{\beta }} is seen in wide application. The F-score is also used in machine learning. However, the F-measures do not take true negatives into account, hence measures such as the Matthews correlation coefficient, Informedness or Cohen's kappa may be preferred to assess the performance of a binary classifier. The F-score has been widely used in the natural language processing literature, such as in the evaluation of named entity recognition and word segmentation. == Properties == The F1 score is the Dice coefficient of the set of retrieved items and the set of relevant items. The F1-score of a classifier which always predicts the positive class converges to 1 as the probability of the positive class increases. The F1-score of a classifier which always predicts the positive class is equal to 2 proportion_of_positive_class / ( 1 + proportion_of_positive_class ), since the recall is 1, and the precision is equal to the proportion of the positive class. If the scoring model is uninformative (cannot distinguish between the positive and negative class) then the optimal threshold is 0 so that the positive class is always predicted. F1 score is concave in the true positive rate. == Criticism == David Hand and others criticize the widespread use of the F1 score since it gives equal importance to precision and recall. In practice, different types of mis-classifications incur different costs. In other words, the relative importance of precision and recall is an aspect of the problem. According to Davide Chicco and Giuseppe Jurman, the F1 score is less truthful and informative than the Matthews correlation coefficient (MCC) in binary evaluation classification. David M W Powers has pointed out that F1 ignores the True Negatives and thus is misleading for unbalanced classes, while kappa and correlation measures are symmetric and assess both directions of predictability - the classifier predicting the true class and the true class predicting the classifier prediction, proposing separate multiclass measures Informedness and Markedness for the two directions, noting that their geometric mean is correlation. Another source of critique of F1 is its lack of symmetry. It means it may change its value when dataset labeling is changed - the "positive" samples are named "negative" and vice versa. This criticism is met by the P4 metric definition, which is sometimes indicated as a symmetrical extension of F1. Finally, Ferrer and Dyrland et al. argue that the expected cost (or its counterpart, the expected utility) is the only principled metric for evaluation of classification decisions, having various advantages over the F-score and the MCC. Both works show that the F-score can result in wrong conclusions about the absolute and relative quality of systems. == Difference from Fowlkes–Mallows index == While the F-measur

    Read more →
  • Rada Mihalcea

    Rada Mihalcea

    Rada Mihalcea is the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering at the University of Michigan. She has made significant contributions to natural language processing, multimodal processing, computational social science, and AI for Social Good. With Paul Tarau, she invented the TextRank Algorithm, which is a classic algorithm widely used for text summarization. == Career == Mihalcea has a Ph.D. in Computer Science and Engineering from Southern Methodist University (2001) and a Ph.D. in Linguistics, Oxford University (2010). In 2017 she was named Director of the Artificial Intelligence Laboratory at University of Michigan, Computer Science and Engineering. In 2018, Mihalcea was elected as vice president for the Association for Computational Linguistics (ACL). In 2021, she was elected the president for ACL. She is a professor of Computer Science and Engineering at the University of Michigan, where she also leads the Language and Information Technologies (LIT) Lab. Before joining UofM, she was a professor at North Texas University between 2002-2013. A prolific researcher, Mihalcea has authored or coauthored over 500 articles since 1998 on topics ranging from semantic analysis of text to lie detection. Her work has been cited over 50,000 times on Google Scholar, which made her one of the most cited scholars in Multimodal Interaction and Computational Social Science. In 2008, Mihalcea received the Presidential Early Career Award for Scientists and Engineers (PECASE) She is an ACM Fellow (since 2019), AAAI Fellow (since 2021), and ACL Fellow (since 2025). Mihalcea is an outspoken promoter of diversity in computer science. She also supports an expansion of the traditional analysis of educational success, which tends to focus on academic behaviour, to include student life, personality and background outside of the classroom. Mihalcea leads Girls Encoded, a program designed to develop the pipeline of women in computer science as well as to retain the women who have entered into the program. == Awards == Elected to American Academy of Arts & Sciences, 2026 ACL Fellow, 2025 "for significant contributions to graph-based language processing, computational social science, and the advancement of NLP for social good." AAAI Fellow, 2021 "for significant contributions to natural language processing and computational social science". ACM Fellow, 2019 "for contributions to natural language processing, with innovations in data-driven and graph-based language processing". Sarah Goddard Power Award, 2019. Carol Hollenshead Award, 2018. Presidential Early Career Award for Scientists and Engineers (PECASE), 2009. Awarded by President Barack Obama. == Research == Mihalcea is known for her research in natural language processing, multimodal processing, computational social sciences. In a collaboration she leads at the University of Michigan, Mihalcea has created software that can detect human lying. In a study of video clips of high profile court cases, a computer was more accurate at detecting deception than human judges. Mihalcea's lie-detection software uses machine learning techniques to analyze video clips of actual trials. In her 2015 study, the team used clips from The Innocence Project, a national organization that works to reexamine cases where individuals were tried without the benefit of DNA testing with the aim of exonerating wrongfully convicted individuals. After identifying common human gestures, they transcribed the audio from the video clips of trials and analyzed how often subjects labeled deceptive used various words and phrases. The system was 75% accurate in identifying which subjects were deceptive among 120 videos. That puts Mihalcea's algorithm on par with the most commonly accepted form of lie detection, polygraph tests, which are roughly 85 percent accurate when testing guilty people and 56 percent accurate when testing the innocent. She notes there are still improvements to be made — in particular to account for cultural and demographic differences. A possibly unique advantage of Mihalcea's study was the real world, high stakes nature of the footage analyzed in the study. In laboratory experiments, it is difficult to create a setting that motivates people to truly lie. In 2018, Mihalcea and her collaborators worked on an algorithm-based system that identifies linguistic cues in fake news stories. It successfully found fakes up to 76% of the time, compared to a human success rate of 70%. == Publications == === Books === Rada Mihalcea and Dragomir Radev, Graph-based Natural Language Processing and Information Retrieval, Cambridge U. Press, 2011. Gabe Ignatow and Rada Mihalcea, Text Mining: A Guidebook for the Social Sciences, SAGE, 2016. Gabe Ignatow and Rada Mihalcea, An Introduction to Text Mining: Research Design, Data Collection, and Analysis, SAGE, 2017. === Journals and conferences === Textrank: Bringing order into text. R. Mihalcea, P. Tarau. Proceedings of the 2004 conference on empirical methods in natural language processing. 2004 Corpus-based and knowledge-based measures of text semantic similarity. R. Mihalcea, C. Corley, C. Strapparava. AAAI 6, 775-780. 2006 Wikify!: linking documents to encyclopedic knowledge. R. Mihalcea, A. Csomai. Proceedings of the sixteenth ACM conference on Conference on information and information management. 2007 Learning to identify emotions in text. C. Strapparava, R. Mihalcea. Proceedings of the 2008 ACM symposium on Applied computing, 1556-1560. 2008 Semeval-2007 task 14: Affective text. C. Strapparava, R. Mihalcea. Proceedings of the Fourth International Workshop on Semantic Evaluations. 2007 Learning multilingual subjective language via cross-lingual projections. R. Mihalcea, C. Banea, J. Wiebe. Proceedings of the 45th annual meeting of the association of computational linguistics. 2007 Graph-based ranking algorithms for sentence extraction, applied to text summarization. R. Mihalcea. Proceedings of the ACL Interactive Poster and Demonstration Sessions. 2004 Falcon: Boosting knowledge for answer engines. S. Harabagiu, D. Moldovan, M. Pasca, R. Mihalcea, M. Surdeanu, Razvan Bunescu, Roxana Girju, Vasile Rus, Paul Morarescu. TREC 9, 479-488. 2000 Measuring the semantic similarity of texts. C. Corley, R. Mihalcea. Proceedings of the ACL workshop on empirical modeling of semantic equivalence and entailment. 2005 R Mihalcea (2007). "Using wikipedia for automatic word-sense disambiguation". Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference. CiteSeerX 10.1.1.74.3561. - see also Word-sense disambiguation Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. R. Sinha, R. Mihalcea. International Conference on Semantic Computing (ICSC 2007), 363-369. 2007 == Personal life == Mihalcea was born in Cluj-Napoca, Romania, where she attended the Technical University of Cluj-Napoca. She can speak Romanian, English, Italian, and French. Mihalcea has two children - Zara (b. 2009) and Caius (b. 2013). They were both born in Dallas, Texas. She is married to an associate professor of engineering at the University of Michigan–Flint - Mihai Burzo. They met while they were both completing Ph.D.s at Southern Methodist University in 2001 and have often collaborated on research, such as the 2015 study on lie detection.

    Read more →
  • How to Choose an Conversational AI Platform

    How to Choose an Conversational AI Platform

    Trying to pick the best conversational AI platform? An conversational AI platform is software that uses machine learning to help you get more done — it scales effortlessly from a single task to thousands. The best picks balance beginner-friendly simplicity with the depth power users need, and they ship updates often. Whether you are a beginner or a pro, the right conversational AI platform slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training

    Contrastive Language-Image Pre-training (CLIP) is a technique for training a pair of neural network models, one for image understanding and one for text understanding, using a contrastive objective. This method has enabled broad applications across multiple domains, including cross-modal retrieval, text-to-image generation, and aesthetic ranking. == Algorithm == The CLIP method trains a pair of models contrastively. One model takes in a piece of text as input and outputs a single vector representing its semantic content. The other model takes in an image and similarly outputs a single vector representing its visual content. The models are trained so that the vectors corresponding to semantically similar text-image pairs are close together in the shared vector space, while those corresponding to dissimilar pairs are far apart. To train a pair of CLIP models, one would start by preparing a large dataset of image-caption pairs. During training, the models are presented with batches of N {\displaystyle N} image-caption pairs. Let the outputs from the text and image models be respectively v 1 , . . . , v N , w 1 , . . . , w N {\displaystyle v_{1},...,v_{N},w_{1},...,w_{N}} . Two vectors are considered "similar" if their dot product is large. The loss incurred on this batch is the multi-class N-pair loss, which is a symmetric cross-entropy loss over similarity scores: − 1 N ∑ i ln ⁡ e v i ⋅ w i / T ∑ j e v i ⋅ w j / T − 1 N ∑ j ln ⁡ e v j ⋅ w j / T ∑ i e v i ⋅ w j / T {\displaystyle -{\frac {1}{N}}\sum _{i}\ln {\frac {e^{v_{i}\cdot w_{i}/T}}{\sum _{j}e^{v_{i}\cdot w_{j}/T}}}-{\frac {1}{N}}\sum _{j}\ln {\frac {e^{v_{j}\cdot w_{j}/T}}{\sum _{i}e^{v_{i}\cdot w_{j}/T}}}} In essence, this loss function encourages the dot product between matching image and text vectors ( v i ⋅ w i {\displaystyle v_{i}\cdot w_{i}} ) to be high, while discouraging high dot products between non-matching pairs. The parameter T > 0 {\displaystyle T>0} is the temperature, which is parameterized in the original CLIP model as T = e − τ {\displaystyle T=e^{-\tau }} where τ ∈ R {\displaystyle \tau \in \mathbb {R} } is a learned parameter. Other loss functions are possible. For example, Sigmoid CLIP (SigLIP) proposes the following loss function: L = 1 N ∑ i , j ∈ 1 : N f ( ( 2 δ i , j − 1 ) ( e τ w i ⋅ v j + b ) ) {\displaystyle L={\frac {1}{N}}\sum _{i,j\in 1:N}f((2\delta _{i,j}-1)(e^{\tau }w_{i}\cdot v_{j}+b))} where f ( x ) = ln ⁡ ( 1 + e − x ) {\displaystyle f(x)=\ln(1+e^{-x})} is the negative log sigmoid loss, and the Dirac delta symbol δ i , j {\displaystyle \delta _{i,j}} is 1 if i = j {\displaystyle i=j} else 0. == CLIP models == While the original model was developed by OpenAI, subsequent models have been trained by other organizations as well. === Image model === The image encoding models used in CLIP are typically vision transformers (ViT). The naming convention for these models often reflects the specific ViT architecture used. For instance, "ViT-L/14" means a "vision transformer large" (compared to other models in the same series) with a patch size of 14, meaning that the image is divided into 14-by-14 pixel patches before being processed by the transformer. The size indicator ranges from B, L, H, G (base, large, huge, giant), in that order. Other than ViT, the image model is typically a convolutional neural network, such as ResNet (in the original series by OpenAI), or ConvNeXt (in the OpenCLIP model series by LAION). Since the output vectors of the image model and the text model must have exactly the same length, both the image model and the text model have fixed-length vector outputs, which in the original report is called "embedding dimension". For example, in the original OpenAI model, the ResNet models have embedding dimensions ranging from 512 to 1024, and for the ViTs, from 512 to 768. Its implementation of ViT was the same as the original one, with one modification: after position embeddings are added to the initial patch embeddings, there is a LayerNorm. Its implementation of ResNet was the same as the original one, with 3 modifications: In the start of the CNN (the "stem"), they used three stacked 3x3 convolutions instead of a single 7x7 convolution, as suggested by. There is an average pooling of stride 2 at the start of each downsampling convolutional layer (they called it rect-2 blur pooling according to the terminology of ). This has the effect of blurring images before downsampling, for antialiasing. The final convolutional layer is followed by a multiheaded attention pooling. ALIGN a model with similar capabilities, trained by researchers from Google used EfficientNet, a kind of convolutional neural network. === Text model === The text encoding models used in CLIP are typically Transformers. In the original OpenAI report, they reported using a Transformer (63M-parameter, 12-layer, 512-wide, 8 attention heads) with lower-cased byte pair encoding (BPE) with 49152 vocabulary size. Context length was capped at 76 for efficiency. Like GPT, it was decoder-only, with only causally-masked self-attention. Its architecture is the same as GPT-2. Like BERT, the text sequence is bracketed by two special tokens [SOS] and [EOS] ("start of sequence" and "end of sequence"). Take the activations of the highest layer of the transformer on the [EOS], apply LayerNorm, then a final linear map. This is the text encoding of the input sequence. The final linear map has output dimension equal to the embedding dimension of whatever image encoder it is paired with. These models all had context length 77 and vocabulary size 49408. ALIGN used BERT of various sizes. == Dataset == === WebImageText === The CLIP models released by OpenAI were trained on a dataset called "WebImageText" (WIT) containing 400 million pairs of images and their corresponding captions scraped from the internet. The total number of words in this dataset is similar in scale to the WebText dataset used for training GPT-2, which contains about 40 gigabytes of text data. The dataset contains 500,000 text-queries, with up to 20,000 (image, text) pairs per query. The text-queries were generated by starting with all words occurring at least 100 times in English Wikipedia, then extended by bigrams with high mutual information, names of all Wikipedia articles above a certain search volume, and WordNet synsets. The dataset is private and has not been released to the public, and there is no further information on it. ==== Data preprocessing ==== For the CLIP image models, the input images are preprocessed by first dividing each of the R, G, B values of an image by the maximum possible value, so that these values fall between 0 and 1, then subtracting by [0.48145466, 0.4578275, 0.40821073], and dividing by [0.26862954, 0.26130258, 0.27577711]. The rationale was that these are the mean and standard deviations of the images in the WebImageText dataset, so this preprocessing step roughly whitens the image tensor. These numbers slightly differ from the standard preprocessing for ImageNet, which uses [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225]. If the input image does not have the same resolution as the native resolution (224×224 for all except ViT-L/14@336px, which has 336×336 resolution), then the input image is first scaled by bicubic interpolation, so that its shorter side is the same as the native resolution, then the central square of the image is cropped out. === Others === ALIGN used over one billion image-text pairs, obtained by extracting images and their alt-tags from online crawling. The method was described as similar to how the Conceptual Captions dataset was constructed, but instead of complex filtering, they only applied a frequency-based filtering. Later models trained by other organizations had published datasets. For example, LAION trained OpenCLIP with published datasets LAION-400M, LAION-2B, and DataComp-1B. == Training == In the original OpenAI CLIP report, they reported training 5 ResNet and 3 ViT (ViT-B/32, ViT-B/16, ViT-L/14). Each was trained for 32 epochs. The largest ResNet model took 18 days to train on 592 V100 GPUs. The largest ViT model took 12 days on 256 V100 GPUs. All ViT models were trained on 224×224 image resolution. The ViT-L/14 was then boosted to 336×336 resolution by FixRes, resulting in a model. They found this was the best-performing model. In the OpenCLIP series, the ViT-L/14 model was trained on 384 A100 GPUs on the LAION-2B dataset, for 160 epochs for a total of 32B samples seen. == Applications == === Cross-modal retrieval === CLIP's cross-modal retrieval enables the alignment of visual and textual data in a shared latent space, allowing users to retrieve images based on text descriptions and vice versa, without the need for explicit image annotations. In text-to-image retrieval, users input descriptive text, and CLIP retrieves images with matching embeddings. In image-to-text retrieval, images are used to find related text content. CLIP’s ability to connect vis

    Read more →
  • Best AI Writing Assistants in 2026

    Best AI Writing Assistants in 2026

    In search of the best AI writing assistant? An AI writing assistant is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI writing assistant slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Markov chain geostatistics

    Markov chain geostatistics

    Markov chain geostatistics uses Markov chain spatial models, simulation algorithms and associated spatial correlation measures (e.g., transiogram) based on the Markov chain random field theory, which extends a single Markov chain into a multi-dimensional random field for geostatistical modeling. A Markov chain random field is still a single spatial Markov chain. The spatial Markov chain moves or jumps in a space and decides its state at any unobserved location through interactions with its nearest known neighbors in different directions. The data interaction process can be well explained as a local sequential Bayesian updating process within a neighborhood. Because single-step transition probability matrices are difficult to estimate from sparse sample data and are impractical in representing the complex spatial heterogeneity of states, the transiogram, which is defined as a transition probability function over the distance lag, is proposed as the accompanying spatial measure of Markov chain random fields.

    Read more →
  • Unsupervised learning

    Unsupervised learning

    Unsupervised learning is a framework in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. Other frameworks in the spectrum of supervisions include weak- or semi-supervision, where a small portion of the data is tagged, and self-supervision. Some researchers consider self-supervised learning a form of unsupervised learning. Conceptually, unsupervised learning divides into the aspects of data, training, algorithm, and downstream applications. Typically, the dataset is harvested cheaply "in the wild", such as massive text corpus obtained by web crawling, with only minor filtering (such as Common Crawl). This compares favorably to supervised learning, where the dataset (such as the ImageNet1000) is typically constructed manually, which is much more expensive. There are algorithms designed specifically for unsupervised learning, such as clustering algorithms like k-means, dimensionality reduction techniques like principal component analysis (PCA), Boltzmann machine learning, and autoencoders. After the rise of deep learning, most large-scale unsupervised learning has been done by training general-purpose neural network architectures by gradient descent, adapted to performing unsupervised learning by designing an appropriate training procedure. Sometimes a trained model can be used as-is, but more often they are modified for downstream applications. For example, the generative pretraining method trains a model to generate a textual dataset, before finetuning it for other applications, such as text classification. As another example, autoencoders are trained to produce good features, which can then be used as a module for other models, such as in a latent diffusion model. == Tasks == Tasks are often categorized as discriminative (recognition) or generative (imagination). Often but not always, discriminative tasks use supervised methods and generative tasks use unsupervised (see Venn diagram); however, the separation is very hazy. For example, object recognition favors supervised learning but unsupervised learning can also cluster objects into groups. Furthermore, as progress marches onward, some tasks employ both methods, and some tasks swing from one to another. For example, image recognition started off as heavily supervised, but became hybrid by employing unsupervised pre-training, and then moved towards supervision again with the advent of dropout, ReLU, and adaptive learning rates. A typical generative task is as follows. At each step, a datapoint is sampled from the dataset, and part of the data is removed, and the model must infer the removed part. This is particularly clear for the denoising autoencoders and BERT. == Neural network architectures == === Training === During the learning phase, an unsupervised network tries to mimic the data it is given and uses the error in its mimicked output to correct itself (i.e. correct its weights and biases). Sometimes the error is expressed as a low probability that the erroneous output occurs, or it might be expressed as an unstable high energy state in the network. In contrast to supervised methods' dominant use of backpropagation, unsupervised learning also employs other methods including: Hopfield learning rule, Boltzmann learning rule, Contrastive Divergence, Wake Sleep, Variational Inference, Maximum Likelihood, Maximum A Posteriori, Gibbs Sampling, and backpropagating reconstruction errors or hidden state reparameterizations. See the table below for more details. === Energy === An energy function is a macroscopic measure of a network's activation state. In Boltzmann machines, it plays the role of the Cost function. This analogy with physics is inspired by Ludwig Boltzmann's analysis of a gas' macroscopic energy from the microscopic probabilities of particle motion p ∝ e − E / k T {\displaystyle p\propto e^{-E/kT}} , where k is the Boltzmann constant and T is temperature. In the RBM network the relation is p = e − E / Z {\displaystyle p=e^{-E}/Z} , where p {\displaystyle p} and E {\displaystyle E} vary over every possible activation pattern and Z = ∑ All Patterns e − E ( pattern ) {\displaystyle \textstyle {Z=\sum _{\scriptscriptstyle {\text{All Patterns}}}e^{-E({\text{pattern}})}}} . To be more precise, p ( a ) = e − E ( a ) / Z {\displaystyle p(a)=e^{-E(a)}/Z} , where a {\displaystyle a} is an activation pattern of all neurons (visible and hidden). Hence, some early neural networks bear the name Boltzmann Machine. Paul Smolensky calls − E {\displaystyle -E\,} the Harmony. A network seeks low energy which is high Harmony. === Networks === This table shows connection diagrams of various unsupervised networks, the details of which will be given in the section Comparison of Networks. Circles are neurons and edges between them are connection weights. As network design changes, features are added on to enable new capabilities or removed to make learning faster. For instance, neurons change between deterministic (Hopfield) and stochastic (Boltzmann) to allow robust output, weights are removed within a layer (RBM) to hasten learning, or connections are allowed to become asymmetric (Helmholtz). Of the networks bearing people's names, only Hopfield worked directly with neural networks. Boltzmann and Helmholtz came before artificial neural networks, but their work in physics and physiology inspired the analytical methods that were used. === History === === Specific Networks === Here, we highlight some characteristics of select networks. The details of each are given in the comparison table below. Hopfield Network Ferromagnetism inspired Hopfield networks. A neuron corresponds to an iron domain with binary magnetic moments Up and Down, and neural connections correspond to the domain's influence on each other. Symmetric connections enable a global energy formulation. During inference the network updates each state using the standard activation step function. Symmetric weights and the right energy functions guarantees convergence to a stable activation pattern. Asymmetric weights are difficult to analyze. Hopfield nets are used as Content Addressable Memories (CAM). Boltzmann Machine These are stochastic Hopfield nets. Their state value is sampled from this pdf as follows: suppose a binary neuron fires with the Bernoulli probability p(1) = 1/3 and rests with p(0) = 2/3. One samples from it by taking a uniformly distributed random number y, and plugging it into the inverted cumulative distribution function, which in this case is the step function thresholded at 2/3. The inverse function = { 0 if x <= 2/3, 1 if x > 2/3 }. Sigmoid Belief Net Introduced by Radford Neal in 1992, this network applies ideas from probabilistic graphical models to neural networks. A key difference is that nodes in graphical models have pre-assigned meanings, whereas Belief Net neurons' features are determined after training. The network is a sparsely connected directed acyclic graph composed of binary stochastic neurons. The learning rule comes from Maximum Likelihood on p(X): Δwij ∝ {\displaystyle \propto } sj (si - pi), where pi = 1 / ( 1 + eweighted inputs into neuron i ). sj's are activations from an unbiased sample of the posterior distribution and this is problematic due to the Explaining Away problem raised by Judea Perl. Variational Bayesian methods uses a surrogate posterior and blatantly disregard this complexity. Deep Belief Network Introduced by Hinton, this network is a hybrid of RBM and Sigmoid Belief Network. The top 2 layers is an RBM and the second layer downwards form a sigmoid belief network. One trains it by the stacked RBM method and then throw away the recognition weights below the top RBM. As of 2009, 3-4 layers seems to be the optimal depth. Helmholtz machine These are early inspirations for the Variational Auto Encoders. Its 2 networks combined into one—forward weights operates recognition and backward weights implements imagination. It is perhaps the first network to do both. Helmholtz did not work in machine learning but he inspired the view of "statistical inference engine whose function is to infer probable causes of sensory input". the stochastic binary neuron outputs a probability that its state is 0 or 1. The data input is normally not considered a layer, but in the Helmholtz machine generation mode, the data layer receives input from the middle layer and has separate weights for this purpose, so it is considered a layer. Hence this network has 3 layers. Variational autoencoder These are inspired by Helmholtz machines and combines probability network with neural networks. An Autoencoder is a 3-layer CAM network, where the middle layer is supposed to be some internal representation of input patterns. The encoder neural network is a probability distribution qφ(z given x) and the decoder network is pθ(x given z). The weights are named phi & theta rather than W and V as in Helmholtz—a cosmetic difference. These 2 networks h

    Read more →
  • Color clock

    Color clock

    The color clock, or color timer, is a part of the video circuitry of computer graphics hardware that works with analog color television systems. The clock is timed to match the timing of the color standard it works with, typically NTSC or PAL, ensuring that the data being read from the computer memory to create the image on-screen is in sync with the display. Depending on the speed of the color clock, the product of the resolution and number of colors is defined. Slow color clocks of many early games consoles and home computers resulted in limited color palettes at the highest resolutions.

    Read more →
  • HFST

    HFST

    Helsinki Finite-State Technology (HFST) is a computer programming library and set of utilities for natural language processing with finite-state automata and finite-state transducers. It is free and open-source software, released under a mix of the GNU General Public License version 3 (GPLv3) and the Apache License. == Features == The library functions as an interchanging interface to multiple backends, such as OpenFST, foma and SFST. The utilities comprise various compilers, such as hfst-twolc (a compiler for morphological two-level rules), hfst-lexc (a compiler for lexicon definitions) and hfst-regexp2fst (a regular expression compiler). Functions from Xerox's proprietary scripting language xfst is duplicated in hfst-xfst, and the pattern matching utility pmatch in hfst-pmatch, which goes beyond the finite-state formalism in having recursive transition networks (RTNs). The library and utilities are written in C++, with an interface to the library in Python and a utility for looking up results from transducers ported to Java and Python. Transducers in HFST may incorporate weights depending on the backend. For performing FST operations, this is currently only possible via the OpenFST backend. HFST provides two native backends, one designed for fast lookup (hfst-optimized-lookup), the other for format interchange. Both of them can be weighted. == Uses == HFST has been used for writing various linguistic tools, such as spell-checkers, hyphenators, and morphologies. Morphological dictionaries written in other formalisms have also been converted to HFST's formats.

    Read more →
  • Isolation forest

    Isolation forest

    Isolation forest is an unsupervised learning algorithm for anomaly detection that works on the principle of isolating anomalies, instead of the most common techniques of profiling normal points. In statistics, an anomaly (a.k.a. outlier) is an observation or event that deviates so much from other events to arouse suspicion it was generated by a different mean. For example, the graph in Fig.1 represents ingress traffic to a web server, expressed as the number of requests in 3-hours intervals, for a period of one month. It is quite evident by simply looking at the picture that some points (marked with a red circle) are unusually high, to the point of inducing suspect that the web server might have been under attack at that time. On the other hand, the flat segment indicated by the red arrow also seems unusual and might possibly be a sign that the server was down during that time period. Anomalies in a big dataset may follow very complicated patterns, which are difficult to detect "by eye" in the great majority of cases. This is the reason why the field of anomaly detection is well suited for the application of machine learning techniques. The most common techniques employed for anomaly detection are based on the construction of a profile of what is "normal": anomalies are reported as those instances in the dataset that do not conform to the normal profile. Isolation Forest uses a different approach: instead of trying to build a model of normal instances, it explicitly isolates anomalous points in the dataset. The main advantage of this approach is the possibility of exploiting sampling techniques to an extent that is not allowed to the profile-based methods, creating a very fast algorithm with a low memory demand. == History == The Isolation Forest (iForest) algorithm was initially proposed by Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou in 2008. The authors took advantage of two quantitative properties of anomalous data points in a sample, that is: they are the minority consisting of fewer instances and they have attribute-values that are very different from those of normal instances Since anomalies are typically few and very different from the other points in the sample, they must be easier to "isolate" compared to normal points. On the basis of this principle, Isolation Forest builds an ensemble of "Isolation Trees" (iTrees) for the data set and marks as anomalies the points that have short average path lengths on the iTrees. In a later paper, published in 2012 the same authors described a set of experiments to prove that iForest: has a low linear time complexity and a small memory requirement is able to deal with high dimensional data with irrelevant attributes can be trained with or without anomalies in the training set can provide detection results with different levels of granularity without re-training In 2013 Zhiguo Ding and Minrui Fei proposed a framework based on iForest to resolve the problem of detecting anomalies in streaming data. More application of iForest to streaming data are described in papers by Swee Chuan Tan et al., G. A. Susto et al. and Yu Weng et al. One of the main problems of the application of iForest to anomaly detection was not with the model itself, but rather in the way the "anomaly score" was computed. This problem was highlighted by Sahand Hariri, Matias Carrasco Kind and Robert J. Brunner in a 2018 paper, wherein they proposed an improved iForest model named Extended Isolation Forest (EIF). In the same paper the authors describe the improvements made to the original model and how they are able to enhance the consistency and reliability of the anomaly score produced for a given data point. == Algorithm == At the basis of the Isolation Forest algorithm there is the tendency of anomalous instances in a dataset to be easier to separate from the rest of the sample (isolate), compared to normal points. In order to isolate a data point the algorithm recursively generates partitions on the sample by randomly selecting an attribute and then randomly selecting a split value for the attribute, between the minimum and maximum values allowed for that attribute. An example of random partitioning in a 2D dataset of normally distributed points is given in Fig. 2 for a non-anomalous point and Fig. 3 for a point that's more likely to be an anomaly. It is apparent from the pictures how anomalies require fewer random partitions to be isolated, compared to normal points. From a mathematical point of view, recursive partitioning can be represented by a tree structure named Isolation Tree, while the number of partitions required to isolate a point can be interpreted as the length of the path, within the tree, to reach a terminating node starting from the root. For example, the path length of point xi in Fig. 2 is greater than the path length of xj in Fig. 3. More formally, let X = { x1, ..., xn } be a set of d-dimensional points and X' ⊂ X a subset of X. An Isolation Tree (iTree) is defined as a data structure with the following properties: for each node T in the Tree, T is either an external-node with no child, or an internal-node with one "test" and exactly two daughter nodes (Tl, Tr) a test at node T consists of an attribute q and a split value p such that the test q < p determines the traversal of a data point to either Tl or Tr. In order to build an iTree, the algorithm recursively divides X' by randomly selecting an attribute q and a split value p, until either (i) the node has only one instance or (ii) all data at the node have the same values. When the iTree is fully grown, each point in X is isolated at one of the external nodes. Intuitively, the anomalous points are those (easier to isolate, hence) with the smaller path length in the tree, where the path length h(xi) of point x i ∈ X {\displaystyle x_{i}\in X} is defined as the number of edges xi traverses from the root node to get to an external node. A probabilistic explanation of iTree is provided in the iForest original paper. == Properties of Isolation Forest == Sub-sampling: since iForest does not need to isolate all of normal instances, it can frequently ignore the big majority of the training sample. As a consequence, iForest works very well when the sampling size is kept small, a property that is in contrast with the great majority of existing methods, where large sampling size is usually desirable. Swamping: when normal instances are too close to anomalies, the number of partitions required to separate anomalies increases, a phenomena known as swamping, which makes it more difficult for iForest to discriminate between anomalies and normal points. One of the main reasons for swamping is the presence of too many data for the purpose of anomaly detection, which implies one possible solution to the problem is sub-sampling. Since iForest respond very well to sub-sampling in terms of performance, the reduction of the number of points in the sample is also a good way to reduce the effect of swamping. Masking: when the number of anomalies is high it is possible that some of those aggregate in a dense and large cluster, making it more difficult to separate the single anomalies and, in turn, to detect such points as anomalous. Similarly to swamping, this phenomena (known as "masking") is also more likely when the number of points in the sample is big, and can be alleviated through sub-sampling. High Dimensional Data: one of the main limitation to standard, distance-based methods is their inefficiency in dealing with high dimensional datasets:. The main reason for that is, in a high dimensional space every point is equally sparse, so using a distance-based measure of separation is pretty ineffective. Unfortunately, high-dimensional data also affects the detection performance of iForest, but the performance can be vastly improved by adding a features selection test like Kurtosis to reduce the dimensionality of the sample space. Normal Instances Only: iForest performs well even if the training set does not contain any anomalous point, the reason being that iForest describes data distributions in such a way that high values of the path length h(xi) correspond to the presence of data points. As a consequence, the presence of anomalies is pretty irrelevant to iForest's detection performance. == Anomaly Detection with Isolation Forest == Anomaly detection with Isolation Forest is a process composed of two main stages: in the first stage, a training dataset is used to build iTrees as described in previous sections. in the second stage, each instance in test set is passed through the iTrees build in the previous stage, and a proper "anomaly score" is assigned to the instance using the algorithm described below Once all the instances in the test set have been assigned an anomaly score, it is possible to mark as "anomaly" any point whose score is greater than a predefined threshold, which depends on the domain the analysis is being applied to. === Anomaly Score === Th

    Read more →