AI Headshot Examples

AI Headshot Examples — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Flo (app)

    Flo (app)

    Flo is a period-tracking app that provides menstrual cycle, ovulation and pregnancy tracking as well as perimenopause symptom tracking that was developed by Flo Health, Inc. It has over 380 million downloads worldwide and over 70 million monthly active users as of November 2024. In mid-2024, it reached unicorn status, and became Europe’s first femtech unicorn. The company has been accused of sharing users' sensitive health data with third parties without consent and misleading its users about data practices. == History == Flo Health, Inc. was co-founded in 2015 by Dmitry and Yuri Gurski, in Belarus. Their backgrounds helped build the first version of the software having experience in other fitness and health apps. Dmitry serves as the company's CEO. The company's development hubs are in London, Amsterdam and Vilnius. In 2016, the company raised $1 million in seed round funding from Flint Capital and Haxus Venture Fund. In 2017, Flo received an investment of $5 million from Flint Capital and model Natalia Vodianova with Vodianova helping develop an awareness campaign for the company. In 2018, Flo received an investment of $6 million from Mangrove Capital Partners, with participation from Flint Capital and Haxus, giving the company a valuation of $200 million. In mid-2019, Flo received an additional investment of $7.5 million led by Founders Fund. In 2020, the Federal Trade Commission alleged that Flo had misled users about its handling of health information to third parties including Google, Facebook, AppsFlyer, and Flurry since 2016. These allegations followed a 2019 report by The Wall Street Journal in reference to Facebook. The company reached a settlement in 2021 and was required to notify users of how their personal information was shared and obtain permission before any further information was shared. The agreement also required that Flo to undertake an independent privacy audit which it completed in March 2022. In early September 2021, Flo announced it closed $50M in a Series B financing, bringing the total capital raised to $65 million and company valuation to $800M led by VNV Global and Target Global. In March 2024, the Supreme Court of British Columbia certified a class action suit against Flo for sharing intimate data with Facebook and other third parties without user knowledge. In July 2024, Flo announced it raised more than $200M in Series C financing from General Atlantic bringing its valuation beyond $1 billion. As of November 2024, the app had over 380 million downloads world wide, and over 70 million monthly active users. In 2025, Flo adopted a data intelligence platform from Databricks to power its analytics and AI features, allowing users personalized cycle predictions. In 2025, a class action lawsuit in California was settled for $56 million with Flo paying $8 million and Google paying $48 million. == Features and privacy == Flo was initially created as a period and ovulation tracking application. It now provides reminders of upcoming menstrual cycles and a place to record various other health symptoms such as contraceptive methods, vaginal discharge (leukorrhea), water intake, pains, mood swings, and sexual activity. The application is available on iOS and Android. Flo is free to download and the free basic version gives you access to period and ovulation tracking and predictions, symptom tracking, cycle history, and anonymous mode. In Pregnancy mode, the app provides tracking features and educational material for pregnancy. In October 2023, Flo launched Flo for Partners, a feature that allows users to share their Flo data with their partner. In September 2022, as a response to Roe v. Wade being overturned, Flo sped up the release of a feature called "Anonymous Mode". Flo said this mode allows users to access the app without any personal identifiers such as name, email address, or technical identifiers being associated with their health data. Flo said it uses a technology called Oblivious HTTP to help protect user privacy in Anonymous Mode. == Recognition == Flo was named to Bloomberg’s Top 25 UK Startups to Watch for 2024. Flo's Anonymous Mode feature was recognized on both Fast Company's World Changing Ideas 2023 and TIME's Best Inventions List 2023. Flo is a CES 2019 Innovation Awards Honoree in the Software and Mobile Applications category.

    Read more →
  • Is an AI Resume Builder Worth It in 2026?

    Is an AI Resume Builder Worth It in 2026?

    Looking for the best AI resume builder? An AI resume builder is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI resume builder slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.

    Read more →
  • Nabil Ali

    Nabil Ali

    Nabil Ali Mohammed Abd AL Azeez (Arabic:نبيل علي) (3 January 1938 – 27 January 2016) was an Egyptian scientist, writer, and intellectual who worked in the field of natural language processing and computational linguistics. Ali is considered a pioneer of Arabic language computing, making significant innovations in early computational linguistics. == Education and career == Ali earned a bachelor's degree in Aeronautical Engineering in 1960, and a master's degree in 1967. In 1971, he earned a PhD in Aeronautics. From 1961 to 1972 Ali worked as an engineering officer in the Egyptian Air Force, specializing in maintenance and training. In 1972, he shifted focus to computing, and from 1972 to 1977 he worked as a computer manager at Egyptair. While in this position, Ali introduced the first automated reservation system for airlines in the Arab world. He later held various computing positions in Egypt, Kuwait, Europe, Canada and the US. Ali started working for Sakhr Software, an Arabic language technology company, in 1983. From 1985 to 1999, he was vice president of Sakhr's council for Research and Development. As a director of the Multilingual Advanced Systems Foundation and project manager at the Egyptian National Company for Scientific and Technical Information, Ali did extensive research on information culture and artificial intelligence relating to the Arabic language. Over the course of his career, Ali developed more than 20 educational programs relating to computational linguistics. He developed the first Arabic lexical database and the first knowledge base for Arabic poetry, as well as many other pieces of Arabic language software. == Awards == 1994: General Book Authority Award for Best Book (in the field of future studies). 2003: General Book Authority Award for Best Culture Book (in the field of "Challenges of the Information Age"). 2007: General Book Authority "Innovation in Information Technology" Award. 2012: King Faisal International Award, with Professor Ali Helmy Mousa, in the field of computer processing of the Arabic Language. == Works == Arabic Language and Computer (Research study), Dar Localization, 1988. Al Arab and the Information Age, Knowledge World Series No. 184, April 1994. Arab Culture and the Information Age: A Vision for the Future of Arab Culture Discourse, World of Knowledge Series, No. 265 January 2001. The Digital Gap: an Arab Vision for a Knowledge Society (in partnership with Dr. Nadia Hegazy), World of Knowledge Series, No. 318 August 2005. The Arab Mind and the Knowledge Society: Manifestations of the Crisis and Suggestions for Solutions, Part 1, The World of Knowledge Series, No. 369, November 2009. The Arab Mind and the Knowledge Society: Manifestations of the Crisis and Suggestions for Solutions, Part 2, The World of Knowledge Series, No. 370, December 2009. == Tribute == On 3 January 2020, Google Doodle celebrated Nabil Ali Mohamed's 82nd Birthday.

    Read more →
  • Jaime Carbonell

    Jaime Carbonell

    Jaime Guillermo Carbonell (July 29, 1953 – February 28, 2020) was a computer scientist who made seminal contributions to the development of natural language processing tools and technologies. His research in machine translation resulted in the development of several state-of-the-art language translation and artificial intelligence systems. He earned his B.S. degrees in Physics and in Mathematics from MIT in 1975 and did his Ph.D. under Dr. Roger Schank at Yale University in 1979. He joined Carnegie Mellon University as an assistant professor of computer science in 1979 and moved to Pittsburgh. He was affiliated with the Language Technologies Institute, Computer Science Department, Machine Learning Department, and Computational Biology Department at Carnegie Mellon. His interests spanned several areas of artificial intelligence, language technologies and machine learning. In particular, his research focused on areas such as text mining (extraction, categorization, novelty detection) and in new theoretical frameworks such as a unified utility-based theory bridging information retrieval, summarization, free-text question-answering and related tasks. He also worked on machine translation, both high-accuracy knowledge-based MT and machine learning for corpus-based MT (such as generalized example-based MT). == Career == Carbonell was the Allen Newell Professor of Computer Science and head of the Language Technologies Institute at Carnegie Mellon University. He joined Carnegie Mellon in 1979, and became a key faculty member in the artificial intelligence area. He was appointed full professor in 1987, Newell Chair in 1995, and University Professor in 2012. He completed his undergraduate studies at MIT. He received dual degrees in Mathematics and Physics. He received his Ph.D. in computer science from Yale University in 1979. At the time of his appointment, Carbonell was the youngest chaired professor in the School of Computer Science at CMU. His research spanned several areas of computer science, mostly in artificial intelligence, including: machine learning, data and text mining, natural language processing, very-large-scale knowledge bases, translingual information retrieval and automated summarization. He wrote more than 300 technical papers and gave over 500 invited or refereed-paper presentations (colloquia, seminars, panels, conferences, keynotes, etc.). He died following a long illness on February 28, 2020. Mona Talat Diab became the director of CMU's Language Technologies Institute in 2023. == Research == Carbonell created MMR (maximal marginal relevance) technology for text summarization and informational novelty detection in search engines, invention of transformational analogy, a generalized method for case-based reasoning (CBR) to re-use, modify and compose past successful plans for increasingly complex problems and knowledge-based interlingual machine translation. He was instrumental in setting up the Computational Biolinguistics Program, a joint venture between Carnegie Mellon and the University of Pittsburgh, which combines Language Technologies and Machine Learning to model and predict genomic, proteomic and glycomic 3D structures. Carbonell also did work in machine learning. He organized the first four machine learning conferences, starting with CMU in 1981. The Language Technologies Institute (LTI), founded and directed by Carbonell, achieved top honors in multiple areas. These areas include machine translation, search engines (including founding of Lycos by Michael Mauldin, one of Carbonell’s PhD students), speech synthesis, and education. LTI remains the original, largest and best-known institute for language technologies, with over $12M in annual funding and 200 researchers (faculty, staff, PhD students, MS students, visiting scholars etc.). Carbonell made major technical contributions in several fields, including (1) Creation of MMR (maximal marginal relevance) technology for text summarization and informational novelty detection in search engines,(2) Proactive machine learning for multi-source cost-sensitive active learning, (3) Linked conditional random fields for predicting tertiary and quaternary protein folds, (4) Symmetric optimal phrasal alignment method for trainable example-based and statistical machine translation, (5) Series- anomaly modeling for financial fraud detection and syndromic surveillance, (6) Knowledge-based interlingual machine translation, (7) Robust case-frame parsing, (8) Seeded version-space learning and (9) Invention of transformational and derivational analogy, generalized methods for case-based reasoning (CBR) to re-use, modify and compose past successful plans for increasingly complex problems. The teams led by Carbonell achieved top honors in many areas such as first scalable high-accuracy interlingual machine translation (1991), first speech-to-speech machine translation (1992), first large-scale spider and search engine (1994), and first trainable, large-scale protein-structure topology predictor (2005). Modern machine learning, co-founded by Carbonell, Michalski and Mitchell, is a fundamental enabling technology in search engines, data mining and social networking. Starting in 1980, he co-edited the first three books on ML, launched the ML conferences and was a co-founder and editor-in-chief of ML Journal. Carbonell’s innovations have led to several successful start-ups: Carnegie Group (AI expertsystems), Lycos (web search), Wisdom (financial optimization & ML), Carnegie Speech (spoken-language tutoring), Dynamix (data mining and pattern discovery), and Meaningful Machines (context-based machine translation). Carbonell was the founding director of The Language Technology Institute, the preeminent global institution in language studies, unparalleled in size and scope and has since been adopted/imitated in Germany (DFKI), Japan (Tokyo Univ.), and the US (Johns Hopkins). == Awards and honors == Okawa Prize, 2015 Best paper award, “Translingual Search” w/Yang, International Joint Conference on AI, 1997 Allen Newell endowed chair, Carnegie Mellon University, 1995 Elected fellow of AAAI, 1991 Computer Science teaching award, Carnegie Mellon University, 1987 Sperry Fellowship for excellence in AI research, 1986 Herbert Simon teaching award, 1986 "Recognition of Service" award from the ACM for the SIGART presidency, 1983–1985 Provided congressional testimony on machine translation, 1990 == Selected works == === Books === 1983. (with Ryszard S. Michalski & Tom M. Mitchell, Eds.) Machine learning: An artificial intelligence approach. Los Altos, CA: Morgan Kaufmann. 1986. (with Ryszard S. Michalski & Tom Mitchell, Eds.) Machine learning: An artificial intelligence approach. Vol. II. Los Altos, CA: Morgan-Kaufmann. 1986. (with Ryszard S. Michalski & Tom Mitchell, Eds.) Machine Learning: A Guide to Current Research. Kluwer Academic Publishers. == Contributions == “Protein Quaternary Fold Recognition Using Conditional Graphical Models” IJCAI 2007 (w/Liu et al.) “Context-Based Machine Translation” AMTA 2006 (w/Klein et al.) “SCRFs: A New Approach for Protein Fold Recognition,’’ Journal of Computational Biology, 13,2, 2006 (w/Liu et al) “MT for Resource-Poor Languages Using Elicitation-Based Learning” Machine Translation, 2004 ‘‘Learning Approaches for Detecting and Tracking News Events,’’ IEEE Trans I.S., 14, 4, 2000 (w/Yang)

    Read more →
  • FloodAlerts

    FloodAlerts

    FloodAlerts is a software application, developed by software specialists Shoothill, which takes real-time flooding information, and displays the data on an interactive Bing map, updating and warning its users when they, their premises or the routes they need to travel could be at risk of flooding. == History == FloodAlerts was launched in 2012, originally as the world's first Facebook flood warning app. == Operation == FloodAlerts is made available free of charge to individuals. Users are able to set up their own monitored locations and receive alerts via the application or their Facebook wall if the locations they are monitoring are at imminent risk of flooding. Hosted in the Cloud, using the Microsoft Windows Azure platform, the FloodAlerts application processes the data received from the Environment Agency, automatically creates the required map tiles, pins and alerts and displays them on an interactive Bing map, updating the content every 15 minutes. Users are able to see the latest information on the map without having to refresh their browser. FloodAlerts can also be provided as a customised risk management solution to businesses that require infrastructure or asset safety monitoring in areas where water levels are rising or receding. == Awards and recognition == FloodAlerts has received The Guardian and Virgin Media Business's 2012 Innovation Nation Awards and was shortlisted as a finalist for a further two national awards: the UK IT Industry Awards for Innovation and Entrepreneurship and The Institution of Engineering and Technology Innovation Awards for Information Technology. == In the press == The FloodAlerts application was reviewed on the BBC website. It was also reviewed on BBC Click.

    Read more →
  • Is an AI Paragraph Rewriter Worth It in 2026?

    Is an AI Paragraph Rewriter Worth It in 2026?

    In search of the best AI paragraph rewriter? An AI paragraph rewriter is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI paragraph rewriter slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • AI Code Generators: Free vs Paid (2026)

    AI Code Generators: Free vs Paid (2026)

    Looking for the best AI code generator? An AI code generator is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI code generator slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • HFST

    HFST

    Helsinki Finite-State Technology (HFST) is a computer programming library and set of utilities for natural language processing with finite-state automata and finite-state transducers. It is free and open-source software, released under a mix of the GNU General Public License version 3 (GPLv3) and the Apache License. == Features == The library functions as an interchanging interface to multiple backends, such as OpenFST, foma and SFST. The utilities comprise various compilers, such as hfst-twolc (a compiler for morphological two-level rules), hfst-lexc (a compiler for lexicon definitions) and hfst-regexp2fst (a regular expression compiler). Functions from Xerox's proprietary scripting language xfst is duplicated in hfst-xfst, and the pattern matching utility pmatch in hfst-pmatch, which goes beyond the finite-state formalism in having recursive transition networks (RTNs). The library and utilities are written in C++, with an interface to the library in Python and a utility for looking up results from transducers ported to Java and Python. Transducers in HFST may incorporate weights depending on the backend. For performing FST operations, this is currently only possible via the OpenFST backend. HFST provides two native backends, one designed for fast lookup (hfst-optimized-lookup), the other for format interchange. Both of them can be weighted. == Uses == HFST has been used for writing various linguistic tools, such as spell-checkers, hyphenators, and morphologies. Morphological dictionaries written in other formalisms have also been converted to HFST's formats.

    Read more →
  • Pydio

    Pydio

    Pydio Cells, previously known as just Pydio and formerly known as AjaXplorer, is an open-source file-sharing and synchronisation software that runs on the user's own server or in the cloud. == Presentation == The project was created by musician Charles Du Jeu (current CEO and CTO) in 2007 under the name AjaXplorer. The name was changed in 2013 and became Pydio (an acronym for Put Your Data in Orbit). In May 2018, Pydio switched from PHP to Go with the release of Pydio Cells. The PHP version reached end-of-life state on 31 December 2019. Pydio Cells runs on any server supporting a recent Go version. Windows/Linux/macOS on the Intel architecture are directly supported; a fully functional working ARM implementation is under active development. Pydio Cells has been developed from scratch using the Go programming language; release 4.0.0 introduced code refactoring to fully support the Go modular structure as well as grid computing. Nevertheless, the web-based interface of Cells is very similar to the one from Pydio 8 (in PHP), and it successfully replicates most of its features, while adding a few more. There is also a new synchronisation client (also written in Go). The PHP version has been phased out as the company's focus is moving to Pydio Cells, with community feedback on the new features. According to the company, the switch to the new environment was made "to overcome inherent PHP limitations and provide you with a future-proof and modern solution for collaborating on documents". From a technical point of view, Pydio differs from solutions such as Google Drive or Dropbox. Pydio is not based on a public cloud; instead, the software connects to the user's existing storage (such as SAN / Local FS, SAMBA / CIFS, (s)FTP, NFS, S3-compatible cloud storage, Azure Blob Storage, Google Cloud Storage) as well as to the existing user directories (LDAP / AD, OAuth2 / OIDC SSO, SAML / Azure ADFS SSO, RADIUS, Shibboleth...), which allows companies to keep their data inside their infrastructure, according to their data security policy and user rights management. The software is built in a modular perspective; up to Pydio 8, various plugins allowed administrators to implement extra features. On the server side, Pydio Cells is deployed as a collection of independent microservices communicating among themselves using gRPC and logging user actions via Activity Streams 2.0 (AS2). Pydio Cells microservices are built with the Go Micro framework (using an embedded NATS server). A standard installation will deploy all required services on the same physical server, but for the purposes of performance, reliability and high availability, these can now be spread across several different servers (even in geographically separate locations) according to the 12-factors architecture pattern. Pydio Cells is available either through a free and open-source community distribution (Pydio Cells Home), or a commercially-licensed enterprise distribution (in two variants, Pydio Cells Connect and Pydio Cells Enterprise), which add features not available in the community distribution as well as additional levels of support beyond the community forums. == Features == File sharing between different internal users and across other Pydio instances SSL/TLS Encryption WebDAV file server Creation of dedicated workspaces, for each line of business / project / client, with a dedicated user rights management for each workspace. File-sharing with external users (private links, public links, password protection, download limitation, etc.) Online viewing and editing of documents with Collabora Office (Pydio Cells Enterprise also offers OnlyOffice integration) Preview and editing of image files Integrated audio and video reader Activity stream ('timeline') for all actions taken by users Integrated chat platform Client applications are available for all major desktop and mobile platforms.

    Read more →
  • Eugene Charniak

    Eugene Charniak

    Eugene Charniak (June 2, 1946 – June 13, 2023) was a professor of computer Science and cognitive Science at Brown University. He held an A.B. in Physics from the University of Chicago and a Ph.D. from M.I.T. in Computer Science. His research was in the area of language understanding or technologies which relate to it, such as knowledge representation, reasoning under uncertainty, and learning. Since the early 1990s he was interested in statistical techniques for language understanding. His research in this area included work in the subareas of part-of-speech tagging, probabilistic context-free grammar induction, and, more recently, syntactic disambiguation through word statistics, efficient syntactic parsing, and lexical resource acquisition through statistical means. He was a Fellow of the American Association of Artificial Intelligence and was previously a Councilor of the organization. He was also honored with the 2011 Association for Computational Linguistics Lifetime Achievement Award and awarded the 2011 Calvin & Rose G Hoffman Prize. In 2011, he was named a fellow of the Association for Computational Linguistics. In 2015, he won the Association for the Advancement of Artificial Intelligence (AAAI) Classic Paper Award for a paper (“Statistical Parsing with a Context-Free Grammar and Word Statistics”) that he presented at the Fourteenth National Conference on Artificial Intelligence in 1997. == Books == He published six books: Computational Semantics, (with Yorick Wilks), Amsterdam: North-Holland (1976) Artificial Intelligence Programming (now in a second edition) (with Chris Riesbeck, Drew McDermott, and James Meehan), Hillsdale NJ: Lawrence Erlbaum Associates (1980, 1987) Introduction to Artificial Intelligence (with Drew McDermott), Reading MA: Addison-Wesley (1985) Statistical Language Learning, Cambridge: MIT Press (1993) Introduction to Deep Learning, Cambridge: MIT Press (2019) AI & I: An Intellectual History of Artificial Intelligence, Cambridge: MIT Press (2024)

    Read more →
  • Bidyut Baran Chaudhuri

    Bidyut Baran Chaudhuri

    Bidyut Baran Chaudhuri (B. B. Chauduri) is a senior computer scientist and an emeritus professor of Techno India University in West Bengal, India. He is also adjuncted to Indian Statistical Institute, where he was a professor for about three decades. He was the founding Head of Computer Vision and Pattern Recognition Unit (which was established in 1994) of ISI. Moreover, he was a J.C. Bose Fellow and Indian National Academy of Engineering Distinguished Professor at ISI. He was the vice-president of the Society for Natural Language Technology Research (SNLTR). His primary research contributes to the fields of computer vision, image processing and pattern recognition. He is a pioneer of "Indian language script OCR". == Education == Chaudhuri received his BSc (Hons.), BTech and MTech degrees from University of Calcutta, India in 1969, 1972 and 1974, respectively and PhD Degree from Indian Institute of Technology Kanpur in 1980. He did his post-doc work during 1981-1982 from Queen's University, U.K, through Leverhulme Overseas Fellowship. He also worked as a visiting faculty at Tech University, Hannover during 1986-87 as well as at GSF Institute of Radiation Protection (now Leibnitz Institute), Munich in 1990 and 1992. == Awards and recognition == Chaudhuri has been elected as a Life Fellow of IEEE "for contributions to pattern recognition, especially Indian language script OCR, document processing and natural language processing". He has become a Fellow of International Association for Pattern Recognition (IAPR) "for contributions to character recognition and speech synthesis in Indian language". He is also Fellow of The World Academy of Sciences (TWAS), Indian National Science Academy (INSA), Indian National Academy of Engineering (INAE), National Academy of Sciences (NASI), and Institute of Electronics and Telecommunication Engineering (IETE). In 2011, Chaudhuri received the Om Prakash Bhasin Award for his contribution in the field of electronics and information technology. Chaudhuri's interview on some of his works has been reported in Indian newspaper as well. He is within world's top 2% scientists and top-10 Indian AI scientists according to a study conducted by Stanford University. He has also been featured as top-10 machine learning researcher from India.

    Read more →
  • Frederick Jelinek

    Frederick Jelinek

    Frederick Jelinek (18 November 1932 – 14 September 2010) was a Czech-American researcher in information theory, automatic speech recognition, and natural language processing. He is well known for his oft-quoted statement, "Every time I fire a linguist, the performance of the speech recognizer goes up". Jelinek was born in Czechoslovakia before World War II and emigrated with his family to the United States in the early years of the communist regime. He studied engineering at the Massachusetts Institute of Technology and taught for 10 years at Cornell University before accepting a job at IBM Research. In 1961, he married Czech screenwriter Milena Jelinek. At IBM, his team advanced approaches to computer speech recognition and machine translation. After IBM, he went to head the Center for Language and Speech Processing at Johns Hopkins University for 17 years, where he was still working on the day he died. == Personal life == Jelinek was born on November 18, 1932, as Bedřich Jelínek in Kladno to Vilém and Trude Jelínek. His father was Jewish; his mother was born in Switzerland to Czech Catholic parents and had converted to Judaism. Jelínek senior, a dentist, had planned early to escape Nazi occupation and flee to England; he arranged for a passport, visa, and the shipping of his dentistry materials. The couple planned to send their son to an English private school. However, Vilém decided to stay at the last minute and was eventually sent to the Theresienstadt concentration camp, where he died in 1945. The family was forced to move to Prague in 1941, but Frederick, his sister and mother—thanks to the latter's background—escaped the concentration camps. After the war, Jelinek entered in the gymnasium, despite having missed several years of schooling because education of Jewish children had been forbidden since 1942. His mother, anxious that her son should get a good education, made great efforts for their emigration, especially when it became clear he would not be allowed to even attempt the graduation examination. His mother hoped her son would become a physician, but Jelinek dreamed of being a lawyer. He studied engineering in evening classes at the City College of New York and received stipends from the National Committee for a Free Europe that allowed him to study at the Massachusetts Institute of Technology. About his choice of specialty, he said: "Fortunately, to electrical engineering there belonged a discipline whose aim was not the construction of physical systems: the theory of information". He obtained his Ph.D. in 1962, with Robert Fano as his adviser. In 1957, Jelinek paid an unexpected visit to Prague. He had been in Vienna and applied for a visa, hoping to see his former acquaintances again. He met with his old friend Miloš Forman, who introduced him to film student Milena Tobolová—whose screenplay had been the basis for the movie Easy Life (Snadný život). His flight back to the U.S. had a stopover in Munich, during which he called her to propose. Tobolová was considered a dissident and the authorities were not happy with her film. Jelinek asked for help from Jerome Wiesner and Cyrus Eaton, the latter who lobbied Nikita Khrushchev. Following the inauguration of John F. Kennedy, a group of Czech dissidents were allowed to emigrate in January 1961. Thanks to the lobbying, the future Milena Jelinek was one of them. After completing his graduate studies, Jelinek, who had developed an interest in linguistics, had plans to work with Charles F. Hockett at Cornell University. However these fell through and during the next ten years he continued to study information theory. Having previously worked at IBM during a sabbatical, he began full-time work there in 1972—at first on leave for Cornell, but permanently from 1974. He remained there for over twenty years. Although at first he had been offered a regular research job, upon his arrival he learned that Josef Raviv had recently been promoted to head of the newly opened IBM Haifa Research Laboratory, and became head of the Continuous Speech Recognition group at the Thomas J. Watson Research Center. Despite his team's successes in this area, Jelinek's work remained little known in his home country because Czech scientists were not allowed to participate in key conferences. After the 1989 fall of communism, Jelinek helped establish scientific relationships, regularly visiting to lecture and helping to persuade IBM to establish a computing centre at Charles University. In 1993, he retired from IBM and went to Johns Hopkins University's Center for Language and Speech Processing, where he was director and Julian Sinclair Smith Professor of Electrical and Computer Engineering. He was still working there at the time of his death; Jelinek died of a heart attack at the close of an otherwise normal workday in mid-September 2010. He was survived by his wife, daughter and son, sister, stepsister, and three grandchildren, including Sophie Gold Jelinek. == Research and legacy == Information theory was a fashionable scientific approach in the mid '50s. However, pioneer Claude Shannon wrote in 1956 that this trendiness was dangerous. He said, "Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems ... It will be all too easy for our somewhat artificial prosperity to collapse overnight when it is realized that the use of a few exciting words like information, entropy, redundancy, do not solve all our problems." During the next decade, a combination of factors shut down the application of information theory to natural language processing (NLP) problems—in particular machine translation. One factor was the 1957 publication of Noam Chomsky's Syntactic Structures, which stated, "probabilistic models give no insight into the basic problems of syntactic structure". This accorded well with the philosophy of the artificial intelligence research of the time, which promoted rule-based approaches. The other factor was the 1966 ALPAC report, which recommended that the government should stop funding research into machine translation. ALPAC chairman John Pierce later said that the field was filled with "mad inventors or untrustworthy engineers". He said that the underlying linguistic problems must be solved before attempts at NLP could be reasonably made. These elements essentially halted research in the field. Jelinek had begun to develop an interest in linguistics after the immigration of his wife, who initially enrolled in the MIT linguistics program with the help of Roman Jakobson. Jelinek often accompanied her to Chomsky's lectures, and even discussed the possibility of changing orientation with his adviser. Fano was "really upset", and after the failure of his project with Hockett at Cornell, he did not return to this field of research until starting work at IBM. The scope of research at IBM was considerably different from that of most other teams. According to Mark Liberman, "While [Jelinek] was leading IBM's effort to solve the general dictation problem during the decade or so following 1972, most other U.S. companies and academic researchers were working on very limited problems ... or were staying out of the field entirely". Jelinek regarded speech recognition as an information theory problem—a noisy channel, in this case the acoustic signal—which some observers considered a daring approach. The concept of perplexity was introduced in their first model, New Raleigh Grammar, which was published in 1976 as the paper "Continuous Speech Recognition by Statistical Methods" in the journal Proceedings of the IEEE. According to Young, the basic noisy channel approach "reduced the speech recognition problem to one of producing two statistical models". Whereas New Raleigh Grammar was a hidden Markov model, their next model, called Tangora, was broader and involved n-grams, specifically trigrams. Even though "it was obvious to everyone that this model was hopelessly impoverished", it was not improved upon until Jelinek presented another paper in 1999. The same trigram approach was applied to phones in single words. Although the identification of parts of speech turned out not to be very useful for speech recognition, tagging methods developed during these projects are now used in various NLP applications. The incremental research techniques developed at IBM eventually became dominant in the field after DARPA, in the mid-80s, returned to NLP research and imposed that methodology to participating teams, shared common goals, data, and precise evaluation metrics. The Continuous Speech Recognition Group's research, which required large amounts of data to train the algorithms, eventually led to the creation of the Linguistic Data Consortium. In the 1980s, although the broader problem of speech recognition remained unsolved, they sought to apply the methods developed to other problems; machine translat

    Read more →
  • Vision transformer

    Vision transformer

    A vision transformer (ViT) is a transformer designed for computer vision. A ViT decomposes an input image into a series of patches (rather than text into tokens), serializes each patch into a vector, and maps it to a smaller dimension with a single matrix multiplication. These vector embeddings are then processed by a transformer encoder as if they were token embeddings. ViTs were designed as alternatives to convolutional neural networks (CNNs) in computer vision applications. They have different inductive biases, training stability, and data efficiency. Compared to CNNs, ViTs are less data efficient, but have higher capacity. Some of the largest modern computer vision models are ViTs, such as one with 22B parameters. Subsequent to its publication, many variants were proposed, with hybrid architectures with both features of ViTs and CNNs. ViTs have found application in image recognition, image segmentation, weather prediction, and autonomous driving. == History == Transformers were introduced in Attention Is All You Need (2017), and have found widespread use in natural language processing. A 2019 paper applied ideas from the Transformer to computer vision. Specifically, they started with a ResNet, a standard convolutional neural network used for computer vision, and replaced all convolutional kernels by the self-attention mechanism found in a Transformer. It resulted in superior performance. However, it is not a Vision Transformer. In 2020, an encoder-only Transformer was adapted for computer vision, yielding the ViT, which reached state of the art in image classification, overcoming the previous dominance of CNN. The masked autoencoder (2022) extended ViT to work with unsupervised training. The vision transformer and the masked autoencoder, in turn, stimulated new developments in convolutional neural networks. Subsequently, there was cross-fertilization between the previous CNN approach and the ViT approach. In 2021, some important variants of the Vision Transformers were proposed. These variants are mainly intended to be more efficient, more accurate or better suited to a specific domain. Two studies improved efficiency and robustness of ViT by adding a CNN as a preprocessor. The Swin Transformer achieved state-of-the-art results on some object detection datasets such as COCO, by using convolution-like sliding windows of attention mechanism, and the pyramid process in classical computer vision. == Overview == The basic architecture, used by the original 2020 paper, is as follows. In summary, it is a BERT-like encoder-only Transformer. The input image is of type R H × W × C {\displaystyle \mathbb {R} ^{H\times W\times C}} , where H , W , C {\displaystyle H,W,C} are height, width, channel (RGB). It is then split into square-shaped patches of type R P × P × C {\displaystyle \mathbb {R} ^{P\times P\times C}} . For each patch, the patch is pushed through a linear operator, to obtain a vector ("patch embedding"). The position of the patch is also transformed into a vector by "position encoding" (the paper tried no embedding, 1D embedding, 2D embedding, and relative embedding: 1D was adopted). The two vectors are added, then pushed through several Transformer encoders. The attention mechanism in a ViT repeatedly transforms representation vectors of image patches, incorporating more and more semantic relations between image patches in an image. This is analogous to how in natural language processing, as representation vectors flow through a transformer, they incorporate more and more semantic relations between words, from syntax to semantics. The above architecture turns an image into a sequence of vector representations. To use these for downstream applications, an additional head needs to be trained to interpret them. For example, to use it for classification, one can add a shallow MLP on top of it that outputs a probability distribution over classes. The original paper uses a linear-GeLU-linear-softmax network. == Variants == === Original ViT === The original ViT was an encoder-only Transformer supervise-trained to predict the image label from the patches of the image. As in the case of BERT, it uses a special token in the input side, and the corresponding output vector is used as the only input of the final output MLP head. The special token is an architectural hack to allow the model to compress all information relevant for predicting the image label into one vector. Transformers found their initial applications in natural language processing tasks, as demonstrated by language models such as BERT and GPT-3. By contrast the typical image processing system uses a convolutional neural network (CNN). Well-known projects include Xception, ResNet, EfficientNet, DenseNet, and Inception. Transformers measure the relationships between pairs of input tokens (words in the case of text strings), termed attention. The cost is quadratic in the number of tokens. For images, the basic unit of analysis is the pixel. However, computing relationships for every pixel pair in a typical image is prohibitive in terms of memory and computation. Instead, ViT computes relationships among pixels in various small sections of the image (e.g., 16x16 pixels), at a drastically reduced cost. The sections (with positional embeddings) are placed in a sequence. The embeddings are learnable vectors. Each section is arranged into a linear sequence and multiplied by the embedding matrix. The result, with the position embedding is fed to the transformer. === Architectural improvements === ==== Pooling ==== After the ViT processes an image, it produces some embedding vectors. These must be converted to a single class probability prediction by some kind of network. In the original ViT and Masked Autoencoder, they used a dummy [CLS] token, in emulation of the BERT language model. The output at [CLS] is the classification token, which is then processed by a LayerNorm-feedforward-softmax module into a probability distribution. Global average pooling (GAP) does not use the dummy token, but simply takes the average of all output tokens as the classification token. It was mentioned in the original ViT as being equally good. Multihead attention pooling (MAP) applies a multiheaded attention block to pooling. Specifically, it takes as input a list of vectors x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} , which might be thought of as the output vectors of a layer of a ViT. The output from MAP is M u l t i h e a d e d A t t e n t i o n ( Q , V , V ) {\displaystyle \mathrm {MultiheadedAttention} (Q,V,V)} , where q {\displaystyle q} is a trainable query vector, and V {\displaystyle V} is the matrix with rows being x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} . This was first proposed in the Set Transformer architecture. Later papers demonstrated that GAP and MAP both perform better than BERT-like pooling. A variant of MAP was proposed as class attention, which applies MAP, then feedforward, then MAP again. Re-attention was proposed to allow training deep ViT. It changes the multiheaded attention module. === Masked Autoencoder === The Masked Autoencoder took inspiration from denoising autoencoders and context encoders. It has two ViTs put end-to-end. The first one ("encoder") takes in image patches with positional encoding, and outputs vectors representing each patch. The second one (called "decoder", even though it is still an encoder-only Transformer) takes in vectors with positional encoding and outputs image patches again. ==== Training ==== During training, input images (224px x 224 px in the original implementation) are split along a designated number of lines on each axis, producing image patches. A certain percentage of patches are selected to be masked out by mask tokens, while all others are retained in the image. The network is tasked with reconstructing the image from the remaining unmasked patches. Mask tokens in the original implementation are learnable vector quantities. A linear projection with positional embeddings is then applied to the vector of unmasked patches. Experiments varying mask ratio on networks trained on the ImageNet-1K dataset found 75% mask ratios achieved high performance on both finetuning and linear-probing of the encoder's latent space. The MAE processes only unmasked patches during training, increasing the efficiency of data processing in the encoder and lowering the memory usage of the transformer. A less computationally-intensive ViT is used for the decoder in the original implementation of the MAE. Masked patches are added back to the output of the encoder block as mask tokens and both are fed into the decoder. A reconstruction loss is computed for the masked patches to assess network performance. ==== Prediction ==== In prediction, the decoder architecture is discarded entirely. The input image is split into patches by the same algorithm as in training, but no patches are masked out. A linear projection wi

    Read more →
  • Li Sheng (computer scientist)

    Li Sheng (computer scientist)

    Li Sheng (Chinese: 李生; born 1943), is a professor at the School of Computer Science and Engineering, Harbin Institute of Technology (HIT), China. He began his research on Chinese-English machine translation in 1985, making himself one of the earliest Chinese scholars in this field. After that, he pursued in vast topics of natural language processing, including machine translation, information retrieval, question answering and applied artificial intelligence. He was the final review committee member for computer area in NSF China. Born and raised in Heilongjiang province, he graduated in 1965 from the computer specialty of HIT, which is one of the earliest computer specialties in Chinese universities. Then he started to work as a staff in the Computer specialty of HIT, which was finally granted as a department in 1985. Also from 1985, he was appointed to undertake a series administrative positions in HIT, e.g. Dean of Computer Department(1987–1988), Director of R&D Division (1988–1990), Chief R&D Officer and several other key leading positions in HIT. Resigned all his administrative positions in 2004, Li devoted himself as the director of MOE-Microsoft Join Key Lab of NLP& Speech (HIT), making it a leading NLP research group with more than 100 staffs and students working on various aspects of NLP. So far, the lab has already been granted for dozens of technology awards by the ministries of central government and local provincial government of China. Its research progresses are reported annually in top tier conferences including ACL, IJCAI, SIGIR etc. As one of the pioneers in NLP research in China, he contributes NLP in China not only in technology innovations but also in talents education. So far, his research group has graduated more than 60 Ph.D. and almost 200 M.E with NLP major. Most of them are now working as the chief researcher in various NLP groups of universities and companies in China, including several world-known NLP scholars, such as Wang Haifeng of Baidu, Zhou Ming of Microsoft Research, Zhang Min (张民) of Soochow University (China), and Zhao Tiejun (赵铁军) and Liu Ting (刘挺) of HIT. Owing to his contributions in Chinese language processing, Li was elected as the President of Chinese Information Processing Society of China (CIPSC) in 2011. He scaled this top level academic organization in China up to more than 3000 registered members, and promoted NLP into several national projects for research or industry development. In addition, the CIPSC is now enhancing its co-operations with world NLP organizations including ACL. == Machine Intelligence & Translation Laboratory (MI&TLAB) == Originates from Machine Translation Research Group of Computer Science Department, Harbin Institute of Technology, which was started Li in 1985. It is one of the earliest institutions engaged in MT research in China, featured by its investigations into Chinese-English machine translation. It is now running under the Research Center on Language Technology, School of Computer Science and Technology, HIT. Details for staffs and publications can be found at https://mitlab.hit.edu.cn. == MOE-MS Joint Key Lab of Natural Language Processing and Speech (HIT) == In June, 2000, the Joint HIT-Microsoft Machine Translation Lab was founded by MI&T Lab and Microsoft Research (China). It was the third joint lab established by Microsoft Research (China) with Chinese universities, and the only one focusing on Machine Translation. Based on this jointly lab, the cooperation between HIT and Microsoft gradually extended to the areas of machine translation, information retrieval, speech recognition and processing, natural language understanding. In Oct, 2004, the joint key lab was granted as one of the 10 joint key labs supported by the Microsoft Research of Asia and Ministry of Education in China. In July 2006, the Shenzhen extension of the lab was launched. More than 200 staff and students have undertaken research projects, including some sponsored by the National Natural Science Foundation of China and the National 863 program of China. Since 2005, the lab has also been organizing a summer camp in Harbin Institute of Technology, and approximately 150 faculty members and students from universities in China have participated. This summer workshop was organized annually until 2014, when it was organized formally as the summer school series by Chinese Information Processing Society, China. Through the lab, a Microsoft Research of Asia-HIT joint PhD program was implemented in 2012. == CEMT-I MT System == In May 1989, CEMT-I passed the formal project appraisal in Harbin, China. Capable of translating technical paper titles from Chinese to English, it is not only the first MT system completed by Li and his group, but also the first Chinese-English Translation system that passed the technical appraisal by Chinese government according to the public reports. It was then awarded the Second Prize of Ministry Level Technology Innovation by the former National Aerospace Industry Corporation in 1990. == Daya Translation Workstation == Owing to the technical achievements by Li's group in Chinese-English machine translation, the former National Aerospace Industry Corporation of China sponsored a commercial system development of "Daya Translation Station (MT)" in 1993. Designed as a comprehensive English composition aid for Chinese users, this system was finished and put into the market in 1995. And in 1997, this system was awarded the Second Prize of Ministry Level Technology Innovation by the former National Aerospace Industry Corporation. == BT863 MT System == From 1994, the researches in Li's lab were supported by National 863 Hi-tech Research and Development Program. During this period, the BT863 system was explored to employ one engine for both Chinese-English and English-Chinese translation. This system was proved to be the best performance among Chinese-English MT systems in the formal technical evaluation of National 863 program, yielding the Third Prize of Ministry Level Technology Innovation by the former National Aerospace Industry Corporation in 1997. == Next Generation IR == This is a key project granted by NSF China (with a joint sponsorship from MSRA) started form 2008. In contrast to his previous NSF grants for different NLP issues, Li explored in his last PI project on key technologies in personalized IR, together with researchers from Tsinghua University and Institute of Software, Chinese Academy of Science. With impressive publications in top tier journals and conferences (including breakthrough publications in SIGIR of his own group), this projected was approved "A-level" achievements by the NSF China office in 2012.

    Read more →
  • The Best Free AI Presentation Maker for Beginners

    The Best Free AI Presentation Maker for Beginners

    Shopping for the best AI presentation maker? An AI presentation maker is software that uses machine learning to help you get more done — it keeps getting smarter as the underlying models improve. Pricing, accuracy, and the size of the model behind the tool are the three factors that most affect daily usefulness. Whether you are a beginner or a pro, the right AI presentation maker slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →