Triller is an American video-sharing social networking service that was first released for iOS and Android in 2015. The service allowed users to create and share short-form videos, including videos set to, or automatically synchronized to, music using artificial intelligence technology. It initially operated as a video editing app before adding social networking features. Triller gained prominence in 2020 as a competitor to the similar Chinese-owned app TikTok, mainly in the United States and India (after the service was banned in the latter country). The app's success would allow its parent company to expand into sports broadcasting and promotion; including the distribution of pay-per-view boxing events under the Triller Fight Club banner (such as Mike Tyson vs. Roy Jones Jr. and Jake Paul vs. Ben Askren) that incorporated live music performances and appearances by various celebrities and entertainment personalities. == History == === Launch and early years === Triller was launched in 2015 by co-founders David Leiberman and Sammy Rubin. The app was originally positioned as a video editor, using artificial intelligence to automatically edit distinct clips into music videos. They later launched Triller Famous, a page within the app that featured curated selections of user videos. In 2016, the app was purchased by Carnegie Technologies and converted into a social networking service by allowing users to follow each other and share their videos publicly. In 2019, Ryan Kavanaugh's Proxima Media made a majority investment. It is headquartered in Los Angeles, California, and is currently led by CEO Mahi de Silva. === Media exposure and controversies === On June 29, 2020, Government of India banned TikTok, among other apps stating that they were "prejudicial to [the] sovereignty and integrity" of India. Triller, which had planned to enter into the Indian market by the end of 2020, saw a spike from less than 1 million users to over 30 million users in the country overnight. In July 2020, Triller sued ByteDance, the Chinese parent company of TikTok, for infringing patents relating to video editing. In response, TikTok and ByteDance filed a lawsuit against Triller, alleging the litigation initiated by Triller has "cast a cloud" over TikTok's reputation and business dealings. That Summer, U.S. president Donald Trump signed an executive order which threatened to ban TikTok from operating within the United States, citing threats to national security, unless it was sold by ByteDance. The Trump administration stated that TikTok had until November 12, 2020, to assure the administration that the app did not pose any national security threats to the U.S. Following this order and news of possible purchases of TikTok's American operations by companies such as Oracle, Triller jumped from number 198 to number one in the App Store in the U.S., while TikTok dropped down to number three. The discussions surrounding TikTok's potential ban in the United States caused popular TikTok stars, including Charli D’Amelio and her family, to join Triller. Trump joined Triller himself and posted his first video on August 15, 2020. The video received over a million views within hours. On August 12, 2020, Triller partnered with B2B music company 7digital, which will provide Triller with access to its catalogue of 80 million tracks and automatically report usage data to Sony Music, Warner Music Group, Universal Music Group and Merlin Network. The number of Triller's app installations came under scrutiny when third-party analytics firm Apptopia estimated only 52 million lifetime installations of the app by August 2020, while Triller claimed 250 million. Triller threatened to sue Apptopia for publishing the report. By October 2020, Triller claimed to serve 100 million active monthly users, but this number was quickly disputed by six former employees interviewed by Business Insider. Within a few weeks of Triller's claim, employees shared screenshots of the company's internal analytics that showed less than 2.5 million active monthly users. On October 2, 2020, Triller signed licensing deals with the rights societies PRS for Music, GEMA, STIM and IMRO, and the publishers Concord, Downtown and Peermusic. On February 5, 2021, Universal Music Group (UMG) pulled its library from Triller, citing unpaid music royalties. They alleged that Triller "shamefully withheld payments owed to our artists" and refused to negotiate future music licensing. Triller responded with the assertion that "relevant artists" were already partnered with Triller, so a deal with UMG was unnecessary. The two companies reached an expanded licensing agreement in May 2021. On March 24, 2021, Triller signed a licensing agreement with the National Music Publishers' Association. == Features == The Triller app allows users to create music videos, skits, and lip-sync videos containing background music. The app's spotlight feature is its special auto-editing tool, which uses artificial intelligence to automatically stitch separate video clips together without the user having to do it themselves. The separate video clips are created to the same background music, but users are able to shoot multiple takes with different filters or edits each time. Once the auto-editing tool stitches the individual clips together, users can rearrange and replace clips as desired. Users can also customize videos by applying filters and text. When creating a video, users can choose to make a "music video" or a "social video". A "music video" allows users to add music and trim the audio to personal preference. Unlike the music video option, a "social video" does not require the user to add music in the background. The app's auto-editing tool is only used when making music videos, as it uses the background track to help arrange and synchronize the clips. Users can also link their accounts with Apple Music or Spotify to integrate their playlists. Incomplete videos that are yet to be shared appear in a user's "Projects" folder. Once finalized, a video can be shared with other users of the app or through social media platforms such as Facebook, Instagram, Twitter (X), WhatsApp, and YouTube. Any video on Triller can also be downloaded or shared through links, text messages, or direct messaging to other users within the app. The app is divided into three video feeds, consisting of videos from creators that the user follows, the "Social" feed (which showcases trending videos and those by verified users), and the "Music" feed (which exclusively features music videos). Triller accounts can be made either public or private. When the account is public, any user can view the videos on that account. When the account is private, only approved users can view the videos on that account. Users with private accounts can change the privacy settings of individual videos on their accounts from private to public, making the selected videos viewable to anyone on the app. In accordance with online child privacy laws in the United States, children under the age of 13 must receive parental consent in order to create an account on Triller. == User characteristics and behavior == In August 2020, Triller reported that it had been downloaded over 250 million times worldwide with average rating of 4.00. Mobile analytics firm Apptopia disputed the numbers and claimed they were inflated, suggesting that the app had only been downloaded 52 million times since it first launched in 2015. Apptopia pulled the report after Triller threatened to sue the company. The app has been downloaded 23.8 million times in the U.S., with users spending an average of more than 20 minutes per day. A large number of downloads come from India, where TikTok has been banned, as well as from various European and African countries. In October 2020, Triller CEO Mike Lu stated that the app has 100 million monthly active users (MAU). In February 2021, Billboard reported that Triller had "reported higher numbers of monthly active users to the public than it reports to [music] rights holders." CEO Lu argued that "there is no legal definition" of monthly and daily active users, and that "if someone is trying to compare TikTok's MAU/DAU to ours—which means they are saying we have the same definition of MAU/DAU—there is an inherent misunderstanding about Triller's business and business model. It’s like trying to compare a fish and a bicycle." In a public statement, Lu denied that the company had inflated its user metrics. Triller has attracted celebrity users like Chance the Rapper, King Von, LIl Tecca, Lil Mosey, Justin Bieber, Marshmello, The Weeknd, Alicia Keys, Cardi B, Eminem, Post Malone and Kevin Hart. The app is also used by TikTok stars such as Charli D’Amelio, Josh Richards, Noah Beck, Griffin Johnson, and Dixie D’Amelio. Triller has offered large sums of money, company equity, and advisory roles to encourage prominent TikTok users to move to Triller, such as The Sway Boys. Sway House member J
Machine-learned interatomic potential
Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Beginning in the 1990s, researchers have employed such programs to construct interatomic potentials by mapping atomic structures to their potential energies. These potentials are referred to as MLIPs or MLPs. Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively-approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials. Machine learning potentials began by using neural networks to tackle low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions; they could be applied to small molecules in a vacuum, or molecules interacting with frozen surfaces, but not much else – and even in these applications, the models often relied on force fields or potentials derived empirically or with simulations. These models thus remained confined to academia. Modern neural networks construct highly accurate and computationally light potentials, as theoretical understanding of materials science was increasingly built into their architectures and preprocessing. Almost all are local, accounting for all interactions between an atom and its neighbor up to some cutoff radius. There exist some nonlocal models, but these have been experimental for almost a decade. For most systems, reasonable cutoff radii enable highly accurate results. Almost all neural networks intake atomic coordinates and output potential energies. For some, these atomic coordinates are converted into atom-centered symmetry functions. From this data, a separate atomic neural network is trained for each element; each atomic network is evaluated whenever that element occurs in the given structure, and then the results are pooled together at the end. This process – in particular, the atom-centered symmetry functions which convey translational, rotational, and permutational invariances – has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models use a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair. Other models to learn their own descriptors rather than using predetermined symmetry-dictating functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), the model takes feature vectors describing the atoms as input, and iteratively updates these vectors as information about neighboring atoms is processed through message functions and convolutions. These feature vectors are then used to predict the final potentials. The flexibility of this method often results in stronger, more generalizable models. In 2017, the first-ever MPNN model (a deep tensor neural network) was used to calculate the properties of small organic molecules. == Gaussian Approximation Potential (GAP) == One popular class of machine-learned interatomic potential is the Gaussian Approximation Potential (GAP), which combines compact descriptors of local atomic environments with Gaussian process regression to machine learn the potential energy surface of a given system. To date, the GAP framework has been used to successfully develop a number of MLIPs for various systems, including for elemental systems such as carbon, silicon, phosphorus, and tungsten, as well as for multicomponent systems such as Ge2Sb2Te5 and austenitic stainless steel, Fe7Cr2Ni. == Equivariant graph neural networks == A significant limitation of early MPNNs was that they were not inherently equivariant to rotations and reflections of atomic structures — meaning predictions could change depending on how a molecule was oriented in space. Beginning around 2021, a new class of models addressed this by incorporating equivariance directly into the message-passing layers using spherical harmonics and irreducible representations. Notable examples include NequIP (2021), MACE (2022), and GemNet-OC (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors, and became the dominant paradigm for high-accuracy MLIPs. == Universal MLIPs and large-scale datasets == Early MLIPs were system-specific, trained on a few thousand structures of a single material. A major shift occurred with the creation of large, chemically diverse datasets enabling models that generalize across many elements, bonding environments, and application domains — so-called universal MLIPs. A key driver was the Open Catalyst Project (OC20, OC22), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements, designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and established a widely-used benchmark for the field. A subsequent dataset, Open Direct Air Capture (OpenDAC 2023 and OpenDAC 2025), applied the same approach to carbon capture, providing a large computational database of metal-organic frameworks and sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations in collaboration with Georgia Tech. These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive, as they model higher-order interactions between triplets or quadruplets of atoms, making it difficult to scale model size. Graph Parallelism, introduced by Sriram et al. (ICLR 2022), addressed this by distributing a single input graph across multiple GPUs — a distinct strategy from data parallelism (which distributes training examples) or model parallelism (which distributes layers). This enabled training GNNs with hundreds of millions to billions of parameters for the first time. Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA) in 2025, trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts — the largest training run to date for an MLIP. UMA introduced a Mixture of Linear Experts (MoLE) architecture, enabling one model to learn from datasets generated by different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models across catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field. == Applications == Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions. Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows. Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25) dataset, released by Meta FAIR in 2025, provides high-accuracy calculations for a large set of molecular systems to support this use case. Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.
Kurt Keutzer
Kurt Keutzer (born November 9, 1955) is an American computer scientist. == Early life and education == Kurt Keutzer grew up in Indianapolis, Indiana. He earned a bachelor's degree in mathematics from Maharishi University of Management (formerly Mararishi International University) in 1978, and a PhD in computer science from Indiana University Bloomington in 1984. == Career == Keutzer joined Bell Labs in 1984, where he worked on logic synthesis. In 1991, he joined the electronic design automation company Synopsys, where he was promoted to chief technology officer. He subsequently joined the University of California, Berkeley as a professor in 1998. His research at Berkeley has focused on the intersection of high performance computing and machine learning. Working with a number of graduate students at Berkeley, Keutzer developed FireCaffe, which scaled the training of deep neural networks to over 100 GPUs. Later, with LARS and LAMB optimizers, they scaled it to over 1000 servers. Keutzer and his students also developed deep neural networks such as SqueezeNet, SqueezeDet, and SqueezeSeg, which can run efficiently on mobile devices. Keutzer co-founded DeepScale with his PhD student Forrest Iandola in 2015, and Keutzer served as the company's chief strategy officer. The firm was focused on developing deep neural networks for advanced driver assistance systems in passenger cars. On October 1, 2019, electric vehicle manufacturer Tesla, Inc. purchased DeepScale to augment and accelerate its self-driving vehicle work. == Honors and awards == Keutzer was named a Fellow of the IEEE in 1996. Recipient of DAC Most Influential Paper (MIP) award (24th DAC, 1987) for his "Dagon: technology binding and local optimization by DAG matching” publication. == Books by Keutzer == 1988. Dwight Hill, Don Shugard, John Fishburn, and Kurt Keutzer. Algorithms and Techniques for VLSI Layout Synthesis. Springer. 1994. Srinivas Devadas, Abhijit Ghosh, and Kurt Keutzer. Logic Synthesis. McGraw-Hill. 2002. David Chinnery and Kurt Keutzer. Closing the Gap Between ASIC & Custom: Tools and Techniques for High-Performance ASIC Design. Springer. (2nd edition appeared in 2007.) 2004. Pinhong Chen, Desmond A. Kirkpatrick, and Kurt Keutzer. Static Crosstalk-Noise Analysis: For Deep Sub-Micron Digital Designs. Springer. 2005. Matthias Gries and Kurt Keutzer. Building ASIPs: The Mescal Methodology. Springer.
Top 10 AI Subtitle Generators Compared (2026)
Curious about the best AI subtitle generator? An AI subtitle generator is software that uses machine learning to help you get more done — it combines speed, accuracy, and an interface that just works. Hands-on testing shows real-world results vary, so a short free trial is the smartest way to decide. Whether you are a beginner or a pro, the right AI subtitle generator slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.
Corpus linguistics
Corpus linguistics is an empirical method for the study of language by text corpus (plural corpora). Corpora are balanced, often stratified collections of authentic, "real world", text of speech or writing that aim to represent a given linguistic variety. Today, corpora are generally machine-readable data collections. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference. Large collections of text, though corpora may also be small in terms of running words, allow linguists to run quantitative analyses on linguistic concepts that may be difficult to test in a qualitative manner. The text-corpus method uses the body of texts in any natural language to derive the set of abstract rules which govern that language. Those results can be used to explore the relationships between that subject language and other languages which have undergone a similar analysis. The first such corpora were manually derived from source texts, but now that work is automated. Corpora have not only been used for linguistics research, they have been increasingly used to compile dictionaries (starting with The American Heritage Dictionary of the English Language in 1969) and reference grammars, with A Comprehensive Grammar of the English Language, published in 1985, as a first. Experts in the field have differing views about the annotation of a corpus. These views range from John McHardy Sinclair, who advocates minimal annotation so texts speak for themselves, to the Survey of English Usage team (University College, London), who advocate annotation as allowing greater linguistic understanding through rigorous recording. == History == Some of the earliest efforts at grammatical description were based at least in part on corpora of particular religious or cultural significance. For example, Prātiśākhya literature described the sound patterns of Sanskrit as found in the Vedas, and Pāṇini's grammar of classical Sanskrit was based at least in part on analysis of that same corpus. Similarly, the early Arabic grammarians paid particular attention to the language of the Quran. In the Western European tradition, scholars prepared concordances to allow detailed study of the language of the Bible and other canonical texts. === English corpora === A landmark in modern corpus linguistics was the publication of Computational Analysis of Present-Day American English in 1967. Written by Henry Kučera and W. Nelson Francis, the work was based on an analysis of the Brown Corpus, which is a structured and balanced corpus of one million words of American English from the year 1961. The corpus comprises 2000 text samples, from a variety of genres. The Brown Corpus was the first computerized corpus designed for linguistic research. Kučera and Francis subjected the Brown Corpus to a variety of computational analyses and then combined elements of linguistics, language teaching, psychology, statistics, and sociology to create a rich and variegated opus. A further key publication was Randolph Quirk's "Towards a description of English Usage" in 1960 in which he introduced the Survey of English Usage. Quirk's corpus was the first modern corpus to be built with the purpose of representing the whole language. Shortly thereafter, Boston publisher Houghton-Mifflin approached Kučera to supply a million-word, three-line citation base for its new American Heritage Dictionary, the first dictionary compiled using corpus linguistics. The AHD took the innovative step of combining prescriptive elements (how language should be used) with descriptive information (how it actually is used). Other publishers followed suit. The British publisher Collins' COBUILD monolingual learner's dictionary, designed for users learning English as a foreign language, was compiled using the Bank of English. The Survey of English Usage Corpus was used in the development of one of the most important Corpus-based Grammars, which was written by Quirk et al. and published in 1985 as A Comprehensive Grammar of the English Language. The Brown Corpus has also spawned a number of similarly structured corpora: the LOB Corpus (1960s British English), Kolhapur (Indian English), Wellington (New Zealand English), Australian Corpus of English (Australian English), the Frown Corpus (early 1990s American English), and the FLOB Corpus (1990s British English). Other corpora represent many languages, varieties and modes, and include the International Corpus of English, and the British National Corpus, a 100 million word collection of a range of spoken and written texts, created in the 1990s by a consortium of publishers, universities (Oxford and Lancaster) and the British Library. For contemporary American English, work has stalled on the American National Corpus, but the 400+ million word Corpus of Contemporary American English (1990–present) is now available through a web interface. The first computerized corpus of transcribed spoken language was constructed in 1971 by the Montreal French Project, containing one million words, which inspired Shana Poplack's much larger corpus of spoken French in the Ottawa-Hull area. === Multilingual corpora === In the 1990s, many of the notable early successes on statistical methods in natural-language programming (NLP) occurred in the field of machine translation, due especially to work at IBM Research. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. There are corpora in non-European languages as well. For example, the National Institute for Japanese Language and Linguistics in Japan has built a number of corpora of spoken and written Japanese. Sign language corpora have also been created using video data. === Ancient languages corpora === Besides these corpora of living languages, computerized corpora have also been made of collections of texts in ancient languages. An example is the Andersen-Forbes database of the Hebrew Bible, developed since the 1970s, in which every clause is parsed using graphs representing up to seven levels of syntax, and every segment tagged with seven fields of information. The Quranic Arabic Corpus is an annotated corpus for the Classical Arabic language of the Quran. This is a recent project with multiple layers of annotation including morphological segmentation, part-of-speech tagging, and syntactic analysis using dependency grammar. The Digital Corpus of Sanskrit (DCS) is a "Sandhi-split corpus of Sanskrit texts with full morphological and lexical analysis... designed for text-historical research in Sanskrit linguistics and philology." === Corpora from specific fields === Besides pure linguistic inquiry, researchers had begun to apply corpus linguistics to other academic and professional fields, such as the emerging sub-discipline of Law and Corpus Linguistics, which seeks to understand legal texts using corpus data and tools. The DBLP Discovery Dataset concentrates on computer science, containing relevant computer science publications with sentient metadata such as author affiliations, citations, or study fields. A more focused dataset was introduced by NLP Scholar, a combination of papers of the ACL Anthology and Google Scholar metadata. Corpora can also aid in translation efforts or in teaching foreign languages. == Methods == Corpus linguistics has generated a number of research methods, which attempt to trace a path from data to theory. Wallis and Nelson (2001) first introduced what they called the 3A perspective: Annotation, Abstraction and Analysis. Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations. Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically includes linguist-directed search but may include e.g., rule-learning for parsers. Analysis consists of statistically probing, manipulating and generalising from the dataset. Analysis might include statistical evaluations, optimisation of rule-bases or knowledge discovery methods. Most lexical corpora today are part-of-speech-tagged (POS-tagged). However even corpus linguists who work with 'unannotated plain text' inevitably apply some method to isolate salient terms. In such situations annotation and abstraction are combined in a lexical search. The advantage of publishing an annotated corpus is that other users can then perform experiments on the corpus (through corpus managers). Linguists with other interests and differing perspectives than the originators' can exploit this work. By sharing data
Procreate (software)
Procreate is a raster graphics editor app for digital painting developed and published by the Australian company Savage Interactive for iOS and iPadOS. It was launched on the App Store in 2011. == Versions == === Procreate === Procreate for iPad was first released in 2011 by the Tasmanian software company Savage Interactive. In June 2013, Savage launched Procreate 2 in conjunction with iOS 7, adding new features such as higher resolution capabilities and more brush options. In 2016, Procreate became one of the top ten best-selling iPad apps on the App Store. In 2018, Procreate became the overall best selling iPad app. With iOS 26, Procreate adapted Liquid Glass into its software. As of March 2026, the most recent version of Procreate for the iPad is 5.4.9. === Procreate Pocket === Procreate Pocket was released to the App Store in December 2014. In 2018, Savage launched Procreate Pocket 2.0 to the App Store. In December 2018, Procreate Pocket received Apple's "App of the Year" award. As of September 2025, the most recent version of Procreate Pocket (for the iPhone) is 4.0.15. === Procreate Dreams === Procreate Dreams, their more recent app focused on 2D animation, was released on the App Store on November 22, 2023. While the application is commended for its intuitive interface and accessibility, some reviewers have noted that it may lack some key animations features, such as reference layers. In June 2024, Procreate Dreams received the 2024 Apple Design Award for Innovation. In December 2025, Savage Interactive released Procreate Dreams 2, a long awaited update and redesign to Procreate Dreams. == Features == The current versions of Procreate use Valkyrie, a proprietary graphics engine to allow customisable brush options and importing brushes from Adobe Photoshop. Procreate offers known features like layers, masks, and blending mode. Its biggest standout compared to other professional drawing software is its simple UI and comparatively easy learning curve. The app also allows for animation. Savage expanded upon Procreate's animation features with a companion app dedicated to 2D animation called Procreate Dreams, released in November 2023. On August 2024, Procreate announced that it would not be incorporating generative artificial intelligence into its software. Savage offers a free internet forum called Procreate Discussions in which users can ask for help, suggest ideas, and share user-generated content on the marketplace or the resources board. == Notable users == Concept artist Doug Chiang creates robot, vehicle, and creature designs for Star Wars in Procreate. Professional artists have also used Procreate to create the posters for Stranger Things, Logan, and Blade Runner 2049, as well as several covers for The New Yorker. It has also been professionally adopted at Marvel Comics, DC Comics, Disney Animation, and Pixar.
Yejin Choi
Yejin Choi (Korean: 최예진; born 1977) is the Dieter Schwarz Foundation Professor and Senior Fellow at the Department of Computer Science at Stanford University and the Stanford Institute for Human-Centered Artificial Intelligence (HAI) respectively. Her research considers natural language processing and computer vision. == Early life and education == Choi is from South Korea. She attended Seoul National University. After earning a bachelor's degree in Computer Science, Choi moved to the United States, where she joined Cornell University as a graduate student. There she worked with Claire Cardie on natural language processing. After earning her doctorate, Choi joined Stony Brook University as an Assistant Professor of Computer Science. At Stony Brook University Choi developed a statistical technique to identify fake hotel reviews. == Research and career == In 2018 Choi joined the Allen Institute for AI. Her research looks to endow computers with a statistical understanding of written language. She became interested in neural networks and their application in artificial intelligence. She started to assemble a knowledge base that became known as the atlas of machine commonsense (ATOMIC). By the time she had finished the creation of ATOMIC, the language model generative Pre-trained Transformer 2 (GPT-2) had been released. ATOMIC does not make use of linguistic rules, but combines the representations of different languages within a neural network. In 2020, Choi was endowed with the Brett Helsel Professorship, which she held until she became Chair of Computer Science in 2023. She has since made use of Commonsense Transformers (COMET) with Good old fashioned artificial intelligence (GOFAI). The approach combines symbolic reasoning and neural networks. She has developed computational models that can detect biases in language that work against people from underrepresented groups. For example, one study demonstrated that female film characters are portrayed as less powerful than their male counterparts. In 2023, Choi became The Wissner-Slivka Chair of Computer Science. Choi is also a scientific advisor to French research group Kyutai which is being funded by Xavier Niel, Rodolphe Saadé, Eric Schmidt, and others. In 2025, Stanford HAI announced the appointment of Choi as senior fellow and the Dieter Schwarz Foundation HAI Professor and Professor of Computer Science at Stanford University. == Awards and honours == 2013 International Conference on Computer Vision Marr Prize 2016 Institute of Electrical and Electronics Engineers AI One to Watch 2017 Facebook ParlAI Research Award 2018 Anita Borg Early Career Award 2020 Association for the Advancement of Artificial Intelligence Outstanding Paper Award 2021 Conference on Neural Information Processing Systems Outstanding Paper Award 2021 Association for Computational Linguistics Test-of-time Paper Award 2021 Conference on Computer Vision and Pattern Recognition Longuet-Higgins Prize 2022 North American Chapter of the Association for Computational Linguistics Best Paper Award 2022 International Conference on Machine Learning Outstanding Paper Award 2022 MacArthur Fellowship 2023 Association for Computational Linguistics Best Paper Award 2023 TIME100 Archived 2024-12-27 at the Wayback Machine AI 2023 2023 Empirical Methods in Natural Language Processing Outstanding Paper Award 2025 Association for Computational Linguistics Outstanding Paper Award 2025 Association for Computational Linguistics Best Demo Paper Award 2025 TIME100 AI 2025 == Select publications == Ott, Myle; Choi, Yejin; Cardie, Claire; Hancock, Jeffrey T. (2011). "Finding Deceptive Opinion Spam by Any Stretch of the Imagination". Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Portland, Oregon, USA: Association for Computational Linguistics: 309–319. arXiv:1107.4557. Bibcode:2011arXiv1107.4557O. ISBN 9781932432879. S2CID 2510724. Kulkarni, Girish; Premraj, Visruth; Ordonez, Vicente; Dhar, Sagnik; Li, Siming; Choi, Yejin; Berg, Alexander C.; Berg, Tamara L. (2013). "BabyTalk: Understanding and Generating Simple Image Descriptions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (12): 2891–2903. Bibcode:2013ITPAM..35.2891K. CiteSeerX 10.1.1.225.5228. doi:10.1109/TPAMI.2012.162. ISSN 1939-3539. PMID 22848128. Choi, Yejin; Cardie, Claire; Riloff, Ellen; Patwardhan, Siddharth (2005). "Identifying sources of opinions with conditional random fields and extraction patterns". Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05. Morristown, NJ, USA: Association for Computational Linguistics. pp. 355–362. doi:10.3115/1220575.1220620.