AI Art Filter

AI Art Filter — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Hierarchical Risk Parity

    Hierarchical Risk Parity

    Hierarchical Risk Parity (HRP) is an advanced investment portfolio optimization framework developed in 2016 by Marcos López de Prado at Guggenheim Partners and Cornell University. HRP is a probabilistic graph-based alternative to the prevailing mean-variance optimization (MVO) framework developed by Harry Markowitz in 1952, and for which he received the Nobel Prize in economic sciences. HRP algorithms apply discrete mathematics and machine learning techniques to create diversified and robust investment portfolios that outperform MVO methods out-of-sample. HRP aims to address the limitations of traditional portfolio construction methods, particularly when dealing with highly correlated assets. Following its publication, HRP has been implemented in numerous open-source libraries, and received multiple extensions. == Key features == HRP portfolios have been proposed as a robust alternative to traditional quadratic optimization methods, including the Critical Line Algorithm (CLA) of Markowitz. HRP addresses three central issues commonly associated with quadratic optimizers: numerical instability, excessive concentration in a small number of assets, and poor out-of-sample performance. HRP leverages techniques from graph theory and machine learning to construct diversified portfolios using only the information embedded in the covariance matrix. Unlike quadratic programming methods, HRP does not require the covariance matrix to be invertible. Consequently, HRP remains applicable even in cases where the covariance matrix is ill-conditioned or singular—conditions under which standard optimizers fail. Monte Carlo simulations indicate that HRP achieves lower out-of-sample variance than CLA, despite the fact that minimizing variance is the explicit optimization objective of CLA. Furthermore, HRP portfolios exhibit lower realized risk compared to those generated by traditional risk parity methodologies. Empirical backtests have demonstrated that HRP would have historically outperformed conventional portfolio construction techniques. Algorithms within the HRP framework are characterized by the following features: Machine Learning Approach: HRP employs hierarchical clustering, a machine learning technique, to group similar assets based on their correlations. This allows the algorithm to identify the underlying hierarchical structure of the portfolio, and avoid that errors spread through the entire network. Risk-Based Allocation: The algorithm allocates capital based on risk, ensuring that assets only compete with similar assets for representation in the portfolio. This approach leads to better diversification across different risk sources, while avoiding the instability associated with noisy returns estimates. Covariance Matrix Handling: Unlike traditional methods like Mean-Variance Optimization, HRP does not require inverting the covariance matrix. This makes it more stable and applicable to portfolios with a large number of assets, particularly when the covariance matrix's condition number is high. == The problem: Markowitz's Curse == Portfolio construction is perhaps the most recurrent financial problem. On a daily basis, investment managers must build portfolios that incorporate their views and forecasts on risks and returns. Despite the theoretical elegance of Markowitz's mean-variance framework, its practical implementation is hindered by several limitations that undermine the reliability of solutions derived from the Critical Line Algorithm (CLA). A principal concern is the high sensitivity of optimal portfolios to small perturbations in expected returns: even minor forecasting errors can result in significantly different allocations (Michaud, 1998). Given the inherent difficulty of producing accurate return forecasts, numerous researchers have advocated for approaches that forgo expected returns entirely and instead rely solely on the covariance structure of asset returns. This has given rise to risk-based allocation methods, among which risk parity is a widely cited example (Jurczenko, 2015). While eliminating return forecasts mitigates some instability, it does not eliminate it. Quadratic programming techniques employed in portfolio optimization require the inversion of a positive-definite covariance matrix, meaning all eigenvalues must be strictly positive. When the matrix is numerically ill-conditioned—that is, when the ratio of its largest to smallest eigenvalue (its condition number) is large—matrix inversion becomes unreliable and prone to significant numerical errors (Bailey and López de Prado, 2012). The condition number of a covariance, correlation, or any symmetric (and thus diagonalizable) matrix is defined as the absolute value of the ratio between its largest and smallest eigenvalues in modulus. The figure on the right presents the sorted eigenvalues of several correlation matrices; the condition number is represented by the ratio of the first to last eigenvalues in each sequence. A diagonal correlation matrix, which is equal to its own inverse, exhibits the minimum possible condition number. As the number of correlated (or multicollinear) assets in a portfolio increases, the condition number rises. At high levels, this leads to severe numerical instability, whereby slight modifications in any matrix entry may result in drastically different inverses. This phenomenon, often referred to as Markowitz’s curse, encapsulates the paradox wherein increased correlation among assets heightens the theoretical need for diversification, yet simultaneously increases the likelihood of unstable optimization outcomes. Consequently, the potential benefits of diversification are frequently overshadowed by estimation errors. These problems are exacerbated as the dimensionality of the covariance matrix increases. The estimation of each covariance term consumes degrees of freedom, and in general, a minimum of 1 2 N ( N + 1 ) {\displaystyle {\frac {1}{2}}N(N+1)} independent and identically distributed (IID) observations is required to estimate a non-singular covariance matrix of dimension N {\displaystyle N} . For example, constructing an invertible covariance matrix of dimension 50 necessitates at least five years of daily IID observations. However, empirical evidence suggests that the correlation structure of financial assets is highly unstable over such extended periods. These difficulties are highlighted by the observation that even naïve allocation strategies—such as equally weighted portfolios—have frequently outperformed both mean-variance and risk-based optimizations in out-of-sample tests (De Miguel et al., 2009). == The solution: Hierarchical Risk Parity == The HRP algorithm addresses Markowitz's curse in three steps: Hierarchical Clustering: Assets are grouped into clusters based on their correlations, forming a hierarchical tree structure. Quasi-Diagonalization: The correlation matrix is reordered based on the clustering results, revealing a block diagonal structure. Recursive Bisection: Weights are assigned to assets through a top-down approach, splitting the portfolio into smaller sub-portfolios and allocating capital based on inverse variance. === Step 1: Hierarchical clustering === Given a T × N {\displaystyle T\times N} matrix of asset returns X {\displaystyle X} , where each column represents a time series of returns for one of N {\displaystyle N} assets over T {\displaystyle T} time periods, a hierarchical clustering process can be used to construct a tree-based representation of asset relationships. First, we compute the N × N {\displaystyle N\times N} correlation matrix ρ = ρ i , j i , j = 1 . . . N {\displaystyle \rho ={\rho _{i,j}}\;{i,j=1\;...\;N}} , where ρ i , j = c o r r ( X i , X j ) {\displaystyle \rho _{i,j}=\mathrm {corr} (X_{i},X_{j})} . From this, a pairwise distance matrix D = d i , j {\displaystyle D={d_{i,j}}} is defined using the transformation: d i , j = 1 2 ( 1 − ρ i , j ) {\displaystyle d_{i,j}={\sqrt {{\frac {1}{2}}(1-\rho _{i,j})}}} This distance function defines a proper metric space, satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. Next, a secondary distance matrix D ~ = d ~ i , j {\displaystyle {\tilde {D}}={{\tilde {d}}_{i,j}}} is computed, where each entry measures the Euclidean distance between the distance profiles of two assets: d ~ i , j = ∑ n = 1 N ( d n , i − d n , j ) 2 {\displaystyle {\tilde {d}}_{i,j}={\sqrt {\sum _{n=1}^{N}(d_{n,i}-d_{n,j})^{2}}}} While d i , j {\displaystyle d_{i,j}} reflects correlation-based proximity between two assets, d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} quantifies dissimilarity across the entire system, as it depends on all pairwise distances. Hierarchical clustering proceeds by identifying the pair ( i , j ) {\displaystyle (i,j)} with the smallest value of d ~ i , j {\displaystyle {\tilde {d}}_{i,j}} (for i ≠ j {\displaystyle i\neq j} ), and forming a new cluster u [ 1 ] = ( i , j ) {\displaystyle u[1]=(i,j)} .

    Read more →
  • The Best Free AI Image Generator for Beginners

    The Best Free AI Image Generator for Beginners

    In search of the best AI image generator? An AI image generator is software that uses machine learning to help you get more done — it turns a rough idea into a polished result in seconds. When choosing one, weigh output quality, pricing, export formats, and how well it fits the tools you already use. Whether you are a beginner or a pro, the right AI image generator slots into your workflow and pays for itself fast. Below we compare features, pricing, and real output so you can choose with confidence.

    Read more →
  • Deborah Raji

    Deborah Raji

    Inioluwa Deborah Raji (born 1995/1996) is a Nigerian-Canadian computer scientist and socio-tech leader who works on algorithmic bias, AI accountability, and algorithmic auditing. A current Mozilla fellow, she has been recognized by MIT Technology Review and Forbes as one of the world's top young innovators. Raji started her work with racial bias in technology during her internship with Clarifai when she recognized that people of color were more often tagged for NSFW compared to white people. Raji has previously worked with Joy Buolamwini, Timnit Gebru, and the Algorithmic Justice League on researching gender and racial bias in facial recognition technology. Her work on racial bias in facial recognition has forced companies to ultimately change their practices. She has also worked with Google’s Ethical AI team and been a research fellow at the Partnership on AI and AI Now Institute at New York University working on how to operationalize ethical considerations in machine learning engineering practice. She was working on a computer vision model that would help clients flag inappropriate images as NSFW. == Early life and education == Raji was born in Port Harcourt, Nigeria, and moved to Mississauga, Ontario, Canada, when she was four years old. Eventually her family moved to Ottawa. She attended Colonel By Secondary School and completed the International Baccalaureate programme. She studied Engineering Science at the University of Toronto, graduating in 2019. In 2015, she founded Project Include, a nonprofit providing increased student access to engineering education, mentorship, and resources in low income and immigrant communities in the Greater Toronto Area. She started a Doctor of Philosophy - PhD, in Computer Science from the University of California, Berkeley in Aug 2021. == Career and research == Raji worked with Joy Buolamwini at the MIT Media Lab and Algorithmic Justice League, where she audited commercial facial recognition technologies from Microsoft, Amazon, IBM, Face++, and Kairos. They found that these technologies were significantly less accurate for darker-skinned women than for white men. With support from other top AI researchers and increased public pressure and campaigning, their work led IBM and Amazon to agree to support facial recognition regulation and later halt the sale of their product to police for at least a year. Raji also interned at machine learning startup Clarifai, where she worked on a computer vision model for flagging images. She participated in a research mentorship program at Google and worked with their Ethical AI team on creating model cards, a documentation framework for more transparent machine learning model reporting. She also co-led the development of internal auditing practices at Google. Her contributions at Google were separately presented and published at the AAAI conference and ACM Conference on Fairness, Accountability, and Transparency. In 2019, Raji was a summer research fellow at The Partnership on AI working on setting industry machine learning transparency standards and benchmarking norms. Raji was a Tech Fellow at the AI Now Institute worked on algorithmic and AI auditing. Currently, she is a fellow at the Mozilla Foundation researching algorithmic auditing and evaluation. Raji's work on bias in facial recognition systems has been highlighted in the 2020 documentary Coded Bias directed by Shalini Kantayya. She also took part in the 2026 documentary The AI Doc: Or How I Became an Apocaloptimist directed by Daniel Roher. == Awards == 2019 Venture Beat AI Innovations Award in category AI for Good (received with Joy Buolamwini and Timnit Gebru) 2020 MIT Technology Review 35 Under 35 Innovator Award 2020 EFF Pioneer Award (received with Buolamwini and Gebru) 2021 Forbes 30 Under 30 Award in Enterprise Technology 2021 100 Brilliant Women in AI Ethics Hall of Fame Honoree 2023 Time magazine 100 Most Influential People in AI

    Read more →
  • Generalized nondeterministic finite automaton

    Generalized nondeterministic finite automaton

    In the theory of computation, a generalized nondeterministic finite automaton (GNFA), also known as an expression automaton or a generalized nondeterministic finite state machine, is a variation of a nondeterministic finite automaton (NFA) where each transition is labeled with any regular expression. The GNFA reads blocks of symbols from the input which constitute a string as defined by the regular expression on the transition. There are several differences between a standard finite state machine and a generalized nondeterministic finite state machine. A GNFA must have only one start state and one accept state, and these cannot be the same state, whereas an NFA or DFA both may have several accept states, and the start state can be an accept state. A GNFA must have only one transition between any two states, whereas a NFA or DFA both allow for numerous transitions between states. In a GNFA, a state has a single transition to every state in the machine, although often it is a convention to ignore the transitions that are labelled with the empty set when drawing generalized nondeterministic finite state machines. == Formal definition == A GNFA can be defined as a 5-tuple, (S, Σ, T, s, a), consisting of a finite set of states (S); a finite set called the alphabet (Σ); a transition function (T : (S ∖ {\displaystyle \setminus } {a}) × (S ∖ {\displaystyle \setminus } {s}) → R); a start state (s ∈ S); an accept state (a ∈ S); where R is the collection of all regular expressions over the alphabet Σ. The transition function takes as its argument a pair of two states and outputs a regular expression (the label of the transition). This differs from other finite state machines, which take as input a single state and an input from the alphabet (or the empty string in the case of nondeterministic finite state machines) and outputs the next state (or the set of possible states in the case of nondeterministic finite state machines). A DFA or NFA can easily be converted into a GNFA and then the GNFA can be easily converted into a regular expression by repeatedly collapsing parts of it to single edges until S = {s, a}. Similarly, GNFAs can be reduced to NFAs by changing regular expression operators into new edges until each edge is labelled with a regular expression matching a single string of length at most 1. NFAs, in turn, can be reduced to DFAs using the powerset construction. This shows that GNFAs recognize the same set of formal languages as DFAs and NFAs.

    Read more →
  • XLeratorDB

    XLeratorDB

    XLeratorDB is a suite of database function libraries that enable Microsoft SQL Server to perform a wide range of additional (non-native) business intelligence and ad hoc analytics. The libraries, which are embedded and run centrally on the database, include more than 450 individual functions similar to those found in Microsoft Excel spreadsheets. The individual functions are grouped and sold as six separate libraries based on usage: finance, statistics, math, engineering, unit conversions and strings. WestClinTech, the company that developed XLeratorDB, claims it is "the first commercial function package add-in for Microsoft SQL Server." == Company history == WestClinTech (LLC), founded by software industry veterans Charles Flock and Joe Stampf in 2008, is located in Irvington, New York, United States. Flock was a co-founder of The Frustum Group, developer of the OPICS enterprise banking and trading platform, which was acquired by London-based Misys, PLC in 1996. Stampf joined Frustum in 1994 and with Flock remained active with the company after acquisition, helping to develop successive generations of OPICS now employed by over 150 leading financial institutions worldwide. Following a full year of research, development and testing, WestClinTech introduced and recorded its first commercial sale of XLeratorDB in April 2009. In September 2009, XLeratorDB became available to all Federal agencies through NASA's Strategic Enterprise-Wide Procurement (SEWP-IV) program, a government-wide acquisition contract. == Technology == XLeratorDB uses Microsoft SQL CLR(Common Language Runtime) technology. SQL CLR allows managed code to be hosted by, and run in, the Microsoft SQL Server environment. SQL CLR relies on the creation, deployment and registration of .NET Framework assemblies that are physically stored in managed code dynamic-link libraries (DLL). The assemblies may contain .NET namespaces, classes, functions, and properties. Because managed code compiles to native code prior to execution, functions using SQL CLR can achieve significant performance increases versus the equivalent functions written in T-SQL in some scenarios. XLeratorDB requires Microsoft SQL Server 2005 or SQL Server 2005 Express editions, or later (compatibility mode 90 or higher). The product installs with PERMISSION_SET=SAFE. SAFE mode, the most restrictive permission set, is accessible by all users. Code executed by an assembly with SAFE permissions cannot access external system resources such as files, the network, the internet, environment variables, or the registry. == Functions == In computer science, a function is a portion of code within a larger program which performs a specific task and is relatively independent of the remaining code. As used in database and spreadsheet applications these functions generally represent mathematical formulas widely used across a variety of fields. While this code may be user-generated, it is also embedded as a pre-written sub-routine in applications. These functions are typically identified by common nomenclature which corresponds to their underlying operations: e.g. IRR identifies the function which calculates Internal Rate of Return on a series of periodic cash flows. === Function uses === As subroutines, functions can be integrated and used in a variety of ways, and as part of larger, more complicated applications. Within large enterprise applications they may, for example, play an important role in defining business rules or risk management parameters, while remaining virtually invisible to end users. Within database management systems and spreadsheets, however, these kinds of functions also represent discrete sets of tools; they can be accessed directly and utilized on a stand-alone basis, or in more complex, user-defined configurations. In this context, functions can be used for business intelligence and ad hoc analysis of data in fields such as finance, statistics, engineering, math, etc. === Function types === XLeratorDB uses three kinds of functions to perform analytic operations: scalar, aggregate, and a hybrid form which WestClinTech calls Range Queries. Scalar functions take a single value, perform an operation and return a single value. An example of this type of function is LOG, which returns the logarithm of a number to a specified base. Aggregate functions operate on a series of values but return a single, summarizing value. An example of this type of function is AVG, which returns the average of values in a specified group. In XLeratorDB there are some functions which have characteristics of aggregate functions (operating on multiple series of values) but cannot be processed in SQL CLR using single column inputs, such as AVG does. For example, irregular internal rate of return (XIRR), a financial function, operates on a collection of cash flow values from one column, but must also apply variable period lengths from another column and an initial iterative assumption from a third, in order to return a single, summarizing value. WestClinTech documentation notes that Range Queries specify the data to be included in the result set of the function independently of the WHERE clause associated with the T-SQL statement, by incorporating a SELECT statement into the function as a string argument; the function then traps that SELECT statement, executes it internally and processes the result. Some XLeratorDB functions that employ Range Queries are: NPV, XNPV, IRR, XIRR, MIRR, MULTINOMIAL, and SERIESSUM. Within the application these functions are identified by a "_q" naming convention: e.g. NPV_q, IRR_q, etc. == Analytic functions == === SQL Server functions === Microsoft SQL Server is the #3 selling database management system (DBMS), behind Oracle and IBM. (While versions of SQL Server have been on the market since 1987, XLeratorDB is compatible with only the 2005 edition and later.) Like all major DBMS, SQL Server performs a variety of data mining operations by returning or arraying data in different views (also known as drill-down). In addition, SQL Server uses Transact-SQL (T-SQL) to execute four major classes of pre-defined functions in native mode. Functions operating on the DBMS offer several advantages over client layer applications like Excel: they utilize the most up-to-date data available; they can process far larger quantities of data; and, the data is not subject to exporting and transcription errors. SQL Server 2008 includes a total of 58 functions that perform relatively basic aggregation (12), math (23) and string manipulation (23) operations useful for analytics; it includes no native functions that perform more complex operations directly related to finance, statistics or engineering. === Excel functions === Microsoft Excel, a component of Microsoft Office suite, is one of the most widely used spreadsheet applications on the market today. In addition to its inherent utility as a stand-alone desktop application, Excel overlaps and complements the functionality of DBMS in several ways: storing and arraying data in rows and columns; performing certain basic tasks such as pivot table and aggregating values; and facilitating sharing, importing and exporting of database data. Excel's chief limitation relative to a true database is capacity; Excel 2003 is limited to some 65k rows and 256 columns; Excel 2007 extends this capacity to roughly 1million rows and 16k columns. By comparison, SQL Server is able to manage over 500k terabytes of memory. Excel offers, however, an extensive library of specialized pre-written functions which are useful for performing ad hoc analysis on database data. Excel 2007 includes over 300 of these pre-defined functions, although customized functions can also be created by users, or imported from third party developers as add-ons. Excel functions are grouped by type: === Excel business intelligence functions === Operating on the client computing layer Excel plays an important role as a business intelligence tool because it: performs a wide array of complex analytic functions not native to most DBMS software offers far greater ad hoc reporting and analytic flexibility than most enterprise software provides a medium for sharing and collaborating because of its ubiquity throughout the enterprise Microsoft reinforces this positioning with Business Intelligence documentation that positions Excel in a clearly pivotal role. === XLeratorDB vs. Excel functions === While operating within the database environment, XLeratorDB functions utilize the same naming conventions and input formats, and in most cases, return the same calculation results as Excel functions. XLeratorDB, coupled with SQL Server's native capabilities, compares to Excel's function sets as follows:

    Read more →
  • Mark Steedman

    Mark Steedman

    Mark Jerome Steedman (born 18 September 1946) is a British computational linguist and cognitive scientist. == Biography == Steedman graduated from the University of Sussex in 1968, with a B.Sc. in Experimental Psychology, and from the University of Edinburgh in 1973, with a Ph.D. in Artificial Intelligence (Dissertation: The Formal Description of Musical Perception gained in 1972. Advisor: Prof. H.C. Longuet-Higgins FRS). He has held posts as Lecturer in Psychology, University of Warwick (1977–83); Lecturer and Reader in Computational Linguistics, University of Edinburgh (1983–8); Associate and full Professor in Computer and Information Sciences, University of Pennsylvania (1988–98). He has held visiting positions at the University of Texas at Austin, the Max Planck Institute for Psycholinguistics, Radboud University Nijmegen, and the University of Pennsylvania, Philadelphia. Steedman currently holds the Chair of Cognitive Science in the School of Informatics at the University of Edinburgh (1998– ). He works in computational linguistics, artificial intelligence, and cognitive science, on Generation of Meaningful Intonation for Speech by Artificial Agents, Animated Conversation, The Communicative Use of Gesture, Tense and Aspect, and combinatory categorial grammar (CCG). He is also interested in Computational Musical Analysis and combinatory logic. == Distinctions == Member of the Academia Europæa (2006) Fellow of the British Academy (2002). Fellow of the Royal Society of Edinburgh (2002) AAAI Fellow (1993) President elect for 2008 of the Association for Computational Linguistics Fellow of the Association for Computational Linguistics (2012) == Principal publications == Steedman, Mark (1996). Surface structure and interpretation. Linguistic Inquiry Monograph. Vol. 30. Cambridge, MA: MIT Press. p. 123. ISBN 978-0-262-19379-5. Steedman, Mark (2000). The Syntactic Process. Language, Speech, and Communication. Cambridge, MA: MIT Press. p. 344. ISBN 978-0-262-69268-7. Steedman, Mark (Fall 2000). "Information Structure and the Syntax-Phonology Interface". Linguistic Inquiry. 31 (4): 649–689. doi:10.1162/002438900554505. ISSN 0024-3892. S2CID 9084597.

    Read more →
  • Struc2vec

    Struc2vec

    struc2vec is a framework to generate node vector representations on a graph that preserve the structural identity. In contrast to node2vec, that optimizes node embeddings so that nearby nodes in the graph have similar embedding, struc2vec captures the roles of nodes in a graph, even if structurally similar nodes are far apart in the graph. It learns low-dimensional representations for nodes in a graph, generating random walks through a constructed multi-layer graph starting at each graph node. It is useful for machine learning applications where the downstream application is more related with the structural equivalence of the nodes (e.g., it can be used to detect nodes in networks with similar functions, such as interns in the social network of a corporation). struc2vec identifies nodes that play a similar role based solely on the structure of the graph, for example computing the structural identity of individuals in social networks. In particular, struc2vec employs a degree-based method to measure the pairwise structural role similarity, which is then adopted to build the multi-layer graph. Moreover, the distance between the latent representation of nodes is strongly correlated to their structural similarity. The framework contains three optimizations: reducing the length of degree sequences considered, reducing the number of pairwise similarity calculations, and reducing the number of layers in the generated graph. struc2vec follows the intuition that random walks through a graph can be treated as sentences in a corpus. Each node in a graph is treated as an individual word, and short random walk is treated as a sentence. In its final phase, the algorithm employs Gensim's word2vec algorithm to learn embeddings based on biased random walks. Sequences of nodes are fed into a skip-gram or continuous bag of words model and traditional machine-learning techniques for classification can be used. It is considered a useful framework to learn node embeddings based on structural equivalence.

    Read more →
  • Karen Spärck Jones

    Karen Spärck Jones

    Karen Ida Boalth Spärck Jones (26 August 1935 – 4 April 2007) was a self-taught programmer and a pioneering British computer and information scientist responsible for the concept of inverse document frequency (IDF), a technology that underlies most modern search engines. She was an advocate for women in computer science, her slogan being, "Computing is too important to be left to men." In 2019, The New York Times published her belated obituary in its series Overlooked, calling her "a pioneer of computer science for work combining statistics and linguistics, and an advocate for women in the field." From 2008, to recognise her achievements in the fields of information retrieval (IR) and natural language processing (NLP), the Karen Spärck Jones Award is awarded annually to a recipient for outstanding research in one or both of her fields. == Early life and education == Karen Ida Boalth Spärck Jones was born in Huddersfield, Yorkshire, England. Her parents were Alfred Owen Jones, a chemistry lecturer, and Ida Spärck, a Norwegian who worked for the Norwegian government while in exile in London during World War II. Spärck Jones was educated at a grammar school in Huddersfield and then from 1953 to 1956 at Girton College, Cambridge, studying history, with an additional final year in Moral Sciences (philosophy). While at Cambridge, Spärck Jones joined the organisation known as the Cambridge Language Research Unit (CLRU) and met the head of CLRU Margaret Masterman, who would inspire her to go into computer science. While working at the CLRU, Spärck Jones began pursuing her PhD. At the time of submission, her PhD thesis was cast aside as uninspired and lacking original thought, but was later published in its entirety as a book. She briefly became a school teacher before moving into computer science. Spärck Jones married fellow Cambridge computer scientist Roger Needham in 1958. Spärck Jones's mother, Ida Spärck, had fled Norway on one of the last boats out after the German invasion in April 1940, going on to serve the Norwegian government in exile in London throughout the war. This background of displacement and resilience shaped the household in which Spärck Jones grew up. She later kept her mother's Norwegian surname professionally after marrying, stating that "it maintains a permanent existence of your own." Spärck Jones described her entry into computing as almost accidental. She had been working as a schoolteacher when she began visiting the CLRU out of curiosity about her husband's work. It was Margaret Masterman — whom she later described as "a very strange and interesting woman" — who offered her a research position and drew her fully into the field. == Career == Spärck Jones worked at the Cambridge Language Research Unit from the late 1950s, then at Cambridge University Computer Laboratory from 1974 until her retirement in 2002. From 1999, she held the post of Professor of Computers and Information. She had been given a permanent position only in 1993, and earlier in her career had been employed on a series of short-term contracts. She continued to work in the Computer Laboratory until shortly before her death. Her publications include nine books and numerous papers. A full list of her publications is available from the Cambridge Computer Laboratory. Spärck Jones' main research interests, since the late 1950s, were natural language processing and information retrieval. In 1964, Spärck Jones published "Synonymy and Semantic Classification", which is now seen as a foundational paper in the field of natural language processing. One of her most important contributions was the concept of inverse document frequency (IDF) weighting in information retrieval, which she introduced in a 1972 paper. IDF is used in most search engines today, usually as part of the term frequency–inverse document frequency (TF–IDF) weighting scheme. In the 1980s, Spärck Jones began her work on early speech recognition systems. In 1982 she became involved in the Alvey Programme which was an initiative to motivate more computer science research across the country. == Significance of inverse document frequency == At the time Spärck Jones was working, most computer scientists were focused on making people adapt to machines — learning precise codes and commands to retrieve information. Spärck Jones was working in the opposite direction: teaching computers to understand human language as it is actually used. Her 1972 paper introduced the concept of inverse document frequency (IDF) by observing that not all words carry equal informational value. A word like "the" appears in virtually every document and tells a retrieval system almost nothing about what any specific document is about. A rare word like "photosynthesis," by contrast, is highly specific and informative. IDF assigns each word a statistical weight based on how rarely it occurs across a document collection — the rarer the word, the higher its weight. When combined with term frequency (TF), which measures how often a word appears within a single document, the resulting TF–IDF score gives every word a relevance rating that can be used to rank documents in response to a search query. By 2007, Spärck Jones noted that "pretty much every web engine uses those principles." Her colleague John Tait remarked that "a lot of the stuff she was working on until five or ten years ago seemed like mad nonsense, and now we take it for granted." The 1972 paper remains among the most cited works in information retrieval research, with over 4,500 citations recorded in Google Scholar at the time of her death. The conceptual foundation of TF–IDF — that word meaning is statistical and contextual — has also informed later developments in machine learning and natural language processing, including transformer-based language models such as BERT. == Impact on artificial intelligence == Even though Spärck Jones' views on artificial intelligence (AI) were rather pessimistic in regard to the perceived limitations of AI in information retrieval, her work in natural language processing, information retrieval, and introducing the concept of inverse document frequency (IDF) contributed to the future technological development of AI. Her statistical and ranking methods shifted the direction of the development of AI towards being more expandable and led by data. Her work had a more indirect and conceptual impact on AI, compared to the current and direct impact it has had on search engines. == Gender and advocacy == Spärck Jones spent the majority of her career at Cambridge on short-term contracts without permanent employment, a situation she attributed directly to gender. In her 2001 IEEE oral history interview she stated that Cambridge was "in many ways not user-friendly, in the sense of women-friendly." She was frequently the only woman present in professional meetings throughout her career. She channelled this experience into active advocacy. She was a founding member of the women@cl network at Cambridge's Computer Laboratory, worked on outreach programmes aimed at encouraging girls into computing, and became widely known for her slogan: "Computing is too important to be left to men." She was the first woman ever to receive the BCS Lovelace Medal. === Honours and awards === These include: Gerard Salton Award (1988) Elected a Fellow of Association for the Advancement of Artificial Intelligence (AAAI) in 1993 President of the Association for Computational Linguistics (ACL) in 1994 Honorary degree of Doctor of Science from The City University in 1997. Elected a Fellow of the British Academy (FBA), where she also served as Vice-President in 2000–2002 Fellow of European Association for Artificial Intelligence (ECCAI) Association for Information Science and Technology (ASIS&T) Award of Merit (2002) Association for Computational Linguistics (ACL) Lifetime Achievement Award (2004) ACM - AAAI Allen Newell Award (2006) BCS Lovelace Medal (2007) Association for Computing Machinery (ACM) Women's Group Athena Award (2007) == Death and legacy == Spärck Jones died on 4 April 2007, due to cancer at the age of 71. In 2008, the BCS Information Retrieval Specialist Group (BCS IRSG) in conjunction with the British Computer Society established an annual Karen Spärck Jones Award in her honour, to encourage and promote research that advances understanding of Natural Language Processing or Information Retrieval. The Karen Spärck Jones lecture sponsored by BCS recognises the contribution that women have made to computing. In August 2017, the University of Huddersfield renamed one of its campus buildings in her honour. Formerly known as Canalside West, the Spärck Jones building houses the University's School of Computing and Engineering. When Spärck Jones died in 2007, The Times did not publish an obituary for her, despite having published one for her husband Roger Needham in 2003. In 2019, The New York Times included her in its Overlooked series under the title "Ove

    Read more →
  • Color image pipeline

    Color image pipeline

    An image pipeline or video pipeline is the set of components commonly used between an image source (such as a camera, a scanner, or the rendering engine in a computer game), and an image renderer (such as a television set, a computer screen, a computer printer or cinema screen), or for performing any intermediate digital image processing consisting of two or more separate processing blocks. An image/video pipeline may be implemented as computer software, in a digital signal processor, on an FPGA, or as fixed-function ASIC. In addition, analog circuits can be used to do many of the same functions. Typical components include image sensor corrections (including debayering or applying a Bayer filter), noise reduction, image scaling, gamma correction, image enhancement, colorspace conversion (between formats such as RGB, YUV or YCbCr), chroma subsampling, framerate conversion, image compression/video compression (such as JPEG), and computer data storage/data transmission. Typical goals of an imaging pipeline may be perceptually pleasing end-results, colorimetric precision, a high degree of flexibility, low cost/low CPU utilization/long battery life, or reduction in bandwidth/file size. Some functions may be algorithmically linear. Mathematically, those elements can be connected in any order without changing the end-result. As digital computers use a finite approximation to numerical computing, this is in practice not true. Other elements may be non-linear or time-variant. For both cases, there is often one or a few sequences of components that makes sense for optimum precision and minimum hardware-cost/CPU-load.

    Read more →
  • Corpus language

    Corpus language

    A corpus language is a language that has no living speakers but for which numerous records produced by its native speakers survive. Examples of corpus languages are Ancient Greek, Latin, the Egyptian language, Old English, Old Norse, Elamite, and Sanskrit. Some corpus languages, such as Ancient Greek and Latin, left very large corpora and therefore can be fully reconstructed, even though some details of pronunciation may be unclear. Such languages can be used even today, as is the case with Sanskrit and Latin. Other languages have such limited corpora that some important words—e.g., some pronouns—are lacking in the corpora. Examples of these are Ugaritic and Gothic. Languages attested only by a few words, often names, and a few phrases, are called Trümmersprache (literally "rubble languages") in German linguistics. These can be reconstructed only in a very limited way, and often their genetic relationship to other languages remains unclear. Examples are Dalmatian, Etruscan, also known as Rasenna, Dadanitic, a Semitic language that may be close to classical Arabic, Lombardic, Burgundian, Vandalic, and Oscan, Umbrian, and Faliscan, all Italic languages that were related to Latin. Corpus languages are studied using the methods of corpus linguistics, but corpus linguistics can also be used (and is commonly used) for the study of the writings and other records of living languages. Not all extinct languages are corpus languages, since there are many extinct languages in which few or no writings or other records survive, as is the case in the vast majority of languages that have ever existed.

    Read more →
  • David Horn (Israeli physicist)

    David Horn (Israeli physicist)

    David Horn (Hebrew: דוד הורן; born 10 September 1937) is a Professor (Emeritus) of Physics in the School of Physics and Astronomy at Tel Aviv University (TAU), Israel. He has served as Vice-Rector of TAU, Chairman of the School of Physics and Astronomy and as Dean of the Faculty of Exact Sciences in TAU. He is a fellow of the American Physical Society, nominated for "contributions to theoretical particle physics, including the seminal work on finite energy sum rules, research of the phenomenology of hadronic processes, and investigation of Hamiltonian lattice theories". == Early life and education == David Horn was born and educated in Haifa. He graduated from the Reali School in 1955. He began his academic studies in Physics at the Technion in Haifa in 1957, and received his B.Sc. (Summa Cum Laude) in 1961, and M.Sc. in 1962. He continued his Ph.D. studies at the Hebrew University of Jerusalem until 1965. His thesis on "Some Aspects of the Structure of Weak Interactions" was supervised by Prof. Yuval Ne'eman. == Career == Horn joined the newly founded Tel Aviv University as an assistant in 1962. He became a lecturer in 1965, a senior lecturer in 1967 and an associate professor in 1968. He was promoted to full professor of Physics in 1972. In 1974 he became the incumbent of the Edouard and Francoise Jaupart Chair of Theoretical Physics of Particles and Fields, a position he held until 2007. Horn has supervised 43 graduate students at TAU and authored over 240 scientific publications. He retired as a professor emeritus in 2005, and continues to be an active researcher. Horn spent a significant part of his career holding visiting academic positions at other universities and research institutes, including: Postdoctoral Fellow at Argonne National Lab, ILL, Research Fellow and three times Visiting Associate at California Institute of Technology, Pasadena, CA, Visitor at CERN in Geneva, Visiting Professor at Cornell University, NY, Member of the Institute for Advanced Study, Princeton, NJ, Visiting Professor at SLAC in Stanford University, CA, and Visiting Professor at Kyoto University, Japan. Beginning from 1980, Horn held official positions at Tel Aviv University, starting with tenure as Vice-Rector (1980-1983), a position he left for research at SLAC. After returning he was nominated Chairman of the Department of High Energy Physics (1984-1986), followed by tenures as Chairman of the School of Physics and Astronomy (1986-9), Dean of the Raymond and Beverly Sackler Faculty of Exact Sciences (1990-1995), and first Director of the Adams Super Center for Brain Studies (1993-2000). Horn has also held national and international professional positions. He was Chairman of the Israel Commission for High Energy Physics (1983-2003), and, in this capacity, served as an Israeli observer of the council of CERN (1991-2003). He served as member of the Israel Council for Higher Education (1987-1991), member of the Executive Committee of the European Physical Society (1989-1992) and member of the European Strategy Forum on Research Infrastructures (2005-2017). He chaired the Israeli Committee of Research Infrastructures (2012-2016), issuing roadmaps for scientific RI in 2013 and 2016. == Research == Horn's research work focused on theory and phenomenology of High Energy Physics until 1990. He then shifted his interests to Neural Computation and Machine Learning and, since 2005, he has also published in Bioinformatics. Together with Richard Dolen and Christoph Schmid he discovered the Finite Energy Sum Rules in 1967. It was a realization of the bootstrap approach to hadronic structure, and became known as the Dolen-Horn-Schmid Duality. Together with Richard Silver he investigated a model of coherent production of pions at high energy hadron collisions in 1971, and together with Jeffrey Mandula he undertook the investigation of mesons with constituent gluons in 1978. Moving to lattice gauge theories in 1979, he discovered, together with Shimon Yankielowic and Marvin Weinstein, a non-confining phase in Z(N) theories for large N. In 1981 he demonstrated the existence of finite matrix models with link gauge fields, nowadays known as quantum link models. In 1984 Horn and Weinstein developed the t-expansion methodology. Horn's contributions to neural modeling include a novel mechanism for memory maintenance via neuronal regulation in 1998, developed with Nir Levy and Eytan Ruppin and unsupervised learning of natural languages in 2005, a joint work with Zach Solan, Eytan Ruppin and Shimon Edelman, introducing novel algorithms for motif and grammar extraction from text. Horn has contributed to algorithms of clustering, an important topic in Machine Learning, by developing Support Vector Clustering (SVC) in 2001, together with Asa Ben Hur, Hava Siegelmann and Vladimir Vapnik. This was followed shortly thereafter by a joint work with Assaf Gottlieb on Quantum Clustering (QC). His contributions to Bioinformatics include motif descriptions of function and structure of proteins, as well as motif studies of genomic structures. Together with Erez Persi he studied compositional order of proteomes, and repeat instability of genomes, as evolution markers of organisms and of cancer (a joint work with Persi and others). == Honors == Horn is a Fellow of the American Physical Society (1985) and a Fellow of the Israel Physical Society (2018). == Publications == === Selected articles === R. Dolen, D. Horn and C. Schmid; Prediction of Regge-parameters of rho poles from low-energy pi-N scattering data Phys. Rev. Lett. 19 (1967) 402–407. Finite-Energy Sum Rules and Their Application to pi-N Charge Exchange Phys. Rev. 166 (1968) 1768–1781. D. Horn and R. Silver: Coherent production of pions, Annals Phys. 66 (1971) 509-541 T. Banks, D. Horn and H. Neuberger: Bosonization of the SU(N) Thirring Models, Nucl. Phys. B108, 119 (1976). D. Horn and J. Mandula: Model of Mesons with Constituent Gluons, Phys. Rev. D17, 898 (1978). D. Horn, M. Weinstein and S. Yankielowicz: Hamiltonian Approach to Z(N) Lattice Gauge Theories, Phys. Rev. D19, 3715 (1979). D. Horn: Finite Matrix Models with Continuous Local Gauge Invariance, Phys. Lett. 100B, 149-151 (1981). T. Banks, Y. Dothan and D. Horn: Geometric Fermions, Phys. Lett. 117B, 413 (1982). D. Horn and M. Weinstein: The t expansion: A nonperturbative analytic tool for Hamiltonian systems. Phys. Rev. D 30, 1256-1270 (1984). Ury Naftaly, Nathan Intrator and David Horn: Optimal Ensemble Averaging of Neural Networks. Network, Computation in Neural Systems, 8, 283-296 (1997). David Horn, Nir Levy, Eytan Ruppin: Memory Maintenance via Neuronal Regulation, Neural Computation, 10, 1-18 (1998). Asa Ben-Hur, David Horn, Hava Siegelmann and Vladimir Vapnik: Support Vector Clustering. Journal of Machine Learning Research 2, 125-137 (2001). David Horn and Assaf Gottlieb: Algorithm for data clustering in pattern recognition problems based on quantum mechanics, Phys. Rev. Lett. 88 (2002) 18702 Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman: Unsupervised learning of natural languages, Proc. Natl. Acad. Sc. 102 (2005) 11629–11634. Vered Kunik, Yasmine Meroz, Zach Solan, Ben Sandbank, Uri Weingart, Eytan Ruppin and David Horn: Functional representation of enzymes by specific peptides. PLOS Computational Biology 2007, 3(8):e167. Benny Chor, David Horn, Yaron Levy, Nick Goldman and Tim Massingham: Genomic DNA k-mer spectra: models and modalities. Genome Biology 2009, 10(10):R108 Erez Persi and David Horn. Systematic Analysis of Compositional Order of Proteins Reveals New Characteristics of Biological Functions and a Universal Correlate of Macroevolution. PLoS Comput Biol 9 (2013): e1003346. David Horn. Taxa counting using Specific Peptides of Aminoacyl tRNA Synthetases Encyclopedia of Metagenomics, Springer, 2013. Sagi Shporer, Benny Chor, Saharon Rosset, David Horn. Inversion symmetry of DNA k-mer counts: validity and deviations. BMC Genomics 2016, 17:696 Erez Persi, Davide Prandi, Yuri I. Wolf, Yair Pozniak, Christopher Barbieri, Paola Gasperini, Himisha Beltran, Bishoy M. Faltas, Mark A. Rubin, Tamar Geiger, Eugene V. Koonin, Francesca Demichelis, David Horn. Proteomic and Genomic Signatures of Repeat Instability in Cancer and Adjacent Normal Tissues. PNAS 116, 34, 2019 - 08790 === Book === David Horn and Fredrick Zachariasen: Hadron Physics at Very High Energies. Benjamin 1973. === Patents === Method and Apparatus for Quantum Clustering. USA Patent No. 7,653,646 B2. Method for discovering relationships in data by dynamic quantum clustering USA Patent No 8874412 and USA Patent No. 9,646,074. == Personal life == Horn was married to Nira Fuss since 1963 until her death in 2019. He is a father of three, Yuval, Tamar, and Oded, and grandfather of nine. He lives in Tel Aviv, Israel.

    Read more →
  • Top 10 AI Humanizers Compared (2026)

    Top 10 AI Humanizers Compared (2026)

    Looking for the best AI humanizer? An AI humanizer is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI humanizer slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Tinybop

    Tinybop

    Tinybop is a Brooklyn based publisher of apps for children. == History == Tinybop is a Brooklyn-based children's media company established in 2011 by Raul Gutierrez. App titles are released in two series: the Explorer's Library - a series of science apps and Digital Toys - series of open-ended construction apps. == Published apps == Explorer's Library Titles: The Human Body – An anatomy app for children. Released 2013. The company's first app was illustrated by Kelli Anderson and has been downloaded millions of times. Selected for the American Library Association's Notable Children's Media List in 2022. Named Apple App Store's Best of 2013. Winner of the Digital Ehon Yuichi Kimura Prize for Children's Digital Media. Plants – An app about biomes around the world. Homes – An app about houses around with world. Illustrated by Tuesday Bassen. Winner of the Parents Gold Choice Award for children's apps. Simple Machines – A children's physics app about simple machines. The Earth – An app for children about the geologic Earth illustrated by Sarah Jacoby. Weather – A children's weather app. Skyscrapers – A children's app about building tall buildings. Space – An interactive solar system. Mammals – A children's app about mammals illustrated by Wenjia Tang. Winner of the Digital Ehon Award for Children's Educational media. Coral Reef – An app about marine ecosystems. Winner of an Excellence in Early Learning Digital Media Honor from the American Library Association. State of Matter – An app covering solids, liquids, and gases. Winner of Excellence in Early Learning Digital Media Honor from the American Library Association. Light and Color – An app about light and color. Selected for The American Library Association's Notable Children's Media List 2023. Winner of the 2022 Yoichi Sakakihara Prize for Children's Media. Digital Toys Titles: The Robot Factory – A robot building app for children illustrated by Owen Davey. Apple named The Robot Factory as iPad App of the Year in 2015. The Everything Machine – A visual coding app for children. The Everything Machine was named Apple's Best of 2015. Monsters – A monster creation app illustrated by Tianhua Mao. The Infinite Arcade – An arcade game building app. Me: A Kids Diary – A digital journal for children. Selected for The American Library Association's Notable Children's Media List 2020. The Creature Garden – An app that allows children to create fantastical animals illustrated by Natasha Durley. Selected for The American Library Association's Notable Children's Media List 2021. Things that Go Bump – A multiplayer game set in an enchanted Japanese house, released on Apple Arcade in 2018.

    Read more →
  • Dan Roth

    Dan Roth

    Dan Roth (Hebrew: דן רוט) is the Eduardo D. Glandt Distinguished Professor of Computer and Information Science at the University of Pennsylvania and the Chief AI Scientist at Oracle. Until June 2024 Roth was a VP and distinguished scientist at AWS AI. In his role at AWS, Roth led over the last three years the scientific effort behind the first-generation Generative AI products from AWS, including Titan Models, Amazon Q efforts, and Bedrock, from inception until they became generally available. Roth got his B.A. summa cum laude in mathematics from the Technion, Israel, and his Ph.D. in computer science from Harvard University in 1995. He taught at the University of Illinois at Urbana-Champaign from 1998 to 2017 before moving to the University of Pennsylvania. == Professional career == Roth is a Fellow of the American Association for the Advancement of Science (AAAS), the Association for Computing Machinery (ACM), the Association for the Advancement of Artificial Intelligence (AAAI), and the Association of Computational Linguistics (ACL). Roth’s research focuses on the computational foundations of intelligent behavior. He develops theories and systems pertaining to intelligent behavior using a unified methodology, at the heart of which is the idea that learning has a central role in intelligence. His work centers around the study of machine learning and inference methods to facilitate natural language understanding. In doing that he has pursued several interrelated lines of work that span multiple aspects of this problem - from fundamental questions in learning and inference and how they interact, to the study of a range of natural language processing (NLP) problems and developing advanced machine learning based tools for natural language applications. Roth has made seminal contribution to the fusion of Learning and Reasoning, Machine Learning with weak, incidental supervision, and to machine learning and inference approaches to natural language understanding. He has written the first paper on zero-shot learning in natural language processing, a 2008 paper by Chang, Ratinov, Roth, and Srikumar that was published at AAAI’08, but the name given to the learning paradigm there was dataless classification. Roth has worked on probabilistic reasoning (including its complexity and probabilistic lifted inference ), Constrained Conditional Models (ILP formulations of NLP problems) and constraints-driven learning, part-based (constellation) methods in object recognition, response based Learning, He has developed NLP and Information extraction tools that are being used broadly by researchers and commercially, including NER, coreference resolution, wikification, SRL, and ESL text correction. Roth is a co-founder of NexLP, Inc., a startup that applies natural language processing and machine learning in the legal and compliance domains. In 2020, NexLP was acquired by Reveal, Inc., an e-discovery software company. He is currently on the scientific advisory board of the Allen Institute for AI.

    Read more →
  • Chris Callison-Burch

    Chris Callison-Burch

    Chris Callison-Burch is an American computer scientist and professor of computer and information science at the University of Pennsylvania (Penn), specializing in natural language processing (NLP), artificial intelligence (AI), and crowdsourcing. He is recognised for his contributions to machine translation, paraphrase generation, and the application of large language models (LLMs) to AI challenges, with over 200 publications cited more than 33,000 times. Callison-Burch has influenced public policy on AI and copyright, testifying before the U.S. Congress in 2023 on generative AI’s implications. He serves as the faculty director for Penn’s Online Master of Science in Engineering in AI program. == Education == Callison-Burch earned his PhD in Computer Science from the University of Edinburgh in 2008, focusing on machine translation and paraphrasing techniques. His doctoral research developed statistical methods for generating paraphrases in machine translation systems, laying the foundation for his later NLP work. Prior to his PhD, he studied at Stanford University, where he developed an interest in computational linguistics. == Career == After his PhD, Callison-Burch joined the Centre for Language and Speech Processing at Johns Hopkins University as a research faculty member from 2008 to 2013, working on NLP projects, including machine translation and crowdsourcing for creating training data. In 2013, he joined the University of Pennsylvania as an assistant professor in the Department of Computer and Information Science and was promoted to associate professor in 2017, and to full professor in 2024. At Penn, Callison-Burch teaches courses on AI and NLP, including CIS 5300 (Natural Language Processing) and CIS 5210 (Artificial Intelligence), which attract over 500 students annually. He directs Penn’s Online Master of Science in Engineering in AI program, launched in 2025. He teaches AI and NLP courses on Coursera, reaching thousands of global learners. Callison-Burch was a part-time visiting researcher at Google in 2019 and 2020, where he collaborated on applying Google's LLM to Dungeons & Dragons dialogues. In 2023, he took a sabbatical at the Allen Institute for AI (AI2), where he contributed to vision-language models. == Research == Callison-Burch’s research focuses on NLP, AI, and crowdsourcing, with significant contributions to machine translation, paraphrase generation, and LLMs for tasks like text simplification and bias detection. His early work developed crowdsourcing methods for machine translation, leveraging non-expert annotators for paraphrase-based evaluation, influencing platforms like Amazon Mechanical Turk. Recent projects have included several notable works. Molmo and PixMo (2025) are open-weight vision-language models developed with AI2, achieving state-of-the-art multimodal performance and earning a Best Paper Honourable Mention at CVPR 2025. Also in 2025, his work on Calibrating Large Language Models with Sample Consistency improves LLM reliability via sample-based calibration, presented at NAACL 2025. The Media Bias Detector (2025) is a real-time tool analysing selection and framing bias in news, using LLMs to detect persuasive language differences (e.g., Russian vs. English Wikipedia). Holodeck (2024) is a language-guided system for generating 3D embodied AI environments, presented at CVPR 2024. BORDIRLINES (2024) is a dataset for cross-lingual retrieval-augmented generation, focusing on culturally sensitive tasks. He has co-authored over 200 publications, featured at conferences like ACL, EMNLP, and CVPR. == Awards and recognition == Callison-Burch has received numerous awards: Best Paper Honourable Mention at CVPR 2025 for "Molmo and PixMo". Best Paper Award at the Workshop on Cognitive Modelling and Computational Linguistics (CMCL) 2024 for "Evaluating Vision-Language Models on Bistable Images". Best Paper Award at STARSEM 2016 for "So-Called Non-Subsective Adjectives". Best Paper Award at the Workshop on Sense, Concept and Entity Representations 2017 for "Word Sense Filtering Improves Embedding-Based Lexical Substitution". Honourable Mention Award at CHI 2018 for "A Data-Driven Analysis of Workers’ Earnings on Amazon Mechanical Turk". Google Faculty Research Award (2013) for crowdsourcing in NLP. Sloan Research Fellowship (2014). He has received research funding from Google, Microsoft, Amazon, Facebook, Roblox, DARPA, IARPA, and NSF. His h-index is 72, with over 33,000 citations. He served as General Chair of ACL 2017 and as the Program Co-Chair EMNLP 2015. == Public policy and testimony == On May 17, 2023, Callison-Burch testified before the U.S. House Subcommittee on Courts, Intellectual Property, and the Internet on AI and copyright law. His testimony emphasised generative AI’s role in creative industries and the need for balanced copyright frameworks. He has appeared on Fox News to discuss AI’s societal impact, and discussed its impact with other print news sources. He contributes to AI ethics discussions, including workshops on AI’s effects on writing and creative professions.

    Read more →