Hartmut Neven

Hartmut Neven (born 1964) is a German American scientist working in quantum computing, computer vision, robotics and computational neuroscience. He is best known for his work in face and object recognition and his contributions to quantum machine learning. He is currently Vice President of Engineering at Google, where he leads the Quantum Artificial Intelligence Lab, which he founded in 2012. == Education == Hartmut Neven studied Physics and Economics in Brazil, Köln, Paris, Tübingen and Jerusalem. He wrote his master's thesis on a neuronal model of object recognition at the Max Planck Institute for Biological Cybernetics under Valentino Braitenberg. In 1996 he received his Ph.D. in Physics from the Institute for Neuroinformatics at the Ruhr University in Bochum, Germany, for a thesis on "Dynamics for vision-guided autonomous mobile robots" written under the tutelage of Christoph von der Malsburg. He received a scholarship from the Studienstiftung des Deutschen Volkes, Germany's most prestigious scholarship foundation. == Work == In 1998 Neven became research professor of computer science at the University of Southern California at the Laboratory for Biological and Computational Vision. In 2003 he returned as the head of the Laboratory for Human-Machine Interfaces at USC's Information Sciences Institute. === Face recognition, avatars and face filters === Neven co-founded two companies: Eyematic, for which he served as CTO, and Neven Vision, which he initially led as CEO. At Eyematic he developed face recognition technology and real-time facial feature analysis for avatar animation. Teams led by Neven have repeatedly won top scores in government-sponsored tests designed to determine the most accurate face recognition software. Face filters, now ubiquitous on mobile phones, were launched for the first time by Neven Vision on the networks of NTT DoCoMo and Vodafone Japan in 2003. Neven Vision also pioneered mobile visual search for camera phones. 
Neven Vision was acquired by Google in 2006. === Object recognition and adversarial images === At Google he managed teams responsible for advancing Google's visual search technologies. His team launched Google Goggles now Google Lens. The concept of adversarial patterns originated in his group when he tasked Christian Szegedy with a project to modify the pixel inputs of a deep neural network to lower the activity of select output nodes. The motivation was to use this technique for object localization which did not work out. But the idea gave rise to the fields of adversarial learning and DeepDream art. In 2013 his optical character recognition team won the ICDAR Robust Reading Competition by a wide margin and in 2014 the object recognition team won the ImageNet challenge. === Google Glass === Neven was a co-founder of the Google Glass project. His team completed the first prototype, codenamed Ant, in 2011. === Quantum Artificial Intelligence === In 2006 Neven started to explore the application of quantum computing to hard combinatorial problems arising in machine learning. In collaboration with D-Wave Systems he developed the first image recognition system based on quantum algorithms. It was demonstrated at SuperComputing07. At NIPS 2009 his team demonstrated the first binary classifier trained on a quantum processor. In 2012 together with Pete Worden at NASA Ames he founded the Quantum Artificial Intelligence Laboratory. In 2014 he invited John M. Martinis and his group at UC Santa Barbara to join the lab to start a fabrication facility for superconducting quantum processors. The Quantum Artificial Intelligence team performed the first experimental demonstration of a scalable simulation of a molecule. In 2016 the team formulated an experiment to demonstrate quantum supremacy. Quantum supremacy was then declared by Google in October 2019. 
In 2023 Quantum AI researchers demonstrated that quantum error correction works in practice by showing for the first time that the error of a logical qubit decreases when increasing the number of physical qubits it is composed of. Google's quantum processors have been used to study the physics of quantum many body states that otherwise are challenging to prepare in a laboratory such as time crystals, traversable wormholes and non-Abelian anyons. ==== Neven's law ==== Neven's law states that the performance of quantum computers improves at a doubly exponential rate.
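As a rough illustration of what "doubly exponential" means in this context (a sketch of one commonly cited line of reasoning, not a formula stated in this article): assume the number of usable qubits n grows exponentially with time t, and that the classical cost of simulating n qubits grows exponentially with n. The constants n_0, a and b and the base 2 are illustrative assumptions. Performance P measured against classical simulation then grows as an exponential of an exponential:

```latex
n(t) = n_0 \, 2^{a t}, \qquad P(n) \propto 2^{b n}
\quad\Longrightarrow\quad
P(t) \propto 2^{\, b \, n_0 \, 2^{a t}}
```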

Grammar induction

Grammar induction (or grammatical inference) is the process in machine learning of learning a formal grammar (usually as a collection of re-write rules or productions or alternatively as a finite-state machine or automaton of some kind) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. More generally, grammatical inference is that branch of machine learning where the instance space consists of discrete combinatorial objects such as strings, trees and graphs. == Grammar classes == Grammatical inference has often focused on the problem of learning finite-state machines of various types (see the article Induction of regular languages for details on these approaches), since efficient algorithms for this problem have existed since the 1980s. Since the beginning of the century, these approaches have been extended to the problem of inference of context-free grammars and richer formalisms, such as multiple context-free grammars and parallel multiple context-free grammars. Other classes of grammars for which grammatical inference has been studied are combinatory categorial grammars, stochastic context-free grammars, contextual grammars and pattern languages. == Learning models == The simplest form of learning is where the learning algorithm merely receives a set of examples drawn from the language in question: the aim is to learn the language from examples of it (and, rarely, from counter-examples, that is, examples that do not belong to the language). However, other learning models have been studied. One frequently studied alternative is the case where the learner can ask membership queries as in the exact query learning model or minimally adequate teacher model introduced by Angluin. == Methodologies == There is a wide variety of methods for grammatical inference. Two of the classic sources are Fu (1977) and Fu (1982). 
Duda, Hart & Stork (2001) also devote a brief section to the problem, and cite a number of references. The basic trial-and-error method they present is discussed below. For approaches to inferring subclasses of regular languages in particular, see Induction of regular languages. A more recent textbook is de la Higuera (2010), which covers the theory of grammatical inference of regular languages and finite state automata. D'Ulizia, Ferri and Grifoni provide a survey that explores grammatical inference methods for natural languages. === Induction of probabilistic grammars === There are several methods for induction of probabilistic context-free grammars. === Grammatical inference by trial-and-error === The method proposed in Section 8.7 of Duda, Hart & Stork (2001) suggests successively guessing grammar rules (productions) and testing them against positive and negative observations. The rule set is expanded so as to be able to generate each positive example, but if a given rule set also generates a negative example, it must be discarded. This particular approach can be characterized as "hypothesis testing" and bears some similarity to Mitchell's version space algorithm. The Duda, Hart & Stork (2001) text provides a simple example which nicely illustrates the process, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious. === Grammatical inference by genetic algorithms === Grammatical induction using evolutionary algorithms is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammars can easily be represented as tree structures of production rules that can be subjected to evolutionary operators. Algorithms of this sort stem from the genetic programming paradigm pioneered by John Koza. 
Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF language made trees a more flexible approach. Koza represented Lisp programs as trees. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions of the Lisp code. Similar analogies between the tree-structured Lisp representation and the representation of grammars as trees made the application of genetic programming techniques possible for grammar induction. In the case of grammar induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent node corresponds to a non-terminal symbol (e.g. a noun phrase or a verb phrase) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal. === Grammatical inference by greedy algorithms === Like all greedy algorithms, greedy grammar inference algorithms make, in an iterative manner, decisions that seem to be the best at that stage. The decisions made usually deal with things like the creation of new rules, the removal of existing rules, the choice of a rule to be applied or the merging of some existing rules. Because there are several ways to define 'the stage' and 'the best', there are also several greedy grammar inference algorithms. 
The following context-free grammar generating algorithms make a decision after every symbol read: the Lempel-Ziv-Welch algorithm, which creates a context-free grammar in a deterministic way such that it is necessary to store only the start rule of the generated grammar; and Sequitur and its modifications. The following context-free grammar generating algorithms first read the whole given symbol sequence and then start to make decisions: byte pair encoding and its optimizations. === Distributional learning === A more recent approach is based on distributional learning. Algorithms using these approaches have been applied to learning context-free grammars and mildly context-sensitive languages and have been proven to be correct and efficient for large subclasses of these grammars. === Learning of pattern languages === Angluin defines a pattern to be "a string of constant symbols from Σ and variable symbols from a disjoint set". The language of such a pattern is the set of all its nonempty ground instances, i.e. all strings resulting from consistent replacement of its variable symbols by nonempty strings of constant symbols. A pattern is called descriptive for a finite input set of strings if its language is minimal (with respect to set inclusion) among all pattern languages subsuming the input set. Angluin gives a polynomial algorithm to compute, for a given input string set, all descriptive patterns in one variable x. To this end, she builds an automaton representing all possibly relevant patterns; using sophisticated arguments about word lengths, which rely on x being the only variable, the state count can be drastically reduced. Erlebach et al. give a more efficient version of Angluin's pattern learning algorithm, as well as a parallelized version. Arimura et al. show that a language class obtained from limited unions of patterns can be learned in polynomial time. 
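To make the definition of a pattern language concrete, the following minimal sketch tests whether a string is a nonempty ground instance of a pattern in one variable x. It is only a membership check, not Angluin's learning algorithm; the token representation and the names `VAR` and `matches` are assumptions made for illustration. It does show why a single variable makes word-length arguments powerful: the length of the substituted string is forced by the length of the word.

```python
from typing import List

VAR = "x"  # hypothetical marker for the single variable symbol


def matches(pattern: List[str], word: str) -> bool:
    """Return True if `word` is a nonempty ground instance of `pattern`.

    `pattern` is a list of tokens: constant strings, or VAR for the variable.
    Every occurrence of VAR must be replaced by the same nonempty string.
    """
    n_vars = pattern.count(VAR)
    const_len = sum(len(tok) for tok in pattern if tok != VAR)
    if n_vars == 0:
        return "".join(pattern) == word
    # With one variable, the substitution length is fixed by the word length.
    rest = len(word) - const_len
    if rest <= 0 or rest % n_vars != 0:
        return False
    sub_len = rest // n_vars
    # The constants before the first VAR tell us where the substitution starts.
    prefix_len = 0
    for tok in pattern:
        if tok == VAR:
            break
        prefix_len += len(tok)
    sub = word[prefix_len:prefix_len + sub_len]
    # Rebuild the instance and compare it with the word.
    return "".join(sub if tok == VAR else tok for tok in pattern) == word


pattern = ["a", VAR, "b", VAR]       # the pattern a x b x
print(matches(pattern, "a01b01"))    # x replaced by "01" in both places
print(matches(pattern, "a01b10"))    # inconsistent replacement
print(matches(pattern, "ab"))        # x would have to be empty
```

With several variables the substitution lengths are no longer determined by the word length alone, which is one reason the one-variable case is easier to handle.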
=== Pattern theory === Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language. In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to: Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was commonplace at the time. Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph. Study the randomness and variability of these graphs. Create the basic classes of stochastic models applied by listing the deformations of the patterns. Synthesize (sample) from the models, not just analyze signals with it. Broad in its mathematical coverage, pattern theory spans algebra and statistics, as well as local topological and global entropic properties. == Applications == The principle of grammar induction has been applied to other aspects of natural language processing, and has been applied (among many other problems) to semantic parsing, natural language understanding, example-based translation, language acquisition, grammar-based compre

Guideline execution engine

A guideline execution engine is a computer program which can interpret a clinical guideline represented in a computerized format and perform actions towards the user of an electronic medical record. A guideline execution engine needs to communicate with a host clinical information system. Virtual Medical Record (vMR) is one possible interface which can be used. The engine's main function is to manage instances of executed guidelines of individual patients. == Architecture == The following modules are generally needed for any engine: an interface to the clinical information system; a module for loading new guidelines; a guideline interpreter module; a clinical events parser; and an alert/recommendations dispatch module. == Guideline Interchange Format == The Guideline Interchange Format (GLIF) is a computer representation format for clinical guidelines. Represented guidelines can be executed using a guideline execution engine. The format has several versions as it has been improved. In 2003 GLIF3 was introduced. == Use of third party workflow engine as a guideline execution engine == Some commercial electronic health record systems use a workflow engine to execute clinical guidelines. RetroGuide and HealthFlow are examples of such an approach.
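The modules listed under Architecture can be pictured with a toy sketch such as the one below. It is not GLIF, vMR or any real engine: the class names, the rule format (a condition paired with a recommendation text) and the blood-pressure threshold are all hypothetical and carry no clinical meaning. The sketch only shows how loading, per-patient instances, event interpretation and dispatch might fit together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, float]                    # a parsed clinical event (hypothetical shape)
Rule = Tuple[Callable[[Event], bool], str]  # (condition, recommendation text)


@dataclass
class Guideline:
    name: str
    rules: List[Rule]


class GuidelineEngine:
    def __init__(self, dispatch: Callable[[str, str], None]) -> None:
        self.guidelines: Dict[str, Guideline] = {}  # loaded guidelines
        self.instances: Dict[str, List[str]] = {}   # patient id -> active guidelines
        self.dispatch = dispatch                    # alert/recommendation dispatch

    def load_guideline(self, guideline: Guideline) -> None:
        """Guideline loading module."""
        self.guidelines[guideline.name] = guideline

    def enroll(self, patient_id: str, guideline_name: str) -> None:
        """Start an instance of a loaded guideline for one patient."""
        if guideline_name not in self.guidelines:
            raise KeyError(guideline_name)
        self.instances.setdefault(patient_id, []).append(guideline_name)

    def on_event(self, patient_id: str, event: Event) -> None:
        """Interpreter: evaluate the patient's active guidelines against an event."""
        for name in self.instances.get(patient_id, []):
            for condition, recommendation in self.guidelines[name].rules:
                if condition(event):
                    self.dispatch(patient_id, recommendation)


sent: List[Tuple[str, str]] = []
engine = GuidelineEngine(dispatch=lambda pid, msg: sent.append((pid, msg)))
engine.load_guideline(Guideline(
    "bp-demo",  # illustrative only, not a clinical rule
    [(lambda e: e.get("systolic_bp", 0) > 140, "Review blood pressure management")],
))
engine.enroll("patient-1", "bp-demo")
engine.on_event("patient-1", {"systolic_bp": 150.0})
engine.on_event("patient-2", {"systolic_bp": 180.0})  # not enrolled, so no alert
print(sent)
```

In a real deployment the dispatch callable and the event source would be provided by the host clinical information system's interface.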

POSC Caesar

POSC Caesar Association (PCA) is an international, open and not-for-profit, member organization that promotes the development of open specifications to be used as standards for enabling the interoperability of data, software and related matters. PCA is the initiator of ISO 15926 "Integration of life-cycle data for process plants including oil and gas production facilities" and is committed to its maintenance and enhancement. Nils Sandsmark has been the General Manager of POSC Caesar Association since 1999 and Thore Langeland, Norwegian Oil Industry Association (Norwegian: Oljeindustriens Landsforening, OLF), is the chairman of the board. == History == === Caesar Offshore === The first predecessor of POSC Caesar Association, the Caesar Offshore program, started in 1993. The original focus was on standardizing technical data definitions for capital intensive projects at the handover from the EPC contractor to the owner/operators of onshore and offshore oil and gas production facilities. The program was sponsored by The Research Council of Norway, two EPC contractors (Aker Maritime and Kværner), three owners/operators (Norsk Hydro, Saga Petroleum and Statoil) and DNV as service provider and project owner. === POSC Caesar project === During the period 1994–96, Caesar Offshore Program was defined as a project of Petrotechnical Open Software Corporation (POSC) (now Energistics), and changed its name to the POSC Caesar Project. In 1995 the project was joined by BP, Brown and Root and Elf Aquitaine and in 1997 by Intergraph, IBM, Oracle, Lloyd's, Shell, ABB and UMOE Technologies. During that time, POSC Caesar also became a member of European Process Industries STEP Technical Liaison Executive (EPISTLE) where it collaborates with PISTEP (UK), and USPI-NL (The Netherlands) on the development of ISO 10303, also known as "Standard for the Exchange of Product model data (STEP)". 
=== POSC Caesar Association === In 1997, POSC Caesar Association was founded as an independent, global, non-profit, member organization. POSC Caesar Association serves an international membership and collaborates with other international organizations. It has its main office in Norway. Although the name of POSC Caesar Association still hints at its past as a project within the Petrotechnical Open Software Corporation (POSC) (now Energistics), the organization has been independent since 1997. Energistics and POSC Caesar Association do collaborate, and are formally members of each other's organizations. == Membership == With its current 36 members from around the world, POSC Caesar Association has established an international footprint (with a strong membership in Norway) that includes a variety of backgrounds, from academia and solution providers to engineering contractors and owners/operators. The members are (subdivided by organization type): Associations: Energistics (USA) and The Norwegian Oil Industry Association (OLF, Norway); Universities and Research Institutes: International Research Institute of Stavanger (IRIS, Norway), Norwegian University of Science and Technology (NTNU, Norway), Korea Advanced Institute of Science and Technology (KAIST, Korea), SINTEF (Norway), University of Bergen (Norway), University of Oslo (Norway), University of Stavanger (Norway), University of Tromsø (Norway) and Western Norway Research Institute (Norway); Oil and Gas Companies: BP (UK), Petronas (Malaysia) and Statoil (Norway); Engineering contractors and consultants: Akvaplan-niva (Norway), Aker Solutions (Norway), Asset Life Cycle Information Management (ALCIM, Malaysia), CAESAR systems (USA), Bechtel (USA), Det Norske Veritas (DNV, Norway), Information Logic (USA), iXIT Engineering Technology (Germany) and Phusion IM Ltd (UK); Solution providers: Aveva (UK), Bentley Systems (USA), Jotne EPM Technology (Norway), Epsis (Norway), Eurostep (Sweden), International Business Machines Corporation (IBM, USA), Siemens - Comos Industry Solutions (before Innotec) (Germany), Intergraph (USA), Invenia (Norway), Keel Solution (Denmark), Noumenon (UK), NRX (Canada), Octaga (Norway) and Tektonisk (Norway). 
In general, the organization holds three membership meetings a year: one in January or February in North America (typically the USA), one in April or May in Europe (typically Norway) and one in October in Asia (typically Malaysia). == Activities and services == === Initiator and custodian of ISO 15926 === In consultation with the other EPISTLE members and the International Organization for Standardization (ISO), it was decided in 2003 (some say already in 1997) that for modeling-technical reasons it was better to discontinue the development of ISO 10303 and to initiate the development of ISO 15926 "Integration of life-cycle data for process plants including oil and gas production facilities." Over the years, the scope of the standard has increased from the initial capital-intensive projects in the upstream oil and gas industry to include relevant terminology for downstream oil and gas industry applications and to deal with real-time data related to the actual oil and gas production. Over the years, ISO 15926 has also evolved from a dictionary (a list of terms with definitions), through a taxonomy (added hierarchy), to an ontology (a formal representation of a set of concepts within a domain and the relationships between those concepts). ISO 15926 is therefore sometimes nicknamed the "Oil and Gas Ontology". Some consider it, together with Semantic Web technologies, an essential prerequisite for better interoperability, optimal use of all available data across boundaries and increased efficiency. This is what some call the next generation of Integrated Operations. 
=== Reference data services === Placeholders: flow scheme of WIP - RDS - ISO and role of SIGs; RDS; standards in database pilot (ISO). === Special interest groups === Placeholders: overview of SIGs; Drilling and Completion; Reservoir and Production; Operations and Maintenance. == Projects == There are a number of projects (co-)organized by POSC Caesar Association working on the extension of the ISO 15926 standard in different application areas. === Capital intensive projects application domain === The following projects are running at the moment (August 2009): the ADI Project of FIATECH, to build the tools (which will then be made available in the public domain); the IDS Project of POSC Caesar Association, to define product models required for data sheets; and the ADI-IDS project, a joint collaboration between FIATECH and POSC Caesar Association on the ISO 15926 WIP. === Upstream oil and gas industry application domain === The following projects are currently running (August 2009): The Integrated Operations in the High North (IOHN) project is working on extending ISO 15926 to handle real-time data transmission and (pre-)processing to enable the next generation of Integrated Operations. The Environment Web project aims to include in ISO 15926 the environmental reporting terms and definitions used in EPIM's EnvironmentWeb. Finalised projects include the Integrated Information Platform (IIP) project, which worked on establishing a real-time information pipeline based on open standards. Among other things, it worked on the Daily Drilling Report (DDR), to include all its terms and definitions in ISO 15926. On February 1, 2008, the Norwegian Petroleum Directorate (NPD) and the Petroleum Safety Authority Norway (PSA) made this standard mandatory for reporting on the Norwegian Continental Shelf. NPD says that the quality of the reports has improved considerably since. It also worked on the Daily Production Report (DPR), to include all its terms and definitions in ISO 15926. 
This standard was tested successfully on the Valhall (BP-operated) and Åsgard (StatoilHydro-operated) fields offshore Norway. The terminology and XML schemata developed have also been included in Energistics’ PRODML standard. == Conferences and events == === Semantic Days === === Sogndal academic network meeting === == Collaborations == POSC Caesar is collaborating with a number of standardization bodies, including: Mimosa: collaboration on open information standards for Operations and Maintenance mainly for the downstream oil and gas industry; FIATECH: collaboration on open information standards for life cycle data of capital projects; Energistics: collaboration on information standards for the upstream oil and gas industry, including WITSML and PRODML; OASIS: collaboration on e-business standards; ISO TC184/SC4: the host of the ISO 15926 standard.

Social History and Industrial Classification

Social History and Industrial Classification (SHIC) is a classification system used by many British museums for social history and industrial collections. It was first published in 1983. == Purpose == SHIC classifies materials (books, objects, recordings etc.) by their interaction with the people who used them. For example, a carpenter's hammer is classified with other tools of the carpenter, and not with a blacksmith's hammer. In contrast, other classification systems, for example the Dewey Decimal Classification, might class all hammers together and close to the classification for other percussive tools. The specialist subject network, Social History Curator's Group (SHCG), obtained funding in 2012 to develop an online version, which is now available on their website (http://www.shcg.org.uk/). == Scheme == Materials are classified under four major category numbers: 1 Community life; 2 Domestic and family life; 3 Personal life; 4 Working life. Further classification within a category is by the use of further numbers after the decimal point. It is permissible to assign more than one classification in cases where the object had more than one use.
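The scheme can be illustrated with a minimal sketch that reads the major category from a classification code. It assumes the four categories are numbered 1 to 4 in the order listed and that the major category number precedes the decimal point; the function names and the example codes are hypothetical and are not taken from the published SHIC tables.

```python
from typing import Iterable, List

# Assumed numbering: the four major categories in the order listed above.
MAJOR_CATEGORIES = {
    "1": "Community life",
    "2": "Domestic and family life",
    "3": "Personal life",
    "4": "Working life",
}


def major_category(code: str) -> str:
    """Return the major category named by the digits before the decimal point."""
    head = code.strip().split(".", 1)[0]
    if head not in MAJOR_CATEGORIES:
        raise ValueError(f"unknown major category in code {code!r}")
    return MAJOR_CATEGORIES[head]


def categories_for(codes: Iterable[str]) -> List[str]:
    """An object may carry several codes when it had more than one use."""
    return sorted({major_category(c) for c in codes})


# Hypothetical codes for an object used both at work and at home.
print(categories_for(["4.123", "2.7"]))
```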

TiDB

TiDB ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. Designed to be MySQL compatible, it is developed and supported primarily by PingCAP and licensed under Apache 2.0. It is also available as a paid product. TiDB drew its initial design inspiration from Google's Spanner and F1 papers. == Release history == On December 19, 2024, TiDB 8.5 GA was released. On May 24, 2024, TiDB 8.1 GA was released. On December 1, 2023, TiDB 7.5 GA was released. On May 31, 2023, TiDB 7.1 GA was released. On April 7, 2022, TiDB 6.0 GA was released. On April 7, 2021, TiDB 5.0 GA was released. On May 28, 2020, TiDB 4.0 GA was released. On June 28, 2019, TiDB 3.0 GA was released. On April 27, 2018, TiDB 2.0 GA was released. On October 16, 2017, TiDB 1.0 GA was released. == Main features == === Horizontal scalability === TiDB can expand both SQL processing and storage capacity by adding new nodes. === MySQL compatibility === TiDB presents itself to applications as a MySQL 8.0 server. A user can continue to use all of the existing MySQL client libraries. Because TiDB's SQL processing layer is built from scratch, it is not a MySQL fork. === Distributed transactions with strong consistency === TiDB internally shards a table into small range-based chunks that are referred to as "Regions". Each Region defaults to approximately 100 MB in size, and TiDB uses a two-phase commit internally to ensure that Regions are maintained in a transactionally consistent way. === Cloud native === TiDB is designed to work in the cloud. The storage layer of TiDB, called TiKV, became a Cloud Native Computing Foundation (CNCF) member project in August 2018, as a Sandbox level project, and became an incubation-level hosted project in May 2019. TiKV graduated from CNCF in September 2020. 
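The range-based Regions described under "Distributed transactions with strong consistency" can be pictured with a small sketch of range sharding. This is not TiDB's or TiKV's implementation: the `RegionMap` class, the split keys and the integer Region numbers are hypothetical, and real Regions are split by size rather than at hand-picked keys. The sketch only shows how a sorted list of boundary keys maps any key to exactly one contiguous range.

```python
import bisect
from typing import List


class RegionMap:
    """Maps keys to range-based shards.

    Region 0 holds keys below the first split key, region i holds keys in
    [split_keys[i - 1], split_keys[i]), and the last region holds the rest.
    """

    def __init__(self, split_keys: List[bytes]) -> None:
        self.split_keys = sorted(split_keys)

    def region_for(self, key: bytes) -> int:
        # bisect_right places a key equal to a split key in the region
        # that starts at that split key (the start of a range is inclusive).
        return bisect.bisect_right(self.split_keys, key)

    def split(self, key: bytes) -> None:
        """Add a boundary, for example when a region has grown too large."""
        if key not in self.split_keys:
            bisect.insort(self.split_keys, key)


regions = RegionMap([b"g", b"p"])
for k in (b"apple", b"g", b"kiwi", b"zebra"):
    print(k, "-> region", regions.region_for(k))
```

Because neighbouring keys land in the same range, range scans touch few shards, which is one common motivation for range-based rather than hash-based sharding.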
=== Real-time HTAP === TiDB can support both online transaction processing (OLTP) and online analytical processing (OLAP) workloads. TiDB has two storage engines: TiKV, a rowstore, and TiFlash, a columnstore. === High availability === TiDB uses the Raft consensus algorithm to ensure that data is available and replicated throughout storage in Raft groups. In the event of failure, a Raft group will automatically elect a new leader for the failed member, and self-heal the TiDB cluster. === Vector Search === TiDB has a vector data type and vector indexes. This allows TiDB to be used as a vector database in AI retrieval-augmented generation applications. == Deployment methods == === Kubernetes with Operator === TiDB can be deployed in a Kubernetes-enabled cloud environment by using TiDB Operator. An Operator is a method of packaging, deploying, and managing a Kubernetes application. It is designed for running stateful workloads and was first introduced by CoreOS in 2016. TiDB Operator was originally developed by PingCAP and open-sourced in August 2018. TiDB Operator can be used to deploy TiDB on a laptop, Google Cloud Platform's Google Kubernetes Engine, and Amazon Web Services' Elastic Container Service for Kubernetes. === TiUP === TiDB 4.0 introduced TiUP, a cluster operation and maintenance tool. It helps users quickly install and configure a TiDB cluster with a few commands. == Tools == TiDB has a series of open-source tools built around it to help with data replication and migration for existing MySQL and MariaDB users. === TiDB Data Migration (DM) === TiDB Data Migration (DM) is suited for replicating data from already sharded MySQL or MariaDB tables to TiDB. A common use case of DM is to connect MySQL or MariaDB tables to TiDB, treating TiDB almost as a replica, and then to run analytical workloads directly on this TiDB cluster in near real-time. === Backup & Restore === Backup & Restore (BR) is a distributed backup and restore tool for TiDB cluster data. 
=== Dumpling === Dumpling is a data export tool that exports data stored in TiDB or MySQL. It lets users make logical full backups or full dumps from TiDB or MySQL. === TiDB Lightning === TiDB Lightning is a tool that supports high-speed full import of a large MySQL dump into a new TiDB cluster. This tool is used to populate an initially empty TiDB cluster with a large amount of data, in order to speed up testing or production migration. The import speed improvement is achieved by parsing SQL statements into key-value pairs and then directly generating Sorted String Table (SST) files for RocksDB. === TiCDC === TiCDC is a change data capture tool which streams data from TiDB to other systems like Apache Kafka.

Imageability

Imageability is a measure of how easily a physical object, word or environment will evoke a clear mental image in the mind of any person observing it. It is used in architecture and city planning, in psycholinguistics, and in automated computer vision research. In automated image recognition, training models to connect images with concepts that have low imageability can lead to biased and harmful results. == History and components == Kevin A. Lynch first introduced the term "imageability" in his 1960 book, The Image of the City. In the book, Lynch argues cities contain a key set of physical elements that people use to understand the environment, orient themselves inside of it, and assign it meaning. Lynch argues the five key elements that impact the imageability of a city are Paths, Edges, Districts, Nodes, and Landmarks. Paths: channels in which people travel. Examples: streets, sidewalks, trails, canals, railroads. Edges: objects that form boundaries around space. Examples: walls, buildings, shoreline, curbstone, streets, and overpasses. Districts: medium to large areas people can enter into and out of that have a common set of identifiable characteristics. Nodes: large areas people can enter, that serve as the foci of the city, neighborhood, district, etc. Landmarks: memorable points of reference people cannot enter into. Examples: signs, mountains and public art. In 1914, half a century before The Image of the City was published, Paul Stern discussed a concept similar to imageability in the context of art. Stern, in Susanne Langer's Reflections on Art, names the attribute that describes how vividly and intensely an artistic object could be experienced "apparency". == In computer vision == Automated image recognition was developed by using machine learning to find patterns in large, annotated datasets of photographs, like ImageNet. Images in ImageNet are labelled using concepts in WordNet. 
Concepts that are easily expressed verbally, like "early", are seen as less "imageable" than nouns referring to physical objects like "leaf". Training AI models to associate concepts with low imageability with specific images can lead to problematic bias in image recognition algorithms. This has particularly been critiqued as it relates to the "person" category of WordNet and therefore also ImageNet. Trevor Paglen and Kate Crawford demonstrated in their essay "Excavating AI" and their art project ImageNet Roulette how this leads to photos of ordinary people being labelled by AI systems as "terrorists" or "sex offenders". Images in datasets are often labelled as having a certain level of imageability. As described by Kaiyu Yang, Fei-Fei Li and co-authors, this is often done following criteria from Allan Paivio and collaborators' 1968 psycholinguistic study of nouns. Yang et al. write that dataset annotators tasked with labelling imageability "see a list of words and rate each word on a 1-7 scale from 'low imagery' to 'high imagery'". To avoid biased or harmful image recognition and image generation, Yang et al. recommend not training vision recognition models on concepts with low imageability, especially when the concepts are offensive (such as sexual or racial slurs) or sensitive (their examples for this category include "orphan", "separatist", "Anglo-Saxon" and "crossover voter"). Even "safe" concepts with low imageability, like "great-niece" or "vegetarian", can lead to misleading results and should be avoided.
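The recommendation to exclude low-imageability concepts can be sketched as a simple filter over annotator ratings on the 1-7 scale. The ratings below are invented for illustration, and both the cut-off value of 4.0 and the function names are assumptions; the source does not specify a threshold in this text.

```python
from statistics import mean
from typing import Dict, List


def mean_imageability(ratings: List[int]) -> float:
    """Average annotator ratings given on the 1-7 imagery scale."""
    for r in ratings:
        if not 1 <= r <= 7:
            raise ValueError(f"rating {r} is outside the 1-7 scale")
    return mean(ratings)


def imageable_concepts(
    ratings_by_concept: Dict[str, List[int]],
    threshold: float = 4.0,  # assumed cut-off, not taken from the source
) -> List[str]:
    """Keep only concepts whose mean rating reaches the threshold."""
    return sorted(
        concept
        for concept, ratings in ratings_by_concept.items()
        if mean_imageability(ratings) >= threshold
    )


# Invented ratings, for illustration only.
ratings = {
    "leaf": [7, 6, 7],
    "early": [2, 1, 2],
    "vegetarian": [3, 2, 3],
}
print(imageable_concepts(ratings))
```

A filter of this kind addresses only imageability; offensive or sensitive concepts would still need separate review.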