AI Tools

AI Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Data preprocessing

    Data preprocessing

    Data preprocessing can refer to manipulation, filtration or augmentation of data before it is analyzed, and is often an important step in the data mining process. Data collection methods are often loosely controlled, resulting in out-of-range values, impossible data combinations, and missing values, amongst other issues. Preprocessing is the process by which unstructured data is transformed into intelligible representations suitable for machine-learning models. This phase of model deals with noise in order to arrive at better and improved results from the original data set which was noisy. This dataset also has some level of missing value present in it. The preprocessing pipeline used can often have large effects on the conclusions drawn from the downstream analysis. Thus, representation and quality of data is necessary before running any analysis. If there is a high proportion of irrelevant and redundant information present or noisy and unreliable data, then knowledge discovery during the training phase may be more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Examples of methods used in data preprocessing include cleaning, instance selection, normalization, one-hot encoding, data transformation, feature extraction and feature selection. == Applications == === Data mining === Data preprocessing allows for the removal of unwanted data with the use of data cleaning, this allows the user to have a dataset to contain more valuable information after the preprocessing stage for data manipulation later in the data mining process. Editing such dataset to either correct data corruption or human error is a crucial step to get accurate quantifiers like true positives, true negatives, false positives and false negatives found in a confusion matrix that are commonly used for a medical diagnosis. Users are able to join data files together and use preprocessing to filter any unnecessary noise from the data which can allow for higher accuracy. Users use Python programming scripts accompanied by the pandas library which gives them the ability to import data from a comma-separated values as a data-frame. The data-frame is then used to manipulate data that can be challenging otherwise to do in Excel. Pandas (software) which is a powerful tool that allows for data analysis and manipulation; which makes data visualizations, statistical operations and much more, a lot easier. Many also use the R programming language to do such tasks as well. The reason why a user transforms existing files into a new one is because of many reasons. Aspects of data preprocessing may include imputing missing values, aggregating numerical quantities and transforming continuous data into categories (data binning). More advanced techniques like principal component analysis and feature selection are working with statistical formulas and are applied to complex datasets which are recorded by GPS trackers and motion capture devices. === Semantic data preprocessing === Semantic data mining is a subset of data mining that specifically seeks to incorporate domain knowledge, such as formal semantics, into the data mining process. Domain knowledge is the knowledge of the environment the data was processed in. Domain knowledge can have a positive influence on many aspects of data mining, such as filtering out redundant or inconsistent data during the preprocessing phase. Domain knowledge also works as constraint. It does this by using working as set of prior knowledge to reduce the space required for searching and acting as a guide to the data. Simply put, semantic preprocessing seeks to filter data using the original environment of said data more correctly and efficiently. There are increasingly complex problems which are asking to be solved by more elaborate techniques to better analyze existing information. Instead of creating a simple script for aggregating different numerical values into a single value, it make sense to focus on semantic based data preprocessing. The idea is to build a dedicated ontology, which explains on a higher level what the problem is about. In regards to semantic data mining and semantic pre-processing, ontologies are a way to conceptualize and formally define semantic knowledge and data. The Protégé (software) is the standard tool for constructing an ontology. In general, the use of ontologies bridges the gaps between data, applications, algorithms, and results that occur from semantic mismatches. As a result, semantic data mining combined with ontology has many applications where semantic ambiguity can impact the usefulness and efficiency of data systems. Applications include the medical field, language processing, banking, and even tutoring, among many more. There are various strengths to using a semantic data mining and ontological based approach. As previously mentioned, these tools can help during the per-processing phase by filtering out non-desirable data from the data set. Additionally, well-structured formal semantics integrated into well designed ontologies can return powerful data that can be easily read and processed by machines. A specifically useful example of this exists in the medical use of semantic data processing. As an example, a patient is having a medical emergency and is being rushed to hospital. The emergency responders are trying to figure out the best medicine to administer to help the patient. Under normal data processing, scouring all the patient’s medical data to ensure they are getting the best treatment could take too long and risk the patients’ health or even life. However, using semantically processed ontologies, the first responders could save the patient’s life. Tools like a semantic reasoner can use ontology to infer the what best medicine to administer to the patient is based on their medical history, such as if they have a certain cancer or other conditions, simply by examining the natural language used in the patient's medical records. This would allow the first responders to quickly and efficiently search for medicine without having worry about the patient’s medical history themselves, as the semantic reasoner would already have analyzed this data and found solutions. In general, this illustrates the incredible strength of using semantic data mining and ontologies. They allow for quicker and more efficient data extraction on the user side, as the user has fewer variables to account for, since the semantically pre-processed data and ontology built for the data have already accounted for many of these variables. However, there are some drawbacks to this approach. Namely, it requires a high amount of computational power and complexity, even with relatively small data sets. This could result in higher costs and increased difficulties in building and maintaining semantic data processing systems. This can be mitigated somewhat if the data set is already well organized and formatted, but even then, the complexity is still higher when compared to standard data processing. Below is a simple a diagram combining some of the processes, in particular semantic data mining and their use in ontology. The diagram depicts a data set being broken up into two parts: the characteristics of its domain, or domain knowledge, and then the actual acquired data. The domain characteristics are then processed to become user understood domain knowledge that can be applied to the data. Meanwhile, the data set is processed and stored so that the domain knowledge can applied to it, so that the process may continue. This application forms the ontology. From there, the ontology can be used to analyze data and process results. Fuzzy preprocessing is another, more advanced technique for solving complex problems. Fuzzy preprocessing and fuzzy data mining make use of fuzzy sets. These data sets are composed of two elements: a set and a membership function for the set which comprises 0 and 1. Fuzzy preprocessing uses this fuzzy data set to ground numerical values with linguistic information. Raw data is then transformed into natural language. Ultimately, fuzzy data mining's goal is to help deal with inexact information, such as an incomplete database. Currently fuzzy preprocessing, as well as other fuzzy based data mining techniques see frequent use with neural networks and artificial intelligence.

    Read more →
  • RAMnets

    RAMnets

    RAMnets is one of the oldest practical neurally inspired classification algorithms. The RAMnets is also known as a type of "n-tuple recognition method" or "weightless neural network". == Algorithm == Consider (let us say N) sets of n distinct bit locations are selected randomly. These are the n-tuples. The restriction of a pattern to an n-tuple can be regarded as an n-bit number which, together with the identity of the n-tuple, constitutes a `feature' of the pattern. The standard n-tuple recognizer operates simply as follows: A pattern is classified as belonging to the class for which it has the most features in common with at least one training pattern of that class. This is the Θ {\displaystyle \Theta } = 0 case of a more general rule whereby the class assigned to unclassified pattern u is a c r g m a x ( ∑ i = 1 N Θ ( ∑ v ∈ D c δ ( α i ( u ) , α i ( v ) ) ) ) {\displaystyle {\begin{aligned}{\underset {c}{a}}rgmax(\sum _{i=1}^{N}\Theta (\sum _{v\in D_{c}}\delta (\alpha _{i}(u),\alpha _{i}(v))))\end{aligned}}} where Dc is the set of training patterns in class c, Θ ( x ) {\displaystyle \Theta (x)} = x for 0 ≤ x ≤ θ {\displaystyle 0\leq x\leq \theta } , Θ ( x ) = θ {\displaystyle \Theta (x)=\theta } for x ≥ θ {\displaystyle x\geq \theta } , δ i , j {\displaystyle \delta _{i,j}} is the Kronecker delta( δ i , j {\displaystyle \delta _{i,j}} =1 if i=j and 0 otherwise.)and ( α i ( u ) ) {\displaystyle (\alpha _{i}(u))} is the ith feature of the pattern u: ∑ j = 0 n − 1 u η i ( j ) 2 j {\displaystyle \sum _{j=0}^{n-1}u_{\eta }i(j)2^{j}} Here uk is the kth bit of u and u η i ( j ) {\displaystyle u_{\eta }i(j)} is the jth bit location of the ith n-tuple. With C classes to distinguish, the system can be implemented as a network of NC nodes, each of which is a random access memory (RAM); hence the term RAMnet. The memory content m c i α {\displaystyle m_{ci\alpha }} at address α {\displaystyle \alpha } of the ith node allocated to class c is set to m c i α {\displaystyle m_{ci\alpha }} = Θ ( ∑ v ∈ D c δ ( α , α i ( v ) ) ) {\displaystyle \Theta (\sum _{v\in D_{c}}\delta (\alpha ,\alpha _{i}(v)))} In the usual θ {\displaystyle \theta } = 1 case, the 1-bit content of m c i α {\displaystyle m_{ci\alpha }} is set if any pattern of Dc has feature α {\displaystyle \alpha } and unset otherwise. Recognition is accomplished by summing the contents of the nodes of each class at the addresses given by the features of the unclassified pattern. That is, pattern u is assigned to class a c r g m a x ( ∑ i = 1 N m c i α ( u ) ) {\displaystyle {\begin{aligned}{\underset {c}{a}}rgmax(\sum _{i=1}^{N}m_{ci\alpha }(u))\end{aligned}}} == RAM-discriminators and WiSARD == The RAMnets formed the basis of a commercial product known as WiSARD (Wilkie, Stonham and Aleksander Recognition Device) was the first artificial neural network machine to be patented. A RAM-discriminator consists of a set of X one-bit word RAMs with n inputs and a summing device (Σ). Any such RAM-discriminator can receive a binary pattern of X⋅n bits as input. The RAM input lines are connected to the input pattern by means of a biunivocal pseudo-random mapping. The summing device enables this network of RAMs to exhibit – just like other ANN models based on synaptic weights – generalization and noise tolerance. In order to train the discriminator one has to set all RAM memory locations to 0 and choose a training set formed by binary patterns of X⋅n bits. For each training pattern, a 1 is stored in the memory location of each RAM addressed by this input pattern. Once the training of patterns is completed, RAM memory contents will be set to a certain number of 0's and 1's. The information stored by the RAM during the training phase is used to deal with previous unseen patterns. When one of these is given as input, the RAM memory contents addressed by the input pattern are read and summed by Σ. The number r thus obtained, which is called the discriminator response, is equal to the number of RAMs that output 1. r reaches the maximum X if the input belongs to the training set. r is equal to 0 if no n-bit component of the input pattern appears in the training set (not a single RAM outputs 1). Intermediate values of r express a kind of “similarity measure” of the input pattern with respect to the patterns in the training set. A system formed by various RAM-discriminators is called WiSARD. Each RAM-discriminator is trained on a particular class of patterns, and classification by the multi-discriminator system is performed in the following way. When a pattern is given as input, each RAM-discriminator gives a response to that input. The various responses are evaluated by an algorithm which compares them and computes the relative confidence c of the highest response (e.g., the difference d between the highest response and the second highest response, divided by the highest response). A schematic representation of a RAM-discriminator and a 10 RAM-discriminator WiSARD is shown in Figure 1.

    Read more →
  • CarPlay

    CarPlay

    CarPlay is an Apple standard that enables a car radio or automotive head unit to be a display and controller for an iOS device. It is available on iPhone 5 and later models running iOS 7.1 or later. More than 800 car and motorcycle models support CarPlay, according to Apple. Vehicle owners can add support by installing certain aftermarket vehicle audio products. Most CarPlay systems connect to iOS through USB, some are wireless, and wireless support can be added through aftermarket dongles. CarPlay Ultra, a more integrated version of CarPlay, was first announced on Aston Martin DBX707 in May 2025. == Software == Apple's CarPlay-enabled apps include: Phone Apple Music Apple Maps Calendar Messages Audiobooks (part of Apple Books) Podcasts Settings News Developers must obtain permission from Apple to develop CarPlay-enabled apps. Such apps fall into five categories: Audio: primarily provide audio content, such as music or podcasts. Examples: Amazon Music, Audible, Google Play Music, iHeartRadio, QQ Music, Spotify, and Overcast. Navigation: turn-by-turn guidance, including searching for points of interests and navigating to a destination. Examples: AutoNavi, Baidu Maps, Google Maps, ChargeFinder and Waze. Automaker-made apps allow a user to control vehicle-specific features such as climate controls, gas levels, or radio via CarPlay. Messaging/Voice over IP (VoIP): listen to new messages and reply using dictation in an audio-only interface. Messaging apps on CarPlay integrate with third-party Siri support (known as SiriKit), while VoIP apps integrate with the iOS calling interface using CallKit. Examples: Telegram, WhatsApp, and Zoom. Food-ordering and parking-services apps. To discourage distracted driving, Siri is used extensively, providing voice turn-by-turn navigation guidance and voice-input for text messages. Newscast-style weather and stock results are announced instead of displayed. Requests that bring up visual information may be blocked when the car is in gear, and most native CarPlay apps deliver audio content with minimal interaction. CarPlay-enabled apps installed on the device appear on the CarPlay home screen unless disabled by the user. The inclusion or exclusion and order of app appearance can be changed on a per-vehicle basis. == Hardware == Most of the CarPlay software runs on the connected iPhone. The CarPlay interface provides audio output and a visual display to the vehicle's infotainment system, while adapting to the vehicle's available control methods, including touch screens, rotary dials, physical buttons, steering-wheel controls, and hands-free microphones. Aftermarket head units may support CarPlay or Android Auto, and many support both platforms. === Wired CarPlay === In a wired CarPlay configuration, the iPhone connects to the vehicle or head unit via a USB cable. The USB connection supplies power to the iPhone and provides a stable data link for audio, video, and control input. Wired CarPlay is supported by a wide range of factory-installed infotainment systems and aftermarket head units. Some third-party devices marketed as wireless CarPlay adapters operate by emulating a wired CarPlay connection to the vehicle. These devices plug into the vehicle's USB port and present themselves as a wired CarPlay interface, while separately establishing a wireless connection to the iPhone. Such devices still require the vehicle or head unit to support standard (wired) CarPlay. === Wireless CarPlay === Wireless CarPlay allows the iPhone to connect to a compatible vehicle or head unit without a physical cable. During the initial pairing process, the iPhone exchanges network credentials with the CarPlay receiver over Bluetooth. Once paired, CarPlay data is transmitted over a two-way Wi-Fi connection between the phone and the vehicle. Wireless CarPlay support depends on both the vehicle or head unit hardware and the iPhone model, and is generally limited to newer factory systems and select aftermarket receivers. == History == === Predecessor === In 2008, one year after the release of the iPhone, Mercedes vehicles were first to sell an audio system incorporating both the iPod and iPhone, equipped with 30-pin iOS input jacks. The new 2008 Harman Kardon NTG 2.5 featured full audio streaming, syncing, charging and control integrated into the steering wheel controls, instrument panel, and head unit. Apple was working with Mercedes to develop iOS compatible audio systems into their cars first only a year after iPhone launch. With an Apple Lightning-to-30-pin adapter, iPhones/iPods remain backwards-compatible with the Harman Kardon 2.5 and later models. This is the earliest audio system specifically engineered for iPod/iPhone integration, which predated CarPlay and every other manufacturer incorporating iOS into vehicles. The concept of CarPlay was based on the iOS 4 feature called "iPod Out" which was produced through several years of joint development by Apple and the BMW Group's Technology Office USA. iPod Out enabled vehicles with the necessary infrastructure to "host" the analog video and audio from a supporting iOS device while receiving inputs, such as button presses and knob rotations, from a car's infotainment system, to drive the "hosted" user interface in the vehicle's built-in display. It was announced at WWDC 2010 and first shipped in BMW Group vehicles in early 2011. The BMW and Mini option was called "PlugIn" and paved the way for the first cross-OEM platforms, introducing the concept of requiring a car-specific interface for apps (as opposed to MirrorLink's simple and insufficient mirroring of what was shown on the smartphone's screen). === Development === CarPlay's codename was Stark. Apple's Eddy Cue announced it as iOS in the Car at WWDC 2013. In January 2014, it was reported that Apple's hardware-oriented corporate culture had led to release delays. iOS in the Car was then rebranded and launched as CarPlay with significant design changes at the Geneva Motor Show in March 2014 with Ferrari, Kia, Mercedes-Benz, and Volvo among the first car manufacturers. At WWDC 2022, Apple announced plans to release an all-new version of CarPlay, informally dubbed CarPlay 2. The new version was said to be able to control vehicle functions, access vehicle stats, and take over multiple vehicle screens. Officials said they planned to release it in late 2024 and that manufacturers that are planning to adopt the new CarPlay include: Audi, Acura, Ford, Honda, Infiniti, Jaguar, Land Rover, Lincoln, Mercedes-Benz, Nissan, Polestar, Porsche, Renault, and Volvo. In January 2025, amidst delays, Apple removed the planned released date from its website. On May 15, 2025, Apple announced that next-generation CarPlay, now called CarPlay Ultra, would be included with all new vehicles from Aston Martin. Existing vehicles will also be receiving CarPlay Ultra through a future software update. It is only available in the US and Canada. == Timeline == June 2013: Apple introduced iOS in the Car; an early version of CarPlay that was never publicly released, at WWDC 2013. June 2013: BMW officials announced their cars would not support iOS in the Car; they later changed their minds. November 2013: Siri Eyes Free mode was offered as a dealer-installed accessory in the US to some Honda Accord and Acura RDX & ILX models. In December, Honda offered additional integration, featuring new HondaLink services, on some US and Canada models of the Civic and the Fit. March 2014: Apple introduced CarPlay, which was renamed from iOS in the Car with significant design changes, at the 2014 Geneva Motor Show with automakers Ferrari, Mercedes-Benz and Volvo. September 2014: A Ferrari FF was the first car with a full version of CarPlay. November 2014: Hyundai announced the Sonata sedan would be their first model with available CarPlay by the end of the first quarter of 2015. January 2015: Volkswagen announced CarPlay support would be coming later in 2015 and would be either standard or available on the majority of their 2016 model year lineup. May 2015: General Motors announced CarPlay would be available starting with 14 different 2016 model year Chevrolet vehicles. July 2015: Honda announced CarPlay would be available in their vehicles starting with the 2016 Honda Accord. December 2015: Volvo implemented CarPlay in the 2016 Volvo XC90 as their first vehicle with CarPlay support. December 2015: Mercedes-Benz confirmed that CarPlay would be available starting with select 2016 model year vehicles. January 2016: Apple released a list detailing the car models which support CarPlay. January 2016: Ford announced CarPlay would be available on all 2017 Ford/Lincoln model year vehicles equipped with the Sync 3 infotainment system. January 2016: FCA (now a part of Stellantis) announced CarPlay would be available on their UConnect infotainment system starting with select 2016 model year vehicles. March 2016: Subaru announced the beginning of CarPlay and Android Auto support, st

    Read more →
  • General Regionally Annotated Corpus of Ukrainian

    General Regionally Annotated Corpus of Ukrainian

    General Regionally Annotated Corpus of the Ukrainian Language (GRAC, Ukrainian: Генеральний регіонально анотований корпус української мови, romanized: Heneralnyi rehionalno anotovanyi korpus ukrainskoi movy, ГРАК, Ukrainian грак for rook) is a text corpus of the Ukrainian language comprising more than 2 billion tokens, intended for linguistic research in grammar, vocabulary, and the history of the Ukrainian literary language, as well as for use in compiling dictionaries and grammars. The corpus can be used for language study and also for preparing teaching materials, textbooks, learner’s dictionaries, and exercises using examples from real texts, taking into account frequency and collocational patterns, and so on. The corpus is not a model of standard Ukrainian: it may contain words and combinations that do not match current norms of the literary language. The corpus covers the period from 1816 to 2025, and as of 29 November 2025 it contains more than 812,000 texts by about 35,000 authors. == Composition of the corpus == In the 10th version of the corpus, available for searching from 20 October 2020, 35% consists of fiction. Some fiction genres are highlighted separately: children’s literature, folklore, dramatic works, and scripts. Among non-fiction texts: journalistic writing, including newspaper collections from 1888–1893, 1905, 1913–1918, 1919–1943, modern newspapers from different regions, and texts from online news/information sites; memoirs, letters, and diaries, including a sizeable corpus of Facebook texts representing blogs by people from all regions of Ukraine and the diaspora; scholarly and educational texts: monographs, dissertations, academic articles, textbooks; large subcorpora of academic literature in history, ethnography, philosophy, and law are singled out separately; religious texts, including two Ukrainian translations of the Bible; speeches and interviews. Some dictionaries that include phrasal examples and phraseology have also been incorporated, including the Ukrainian dictionary by Borys Hrinchenko and the Russian-Ukrainian idiomatic dictionary by I. Vyrhan and M. Pylynska. Using the corpus tools, these dictionaries can be searched not only for words, but also for lexico-grammatical patterns within examples and phraseological expressions. About 20% of the texts in the corpus are translations. The corpus includes translations from more than 80 languages, most of all from English and Russian. == Dating == Texts in the corpus are dated by the year of writing, or by the latest year in which a work could have been written; translated texts are dated by the year the translation was produced. A publication year may also be indicated, corresponding to the edition from which the text is taken. == Regional annotation == The corpus’s regional annotation is based on the modern administrative division of Ukraine. The corpus includes texts from all oblasts of Ukraine and from Crimea. A single text may belong to several regional subcorpora (if the author or translator was born, studied, or lived for a long time in different regions). In addition to regional subcorpora, there are subcorpora of works by authors of the Ukrainian diaspora (USA, Canada, Poland, Germany, the United Kingdom, France, etc.). These are mostly texts by emigrants of the 1940s, and to a lesser extent of 1917–1920s. == Morphological annotation == GRAC is based on the morphological analysis system nlp_uk, developed by specialists from the r2u group. The program analyzes the text and, for each word form, determines the lemma (lexeme) and tags (grammatical features). == Research based on the corpus == Research on the Ukrainian language has been carried out using the corpus, including studies of the historical dynamics of language norms, and letter and letter-combination frequencies for font development.

    Read more →
  • Metadatabase

    Metadatabase

    Metadatabase is a database model for (1) metadata management, (2) global query of independent databases, and (3) distributed data processing. The word metadatabase is an addition to the dictionary. Originally, metadata was only a common term referring simply to "data about data", such as tags, keywords, and markup headers. However, in this technology, the concept of metadata is extended to also include such data and knowledge representation as information models (e.g., relations, entities-relationships, and objects), application logic (e.g., production rules), and analytic models (e.g., simulation, optimization, and mathematical algorithms). In the case of analytic models, it is also referred to as a Modelbase. These classes of metadata are integrated with some modeling ontology to give rise to a stable set of meta-relations (tables of metadata). Individual models are interpreted as metadata and entered into these tables. As such, models are inserted, retrieved, updated, and deleted in the same manner as ordinary data do in an ordinary (relational) database. Users will also formulate global queries and requests for processing of local databases through the Metadatabase, using the globally integrated metadata. The Metadatabase structure can be implemented in any open technology for relational databases. == Significance == The Metadatabase technology is developed at Rensselaer Polytechnic Institute at Troy, New York, by a group of faculty and students (see the references at the end of the article), starting in late 1980s. Its main contribution includes the extension of the concept of metadata and metadata management, and the original approach of designing a database for metadata applications. These conceptual results continue to motivate new research and new applications. At the level of particular design, its openness and scalability is tied to that of the particular ontology proposed: It requires reverse-representation of the application models in order to save them into the meta-relations. In theory, the ontology is neutral, and it has been proven in some industrial applications. However, it needs more development to establish it for the field as an open technology. The requirement of reverse-representation is common to any global information integration technology. A way to facilitate it in the Metadatabase approach is to distribute a core portion of it at each local site, to allow for peer-to-peer translation on the fly.

    Read more →
  • Julia Hirschberg

    Julia Hirschberg

    Julia Hirschberg is an American computer scientist noted for her research on computational linguistics and natural language processing. She received her first PhD in history from the University of Michigan and the second from the University of Pennsylvania in computer science doing research in Natural Language Processing. She worked at Bell Labs and AT&T Bell Labs from 1985 to 2002 and from 2002 at Columbia University where she is currently the Percy K. and Vida L. W. Hudson Professor of Computer Science. == Biography == Julia Linn Bell Hirschberg received her first Ph.D. degree in history (16th-century Mexico) from University of Michigan in 1976. She served on the History faculty of Smith College from 1974 to 1982. She subsequently shifted to Computer Science studies, receiving her M.S. in Computer and Information Science from University of Pennsylvania in 1982 and a Ph.D. in Computer and Information Science from University of Pennsylvania in 1985. Upon graduation from University of Pennsylvania in 1985, Hirschberg joined AT&T Bell Labs as a Member of Technical staff in the Linguistics Research Department, where she worked on improving prosody assignment for Text-to-Speech Synthesis (TTS) in the Bell Labs TTS system. She was promoted to Department Head in 1994 when she created a new Human Computer Interface Research Lab. She and her department remained at Bell Labs until 1996 when they moved to AT&T Labs Research as part of a corporate reorganization. In 2002, she joined the Columbia University faculty as a professor in the Department of Computer Science. She served as Chair of the Computer Science Department from 2012 to 2018. She still leads classes at Columbia in speech and natural language research and supervises PhD students and a large number of research project students. == Research == Hirschberg's research has included prosody, discourse structure, conversational implicature, text-to-speech synthesis, speech summarization, spoken dialogue systems, emotional speech, deceptive speech, charismatic speech, entrainment, empathetic speech and code-switching. Hirschberg was among the first to combine Natural Language Processing (NLP) approaches to discourse and dialogue with speech research. She pioneered techniques in text analysis for prosody assignment in Text-to-Speech synthesis at Bell laboratories in the 1980s and 1990s, developing corpus-based statistical models based upon syntactic and discourse information which are in general use today in TTS systems. With Janet Pierrehumbert, she developed a theoretical model of intonational meaning. She was a leader in the development of the ToBI conventions for intonational description, which have been extended to numerous languages and which today are the most widely used standard for intonational annotation. Hirschberg has been a pioneer together with Gregory Ward in much experimental work on intonational sources of language meaning and how these interact with pragmatic phenomena, particularly on the meaning of accent (intonational prominent) items and the meaning of intonational contours. She also has innovated in numerous other areas involving prosody and meaning, including the role of grammatical function and surface position in pitch accent location, the use of prosody in disambiguating cue phrases (discourse markers) with Diane Litman, the role of prosody in disambiguation in English, Italian, and Spanish with Cinzia Avesani and Pilar Prieto, and the automatic identification of speech recognition errors using prosodic information, At AT&T Labs she worked with Fernando Pereira, Steve Whittaker, and others on speech search and developing new interfaces for speech navigation. At Columbia, she and her students have continued and extended research on spoken dialogue systems (automatically detecting speech recognition errors and inappropriate system queries, modeling turn-taking behavior, dialogue entrainment, modeling and generating clarification dialogues); on the automatic classification of trust, charisma, deception and emotion from speech; on speech summarization; prosody translation, hedging behavior in text and speech, text-to-speech synthesis, and speech search in low resource languages. She also holds several patents in TTS and in speech search. Corpora she and collaborators have collected include the Boston Directions Corpus, the Columbia SRI Colorado Deception Corpus, and the Columbia Games Corpus. She has served on numerous technical boards and editorial committees. She has served as a member of the Computing Research Association's (CRA) Board of Directors and as co-chair of CRA-W. She is also noted for her leadership in broadening participation in computing. == Awards == Hirschberg's notable honors and awards include: Elected as a member of the National Academy of Artificial Intelligence Academy of Sciences and recipient of the NAAI Artificial Intelligence Exploration Award, 2025 Elected as a Fellow of Asia-Pacific Artificial Intelligence Association (AAIA), 2024. 2020 ISCA Special Service Medal Honorary Doctorate (eredoctoraat) from Tilburg University, Netherlands, 2018. American Academy of Arts and Sciences, 2018. IEEE Fellow, 2017 National Academy of Engineering, 2017 ACM Fellow in 2015 Elected member, American Philosophical Society, 2014. Honorary member, Association for Laboratory Phonology, 2014. Association for Computational Linguistics (ACL) (Founding) Fellow, 2011. International Speech Communication Association (ISCA) Medal for Scientific Achievement, 2011. IEEE James L. Flanagan Speech and Audio Processing Award, 2011. Honorary Doctorate (Hedersdoktorer), KTH (Royal Institute of Technology) Stockholm, Sweden, 2007. AAAI Fellow, 1994. == Publications == A social history of Puebla de Los Ángeles, 1531-60, 1976 Empirical studies on the disambiguation of cue phrases, 1991 Prosody and conversation, 1998 Most recent publications and other information, https://www.cs.columbia.edu/speech/.

    Read more →
  • How to Choose an AI Chatbot

    How to Choose an AI Chatbot

    Looking for the best AI chatbot? An AI chatbot is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI chatbot slots into your workflow and pays for itself fast. Read on for hands-on impressions, pricing tiers, and the standout features that matter.

    Read more →
  • Noisy channel model

    Noisy channel model

    The noisy channel model is a framework used in spell checkers, question answering, speech recognition, and machine translation. In this model, the goal is to find the intended word given a word where the letters have been scrambled in some manner. == In spell-checking == See Chapter B of. Given an alphabet Σ {\displaystyle \Sigma } , let Σ ∗ {\displaystyle \Sigma ^{}} be the set of all finite strings over Σ {\displaystyle \Sigma } . Let the dictionary D {\displaystyle D} of valid words be some subset of Σ ∗ {\displaystyle \Sigma ^{}} , i.e., D ⊆ Σ ∗ {\displaystyle D\subseteq \Sigma ^{}} . The noisy channel is the matrix Γ w s = Pr ( s | w ) {\displaystyle \Gamma _{ws}=\Pr(s|w)} , where w ∈ D {\displaystyle w\in D} is the intended word and s ∈ Σ ∗ {\displaystyle s\in \Sigma ^{}} is the scrambled word that was actually received. The goal of the noisy channel model is to find the intended word given the scrambled word that was received. The decision function σ : Σ ∗ → D {\displaystyle \sigma :\Sigma ^{}\to D} is a function that, given a scrambled word, returns the intended word. Methods of constructing a decision function include the maximum likelihood rule, the maximum a posteriori rule, and the minimum distance rule. In some cases, it may be better to accept the scrambled word as the intended word rather than attempt to find an intended word in the dictionary. For example, the word schönfinkeling may not be in the dictionary, but might in fact be the intended word. === Example === Consider the English alphabet Σ = { a , b , c , . . . , y , z , A , B , . . . , Z , . . . } {\displaystyle \Sigma =\{a,b,c,...,y,z,A,B,...,Z,...\}} . Some subset D ⊆ Σ ∗ {\displaystyle D\subseteq \Sigma ^{}} makes up the dictionary of valid English words. There are several mistakes that may occur while typing, including: Missing letters, e.g., leter instead of letter Accidental letter additions, e.g., misstake instead of mistake Swapping letters, e.g., recieved instead of received Replacing letters, e.g., fimite instead of finite To construct the noisy channel matrix Γ {\displaystyle \Gamma } , we must consider the probability of each mistake, given the intended word ( Pr ( s | w ) {\displaystyle \Pr(s|w)} for all w ∈ D {\displaystyle w\in D} and s ∈ Σ ∗ {\displaystyle s\in \Sigma ^{}} ). These probabilities may be gathered, for example, by considering the Damerau–Levenshtein distance between s {\displaystyle s} and w {\displaystyle w} or by comparing the draft of an essay with one that has been manually edited for spelling. == In machine translation == One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. See chapter 1, and chapter 25 of. Suppose we want to translate a foreign language to English, we could model P ( E | F ) {\displaystyle P(E|F)} directly: the probability that we have English sentence E given foreign sentence F, then we pick the most likely one E ^ = arg ⁡ max E P ( E | F ) {\displaystyle {\hat {E}}=\arg \max _{E}P(E|F)} . However, by Bayes law, we have the equivalent equation: E ^ = argmax E ∈ English P ( F ∣ E ) ⏞ translation model P ( E ) ⏞ language model {\displaystyle {\hat {E}}={\underset {E\in {\text{ English }}}{\operatorname {argmax} }}\overbrace {P(F\mid E)} ^{\text{translation model }}\overbrace {P(E)} ^{\text{language model}}} The benefit of the noisy-channel model is in terms of data: If collecting a parallel corpus is costly, then we would have only a small parallel corpus, so we can only train a moderately good English-to-foreign translation model, and a moderately good foreign-to-English translation model. However, we can collect a large corpus in the foreign language only, and a large corpus in the English language only, to train two good language models. Combining these four models, we immediately get a good English-to-foreign translator and a good foreign-to-English translator. The cost of noisy-channel model is that using Bayesian inference is more costly than using a translation model directly. Instead of reading out the most likely translation by arg ⁡ max E P ( E | F ) {\displaystyle \arg \max _{E}P(E|F)} , it would have to read out predictions by both the translation model and the language model, multiply them, and search for the highest number. == In speech recognition == Speech recognition can be thought of as translating from a sound-language to a text-language. Consequently, we have T ^ = argmax T ∈ Text P ( S ∣ T ) ⏞ speech model P ( T ) ⏞ language model {\displaystyle {\hat {T}}={\underset {T\in {\text{ Text }}}{\operatorname {argmax} }}\overbrace {P(S\mid T)} ^{\text{speech model }}\overbrace {P(T)} ^{\text{language model}}} where P ( S | T ) {\displaystyle P(S|T)} is the probability that a speech sound S is produced if the speaker is intending to say text T. Intuitively, this equation states that the most likely text is a text that's both a likely text in the language, and produces the speech sound with high probability. The utility of the noisy-channel model is not in capacity. Theoretically, any noisy-channel model can be replicated by a direct P ( T | S ) {\displaystyle P(T|S)} model. However, the noisy-channel model factors the model into two parts which are appropriate for the situation, and consequently it is generally more well-behaved. When a human speaks, it does not produce the sound directly, but first produces the text it wants to speak in the language centers of the brain, then the text is translated into sound by the motor cortex, vocal cords, and other parts of the body. The noisy-channel model matches this model of the human, and so it is appropriate. This is justified in the practical success of noisy-channel model in speech recognition. === Example === Consider the sound-language sentence (written in IPA for English) S = aɪ wʊd laɪk wʌn tuː. There are three possible texts T 1 , T 2 , T 3 {\displaystyle T_{1},T_{2},T_{3}} : T 1 = {\displaystyle T_{1}=} I would like one to. T 2 = {\displaystyle T_{2}=} I would like one too. T 3 = {\displaystyle T_{3}=} I would like one two. that are equally likely, in the sense that P ( S | T 1 ) = P ( S | T 2 ) = P ( S | T 3 ) {\displaystyle P(S|T_{1})=P(S|T_{2})=P(S|T_{3})} . With a good English language model, we would have P ( T 2 ) > P ( T 1 ) > P ( T 3 ) {\displaystyle P(T_{2})>P(T_{1})>P(T_{3})} , since the second sentence is grammatical, the first is not quite, but close to a grammatical one (such as "I would like one to [go]."), while the third one is far from grammatical. Consequently, the noisy-channel model would output T 2 {\displaystyle T_{2}} as the best transcription.

    Read more →
  • Question answering

    Question answering

    Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP) that is concerned with building systems that automatically answer questions that are posed by humans in a natural language. A question-answering implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, question-answering systems can pull answers from an unstructured collection of natural language documents. Some examples of natural language document collections used for question answering systems include reference texts, compiled newswire reports, Wikipedia pages and other World Wide Web pages. == History == Two early question answering systems were BASEBALL and LUNAR. BASEBALL answered questions about Major League Baseball over a period of one year. LUNAR answered questions about the geological analysis of rocks returned by the Apollo Moon missions. Both question answering systems were very effective in their chosen domains. LUNAR was demonstrated at a lunar science convention in 1971 and it was able to answer 90% of the questions in its domain that were posed by people untrained on the system. Further restricted-domain question answering systems were developed in the following years. The common feature of all these systems is that they had a core database or knowledge system that was hand-written by experts of the chosen domain. The language abilities of BASEBALL and LUNAR used techniques similar to ELIZA and DOCTOR, the first chatterbot programs. SHRDLU was a successful question-answering program developed by Terry Winograd in the late 1960s and early 1970s. It simulated the operation of a robot in a toy world (the "blocks world"), and it offered the possibility of asking the robot questions about the state of the world. The strength of this system was the choice of a very specific domain and a very simple world with rules of physics that were easy to encode in a computer program. In the 1970s, knowledge bases were developed that targeted narrower domains of knowledge. The question answering systems developed to interface with these expert systems produced more repeatable and valid responses to questions within an area of knowledge. These expert systems closely resembled modern question answering systems except in their internal architecture. Expert systems rely heavily on expert-constructed and organized knowledge bases, whereas many modern question answering systems rely on statistical processing of a large, unstructured, natural language text corpus. The 1970s and 1980s saw the development of comprehensive theories in computational linguistics, which led to the development of ambitious projects in text comprehension and question answering. One example was the Unix Consultant (UC), developed by Robert Wilensky at U.C. Berkeley in the late 1980s. The system answered questions pertaining to the Unix operating system. It had a comprehensive, hand-crafted knowledge base of its domain, and it aimed at phrasing the answer to accommodate various types of users. Another project was LILOG, a text-understanding system that operated on the domain of tourism information in a German city. The systems developed in the UC and LILOG projects never went past the stage of simple demonstrations, but they helped the development of theories on computational linguistics and reasoning. Specialized natural-language question answering systems have been developed, such as EAGLi for health and life scientists. Question answering systems have been extended in recent years to encompass additional domains of knowledge For example, systems have been developed to automatically answer temporal and geospatial questions, questions of definition and terminology, biographical questions, multilingual questions, and questions about the content of audio, images, and video. Current question answering research topics include: interactivity—clarification of questions or answers answer reuse or caching semantic parsing answer presentation knowledge representation and semantic entailment social media analysis with question answering systems sentiment analysis utilization of thematic roles Image captioning for visual question answering Embodied question answering In 2011, Watson, a question answering computer system developed by IBM, competed in two exhibition matches of Jeopardy! against Brad Rutter and Ken Jennings, winning by a significant margin. Facebook Research made their DrQA system available under an open source license. This system uses Wikipedia as knowledge source. The open source framework Haystack by deepset combines open-domain question answering with generative question answering and supports the domain adaptation of the underlying language models for industry use cases. Large Language Models (LLMs)[36] like GPT-4[37], Gemini[38] are examples of successful QA systems that are enabling more sophisticated understanding and generation of text. When coupled with Multimodal[39] QA Systems, which can process and understand information from various modalities like text, images, and audio, LLMs significantly improve the capabilities of QA systems. == Types == Question-answering research attempts to develop ways of answering a wide range of question types, including fact, list, definition, how, why, hypothetical, semantically constrained, and cross-lingual questions. Answering questions related to an article in order to evaluate reading comprehension is one of the simpler form of question answering, since a given article is relatively short compared to the domains of other types of question-answering problems. An example of such a question is "What did Albert Einstein win the Nobel Prize for?" after an article about this subject is given to the system. Closed-book question answering is when a system has memorized some facts during training and can answer questions without explicitly being given a context. This is similar to humans taking closed-book exams. Closed-domain question answering deals with questions under a specific domain (for example, medicine or automotive maintenance) and can exploit domain-specific knowledge frequently formalized in ontologies. Alternatively, "closed-domain" might refer to a situation where only a limited type of questions are accepted, such as questions asking for descriptive rather than procedural information. Question answering systems in the context of machine reading applications have also been constructed in the medical domain, for instance related to Alzheimer's disease. Open-domain question answering deals with questions about nearly anything and can only rely on general ontologies and world knowledge. Systems designed for open-domain question answering usually have much more data available from which to extract the answer. An example of an open-domain question is "What did Albert Einstein win the Nobel Prize for?" while no article about this subject is given to the system. Another way to categorize question-answering systems is by the technical approach used. There are a number of different types of QA systems, including: rule-based systems, statistical systems, and hybrid systems. Rule-based systems use a set of rules to determine the correct answer to a question. Statistical systems use statistical methods to find the most likely answer to a question. Hybrid systems use a combination of rule-based and statistical methods. == Architecture == As of 2001, question-answering systems typically included a question classifier module that determined the type of question and the type of answer. Different types of question-answering systems employ different architectures. For example, modern open-domain question answering systems may use a retriever-reader architecture. The retriever is aimed at retrieving relevant documents related to a given question, while the reader is used to infer the answer from the retrieved documents. Systems such as GPT-3, T5, and BART use an end-to-end architecture in which a transformer-based architecture stores large-scale textual data in the underlying parameters. Such models can answer questions without accessing any external knowledge sources. == Methods == Question answering is dependent on a good search corpus; without documents containing the answer, there is little any question answering system can do. Larger collections generally mean better question answering performance, unless the question domain is orthogonal to the collection. Data redundancy in massive collections, such as the web, means that nuggets of information are likely to be phrased in many different ways in differing contexts and documents, leading to two benefits: If the right information appears in many forms, the question answering system needs to perform fewer complex NLP techniques to understand the text. Correct answers can be filtered from false positives because the syst

    Read more →
  • Imitation learning

    Imitation learning

    Imitation learning is a paradigm in reinforcement learning, where an agent learns to perform a task by supervised learning from expert demonstrations . It is also called learning from demonstration and apprenticeship learning. It has been applied to underactuated robotics, self-driving cars, quadcopter navigation, helicopter aerobatics, and locomotion. == Approaches == Expert demonstrations are recordings of an expert performing the desired task, often collected as state-action pairs ( o t ∗ , a t ∗ ) {\displaystyle (o_{t}^{},a_{t}^{})} . === Behavior Cloning === Behavior Cloning (BC) is the most basic form of imitation learning. Essentially, it uses supervised learning to train a policy π θ {\displaystyle \pi _{\theta }} such that, given an observation o t {\displaystyle o_{t}} , it would output an action distribution π θ ( ⋅ | o t ) {\displaystyle \pi _{\theta }(\cdot |o_{t})} that is approximately the same as the action distribution of the experts. BC is susceptible to distribution shift. Specifically, if the trained policy differs from the expert policy, it might find itself straying from expert trajectory into observations that would have never occurred in expert trajectories. This was already noted by ALVINN, where they trained a neural network to drive a van using human demonstrations. They noticed that because a human driver never strays far from the path, the network would never be trained on what action to take if it ever finds itself straying far from the path. === DAgger === DAgger (Dataset Aggregation) improves on behavior cloning by iteratively training on a dataset of expert demonstrations. In each iteration, the algorithm first collects data by rolling out the learned policy π θ {\displaystyle \pi _{\theta }} . Then, it queries the expert for the optimal action a t ∗ {\displaystyle a_{t}^{}} on each observation o t {\displaystyle o_{t}} encountered during the rollout. Finally, it aggregates the new data into the dataset D ← D ∪ { ( o 1 , a 1 ∗ ) , ( o 2 , a 2 ∗ ) , . . . , ( o T , a T ∗ ) } {\displaystyle D\leftarrow D\cup \{(o_{1},a_{1}^{}),(o_{2},a_{2}^{}),...,(o_{T},a_{T}^{})\}} and trains a new policy on the aggregated dataset. === Decision transformer === The Decision Transformer approach models reinforcement learning as a sequence modelling problem. Similar to Behavior Cloning, it trains a sequence model, such as a Transformer, that models rollout sequences ( R 1 , o 1 , a 1 ) , ( R 2 , o 2 , a 2 ) , … , ( R t , o t , a t ) , {\displaystyle (R_{1},o_{1},a_{1}),(R_{2},o_{2},a_{2}),\dots ,(R_{t},o_{t},a_{t}),} where R t = r t + r t + 1 + ⋯ + r T {\displaystyle R_{t}=r_{t}+r_{t+1}+\dots +r_{T}} is the sum of future reward in the rollout. During training time, the sequence model is trained to predict each action a t {\displaystyle a_{t}} , given the previous rollout as context: ( R 1 , o 1 , a 1 ) , ( R 2 , o 2 , a 2 ) , … , ( R t , o t ) {\displaystyle (R_{1},o_{1},a_{1}),(R_{2},o_{2},a_{2}),\dots ,(R_{t},o_{t})} During inference time, to use the sequence model as an effective controller, it is simply given a very high reward prediction R {\displaystyle R} , and it would generalize by predicting an action that would result in the high reward. This was shown to scale predictably to a Transformer with 1 billion parameters that is superhuman on 41 Atari games. === Other approaches === See for more examples. == Related approaches == Inverse Reinforcement Learning (IRL) learns a reward function that explains the expert's behavior and then uses reinforcement learning to find a policy that maximizes this reward. Recent works have also explored multi-agent extensions of IRL in networked systems. Generative Adversarial Imitation Learning (GAIL) uses generative adversarial networks (GANs) to match the distribution of agent behavior to the distribution of expert demonstrations. It extends a previous approach using game theory.

    Read more →
  • Permutation automaton

    Permutation automaton

    In automata theory, a permutation automaton, or pure-group automaton, is a deterministic finite automaton such that each input symbol permutes the set of states. Formally, a deterministic finite automaton A may be defined by the tuple (Q, Σ, δ, q0, F), where Q is the set of states of the automaton, Σ is the set of input symbols, δ is the transition function that takes a state q and an input symbol x to a new state δ(q,x), q0 is the initial state of the automaton, and F is the set of accepting states (also: final states) of the automaton. A is a permutation automaton if and only if, for every two distinct states qi and qj in Q and every input symbol x in Σ, δ(qi,x) ≠ δ(qj,x). A formal language is p-regular (also: a pure-group language) if it is accepted by a permutation automaton. For example, the set of strings of even length forms a p-regular language: it may be accepted by a permutation automaton with two states in which every transition replaces one state by the other. == Applications == The pure-group languages were the first interesting family of regular languages for which the star height problem was proved to be computable. Another mathematical problem on regular languages is the separating words problem, which asks for the size of a smallest deterministic finite automaton that distinguishes between two given words of length at most n – by accepting one word and rejecting the other. The known upper bound in the general case is O ( n 2 / 5 ( log ⁡ n ) 3 / 5 ) {\displaystyle O(n^{2/5}(\log n)^{3/5})} . The problem was later studied for the restriction to permutation automata. In this case, the known upper bound changes to O ( n 1 / 2 ) {\displaystyle O(n^{1/2})} .

    Read more →
  • Jun'ichi Tsujii

    Jun'ichi Tsujii

    Jun'ichi Tsujii (辻井 潤一, Tsujii Jun'ichi; born 7 February 1949) is a Japanese computer scientist specializing in natural language processing and text mining, particularly in the field of biology and bioinformatics. == Education == Tsujii received his Bachelor of Engineering, Master of Engineering and PhD degrees in electrical engineering from Kyoto University in 1971, 1973, and 1978 respectively. He was Assistant Professor and Associate Professor at Kyoto University, before accepting a position as Professor of Computational Linguistics at the University of Manchester Institute of Science and Technology (UMIST) in 1988. He was President of the Association for Computational Linguistics (ACL) in 2006, and has been a permanent member of the International Committee on Computational Linguistics (ICCL) since 1992, and the chair of the committee since 2014. == Research == Since May 2015, Tsujii has been the director of the Artificial Intelligence Research Center at the National Institute of Advanced Industrial Science and Technology, Japan. Tsujii was previously a Principal Researcher at Microsoft Research Asia (MSRA). Before joining MSRA, he was a professor at the University of Tokyo, where he belonged to both the School of Inter-faculty Initiative on Informatics and the Graduate School of Information Science and Technology. Tsujii is also a Visiting Professor and Scientific Advisor at the National Centre for Text Mining (NaCTeM) at the University of Manchester in the United Kingdom. == Awards == On 14 May 2010, Tsujii was awarded the Medals of Honor with Purple Ribbon, one of Japan's highest awards, presented to influential contributors in the fields of art, academics or sports. In September 2014, Tsujii was awarded the FUNAI Achievement Award at the Forum on Information Technology (FIT), which took place at the University of Tsukuba. The award is presented to distinguished individuals engaged in research or related business activities in the field of Information Technology who have produced excellent achievements in the field, are still active in leading positions and have strong impact on young students and researchers. In December 2014, Tsujii was named as an ACL Fellow, in recognition of his significant contributions to MT, parsing by unification-based grammar and text mining for biology. In March 2016, Tsujii was awarded Okawa Prize for his contribution to the field of Natural Language Processing, Machine Translation and Text Mining, together with Professor Jaime Carbonnel of CMU. In August 2021, Tsujii received ACL Lifetime Achievement Award, which is considered the most prestigious award in the field of Computational Linguistics and Natural Language Processing. In May 2022, Tsujii received the Order of the Sacred Treasure, Gold Rays and Neck Ribbon, from the Japanese government. In October 2024, Tsujii was designated a Person of Cultural Merit. == Selected publications == Oiwa, Hidekazu; Tsujii, Jun'ichi (2014). Common Space Embedding of Primal-Dual Relation Semantic Spaces. COLING 2014. Dublin. pp. 1579–1590. Taura, K.; Matsuzaki, T.; Miwa, M.; Kamoshida, Y.; Yokoyama, D.; Dun, N.; Shibata, T.; Jun, C. S.; Tsujii, J. (2013). "Design and implementation of GXP make – A workflow system based on make". Future Generation Computer Systems. 29 (2): 662–672. doi:10.1016/j.future.2011.05.026. S2CID 31627886. Sun, X.; Zhang, Y.; Matsuzaki, T.; Tsuruoka, Y.; Tsujii, J. (2013). "Probabilistic Chinese word segmentation with non-local information and stochastic training". Information Processing & Management. 49 (3): 626–636. doi:10.1016/j.ipm.2012.12.003. Mu, T.; Goulermas, J. Y.; Tsujii, J.; Ananiadou, S. (2012). "Proximity-Based Frameworks for Generating Embeddings from Multi-Output Data". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (11): 2216–2232. Bibcode:2012ITPAM..34.2216M. doi:10.1109/TPAMI.2012.20. PMID 23289130. S2CID 711467. Miwa, M.; Sætre, R.; Kim, J. D.; Tsujii, J. (2010). "Event Extraction with Complex Event Classification Using Rich Features". Journal of Bioinformatics and Computational Biology. 08 (1): 131–146. doi:10.1142/S0219720010004586. PMID 20183879. Kim, J. D.; Ohta, T.; Tsujii, J. (2008). "Corpus annotation for mining biomedical events from literature". BMC Bioinformatics. 9 10. doi:10.1186/1471-2105-9-10. PMC 2267702. PMID 18182099. Miyao, Y.; Tsujii, J. (2008). "Feature Forest Models for Probabilistic HPSG Parsing". Computational Linguistics. 34: 35–80. doi:10.1162/coli.2008.34.1.35. S2CID 885002. Sagae, Kenji; Tsujii, Jun'ichi (2007). Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles. EMNLP-CoNLL. pp. 1044–1050. Ananiadou, S; Pyysalo, S; Tsujii, J; Kell, D. B. (2010). "Event extraction for systems biology by text mining the literature". Trends in Biotechnology. 28 (7): 381–90. doi:10.1016/j.tibtech.2010.04.005. PMID 20570001. Tsuruoka, Y.; Tateishi, Y.; Kim, J. D.; Ohta, T.; McNaught, J.; Ananiadou, S.; Tsujii, J. (2005). "Developing a Robust Part-of-Speech Tagger for Biomedical Text". Advances in Informatics. Lecture Notes in Computer Science. Vol. 3746. p. 382. doi:10.1007/11573036_36. ISBN 978-3-540-29673-7. S2CID 206592413. Tsuruoka, Y.; Tsujii, J. (2005). Bidirectional inference with the easiest-first strategy for tagging sequence data. Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT '05. pp. 467–474. doi:10.3115/1220575.1220634. Tsujii, J.; Ananiadou, S. (2005). "Thesaurus or Logical Ontology, Which One Do We Need for Text Mining?". Language Resources and Evaluation. 39: 77–90. doi:10.1007/s10579-005-2697-0. S2CID 3204827. Kazama, J. I.; Tsujii, J. I. (2005). "Maximum Entropy Models with Inequality Constraints: A Case Study on Text Categorization". Machine Learning. 60 (1–3): 159–194. doi:10.1007/s10994-005-0911-3. hdl:10119/3305. Matsuzaki, T.; Miyao, Y.; Tsujii, J. I. (2005). Probabilistic CFG with latent annotations. 43rd Annual Meeting on Association for Computational Linguistics - ACL '05. p. 75. doi:10.3115/1219840.1219850. Kim, J. -D.; Ohta, T.; Tateisi, Y.; Tsujii, J. (2003). "GENIA corpus--a semantically annotated corpus for bio-textmining". Bioinformatics. 19: i180–i182. doi:10.1093/bioinformatics/btg1023. PMID 12855455. Hirschman, L.; Park, J. C.; Tsujii, J.; Wong, L.; Wu, C. H. (2002). "Accomplishments and challenges in literature data mining for biology". Bioinformatics. 18 (12): 1553–1561. doi:10.1093/bioinformatics/18.12.1553. PMID 12490438. Torisawa, K.; Tsujii, J. I. (1996). Computing phrasal-signs in HPSG prior to parsing. 16th conference on Computational linguistics -. Vol. 2. p. 949. doi:10.3115/993268.993332.

    Read more →
  • N-jet

    N-jet

    An N-jet is the set of (partial) derivatives of a function f ( x ) {\displaystyle f(x)} up to order N. Specifically, in the area of computer vision, the N-jet is usually computed from a scale space representation L {\displaystyle L} of the input image f ( x , y ) {\displaystyle f(x,y)} , and the partial derivatives of L {\displaystyle L} are used as a basis for expressing various types of visual modules. For example, algorithms for tasks such as feature detection, feature classification, stereo matching, tracking and object recognition can be expressed in terms of N-jets computed at one or several scales in scale space.

    Read more →
  • Gato (DeepMind)

    Gato (DeepMind)

    Gato is a deep neural network for a range of complex tasks that exhibits multimodality. It can perform tasks such as engaging in a dialogue, playing video games, controlling a robot arm to stack blocks, and more. == Overview == Gato was created by researchers at London-based AI firm DeepMind. It is a transformer, like GPT-3. According to MIT Technology Review, the system "learns multiple different tasks at the same time, which means it can switch between them without having to forget one skill before learning another" whereas "[t]he AI systems of today are called “narrow,” meaning they can only do a specific, restricted set of tasks such as generate text", and according to The Independent, it is a "'generalist agent' that can carry out a huge range of complex tasks, from stacking blocks to writing poetry". It uses supervised learning with 1.2B parameters. The technology has been described as "general purpose" artificial intelligence and a "step toward" artificial general intelligence.

    Read more →
  • ISO 2033

    ISO 2033

    The ISO 2033:1983 standard ("Coding of machine readable characters (MICR and OCR)") defines character sets for use with Optical Character Recognition or Magnetic Ink Character Recognition systems. The Japanese standard JIS X 9010:1984 ("Coding of machine readable characters (OCR and MICR)", originally designated JIS C 6229-1984) is closely related. == Character set for OCR-A == The version of the encoding for the OCR-A font registered with the ISO-IR registry as ISO-IR-91 is the Japanese (JIS X 9010 / JIS C 6229) version, which differs from the encoding defined by ISO 2033 only in the addition of a Yen sign at 5C. == Character set for OCR-B == The version of the G0 set for the OCR-B font registered with the ISO-IR registry as ISO-IR-92 is the Japanese (JIS X 9010 / JIS C 6229) version, which differs from the encoding defined by ISO 2033 only in being based on JIS-Roman (with a dollar sign at 0x24 and a Yen sign at 0x5C) rather than on the ISO 646 IRV (with a backslash at 0x5C and, at the time, a universal currency sign (¤) at 0x24). Besides those code points, it differs from ASCII only in omitting the backtick (`) and tilde (~). An additional supplementary set registered as ISO-IR-93 assigns the pound sign (£), universal currency sign (¤) and section sign (§) to their ISO-8859-1 codepoints, and the backslash to the ISO-8859-1 codepoint for the Yen sign. == Character set for JIS X 9008 (JIS C 6257) == JIS X 9010 (JIS C 6229) also defines character sets for the JIS X 9008:1981 (formerly JIS C 6257-1981) "hand-printed" OCR font. These include subsets of the JIS X 0201 Roman set (registered as ISO-IR-94 and omitting the backtick (`), lowercase letters, curly braces ({, }) and overline (‾)), and kana set (registered as ISO-IR-96 and omitting the East Asian style comma (、) and full stop (。), the interpunct (・) and the small kana), in addition to a set (registered as ISO-IR-95) containing only the backslash, which is assigned to the same code point as in ISO-IR-93. The JIS C 6527 font stylises the slash and backslash characters with a doubled appearance. The character names given are "Solidus" and "Reverse Solidus", matching the Unicode character names for the ASCII slash and backslash. However, the Unicode Optical Character Recognition block includes an additional code point for an "OCR Double Backslash" (⑊), although not for a double (forward) slash, although a double slash is available elsewhere, as U+2AFD ⫽ DOUBLE SOLIDUS OPERATOR. == Character set for E-13B == The ISO-IR-98 encoding defined by ISO 2033 encodes the character repertoire of the E13B font, as used with magnetic ink character recognition. Although ISO 2033 also specifies other encodings, the encoding for E-13B is the encoding referred to as ISO_2033_1983 by Perl libintl, and as ISO_2033-1983 or csISO2033 by the IANA. Other registered labels include iso-ir-98, its ISO-IR registration number, and simply e13b. The digits are preserved in their ASCII locations. Letters and symbols unavailable in the E13B font are omitted, while specialised punctuation for bank cheques included in the E13B font is added. The same symbols are available in Unicode in the Optical Character Recognition block.

    Read more →