White-box cryptography

White-box cryptography

In cryptography, the white-box model refers to an extreme attack scenario, in which an adversary has full unrestricted access to a cryptographic implementation, most commonly of a block cipher such as the Advanced Encryption Standard (AES). A variety of security goals may be posed (see the section below), the most fundamental being "unbreakability", requiring that any (bounded) attacker should not be able to extract the secret key hardcoded in the implementation, while at the same time the implementation must be fully functional. In contrast, the black-box model only provides an oracle access to the analyzed cryptographic primitive (in the form of encryption and/or decryption queries). There is also a model in-between, the so-called gray-box model, which corresponds to additional information leakage from the implementation, more commonly referred to as side-channel leakage. White-box cryptography is a practice and study of techniques for designing and attacking white-box implementations. It has many applications, including digital rights management (DRM), pay television, protection of cryptographic keys in the presence of malware, mobile payments and cryptocurrency wallets. Examples of DRM systems employing white-box implementations include CSS and Widevine. White-box cryptography is closely related to the more general notions of obfuscation, in particular, to Black-box obfuscation, proven to be impossible, and to Indistinguishability obfuscation, constructed recently under well-founded assumptions but so far being infeasible to implement in practice. As of January 2023, there are no publicly known unbroken white-box designs of standard symmetric encryption schemes. On the other hand, there exist many unbroken white-box implementations of dedicated block ciphers designed specifically to achieve incompressibility (see § Security goals). == Security goals == Depending on the application, different security goals may be required from a white-box implementation. Specifically, for symmetric-key algorithms the following are distinguished: Unbreakability is the most fundamental goal requiring that a bounded attacker should not be able to recover the secret key embedded in the white-box implementation. Without this requirement, all other security goals are unreachable since a successful attacker can simply use a reference implementation of the encryption scheme together with the extracted key. One-wayness requires that a white-box implementation of an encryption scheme can not be used by a bounded attacker to decrypt ciphertexts. This requirement essentially turns a symmetric encryption scheme into a public-key encryption scheme, where the white-box implementation plays the role of the public key associated to the embedded secret key. This idea was proposed already in the famous work of Diffie and Hellman in 1976 as a potential public-key encryption candidate. Code lifting security is an informal requirement on the context, in which the white-box program is being executed. It demands that an attacker can not extract a functional copy of the program. This goal is particularly relevant in the DRM setting. Code obfuscation techniques are often used to achieve this goal. A commonly used technique is to compose the white-box implementation with so-called external encodings. These are lightweight secret encodings that modify the function computed by the white-box part of an application. It is required that their effect is canceled in other parts of the application in an obscure way, using code obfuscation techniques. Alternatively, the canceling counterparts can be applied on a remote server. Incompressibility requires that an attacker can not significantly compress a given white-box implementation. This can be seen as a way to achieve code lifting security (see above), since exfiltrating a large program from a constrained device (for example, an embedded or a mobile device) can be time-consuming and may be easy to detect by a firewall. Examples of incompressible designs include SPACE cipher, SPNbox, WhiteKey and WhiteBlock. These ciphers use large lookup tables that can be pseudorandomly generated from a secret master key. Although this makes the recovery of the master key hard, the lookup tables themselves play the role of an equivalent secret key. Thus, unbreakability is achieved only partially. Traceability (Traitor tracing) requires that each distributed white-box implementation contains a digital watermark allowing identification of the guilty user in case the white-box program is being leaked and distributed publicly. == History == The white-box model with initial attempts of white-box DES and AES implementations were first proposed by Chow, Eisen, Johnson and van Oorshot in 2003. The designs were based on representing the cipher as a network of lookup tables and obfuscating the tables by composing them with small (4- or 8-bit) random encodings. Such protection satisfied a property that each single obfuscated table individually does not contain any information about the secret key. Therefore, a potential attacker has to combine several tables in their analysis. The first two schemes were broken in 2004 by Billet, Gilbert, and Ech-Chatbi using structural cryptanalysis. The attack was subsequently called "the BGE attack". The numerous consequent design attempts (2005-2022) were quickly broken by practical dedicated attacks. In 2016, Bos, Hubain, Michiels and Teuwen showed that an adaptation of standard side-channel power analysis attacks can be used to efficiently and fully automatically break most existing white-box designs. This result created a new research direction about generic attacks (correlation-based, algebraic, fault injection) and protections against them. == Competitions == Four editions of the WhibOx contest were held in 2017, 2019, 2021 and 2024 respectively. These competitions invited white-box designers both from academia and industry to submit their implementation in the form of (possibly obfuscated) C code. At the same time, everyone could attempt to attack these programs and recover the embedded secret key. Each of these competitions lasted for about 4-5 months. WhibOx 2017 / CHES 2017 Capture the Flag Challenge targeted the standard AES block cipher. Among 94 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for 28 days. WhibOx 2019 / CHES 2019 Capture the Flag Challenge again targeted the AES block cipher. Among 27 submitted implementations, 3 programs stayed unbroken throughout the competition, but were broken after 51 days since the publication. WhibOx 2021 / CHES 2021 Capture the Flag Challenge changed the target to ECDSA, a digital signature scheme based on elliptic curves. Among 97 submitted implementations, all were broken within at most 2 days. WhibOx 2024 / CHES 2024 Capture the Flag Challenge again targeted ECDSA. Among 47 submitted implementations, all were broken during the competition, with the strongest one staying unbroken for almost 5 days.

Site-specific browser

A site-specific browser (SSB) is a software application dedicated to accessing pages from a single source (site) on a computer network such as the Internet or a private intranet. SSBs typically simplify the more complex functions of a web browser by excluding the menus, toolbars and browser graphical user interface associated with functions that are external to the workings of a single site. Modern site-specific browsers range from simple browser windows without navigation controls to sophisticated desktop applications built with frameworks like Electron that bundle entire browser engines. This evolution has enabled many popular desktop applications to be built using web technologies, effectively making them advanced site-specific browsers. == History == === Early development === One of the earliest examples of an SSB was MacDICT, a Mac OS 9 application that accessed various websites to define, translate, or find synonyms for words typed into a text box. However, the first general-purpose SSB is considered to be Bubbles, which launched in late 2005 on the Windows platform. Bubbles introduced the term "Site Specific Extensions" for SSB userscripts and created the first SSB JavaScript API. In 2007, Mozilla announced Prism (originally called WebRunner), a project to integrate web applications with the desktop. That same year, Todd Ditchendorf, a former Apple Dashboard engineer, released Fluid for macOS. On 2 September 2008, Google Chrome was released with a built-in "Create application shortcut" feature, bringing SSB functionality to mainstream users. This feature allowed any website to be launched in a separate window without the browser interface. === Modern era === The landscape of site-specific browsers changed dramatically with the introduction of Electron in 2013 (originally called Atom Shell). Electron combined Chromium and Node.js into a single runtime, enabling developers to build desktop applications using web technologies. This framework has since powered applications used by hundreds of millions of users, including Visual Studio Code, Slack, Discord, and Microsoft Teams. In 2015, the concept of Progressive Web Apps (PWAs) was introduced by Google engineers Alex Russell and Frances Berriman, representing a parallel evolution in web-to-desktop technology. While PWAs share similar goals with SSBs, they follow web standards and can be installed directly from browsers. More recently, alternative frameworks like Tauri have emerged, offering significantly smaller application sizes by using the system's native web renderer instead of bundling Chromium. == Technical implementation == Site-specific browsers can be implemented through various approaches: === Browser-based SSBs === The simplest form of SSB is created through browser features that allow websites to run in separate windows without the standard browser interface. Modern Chromium-based browsers offer "Install as app" or "Create shortcut" functionality that creates a dedicated window for a specific website. These SSBs share the browser's underlying engine and resources but operate in isolated windows. === Framework-based SSBs === More sophisticated SSBs are built using application frameworks: Electron: Bundles a complete Chromium browser with Node.js, resulting in applications of 85MB or larger. Each Electron application runs its own browser instance, providing full access to system APIs but consuming significant resources. Tauri: Uses the operating system's native web rendering engine (WebView2 on Windows, WebKit on macOS, and WebKitGTK on Linux), resulting in applications typically 2.5-10MB in size. Other frameworks: Include Neutralino.js (ultra-lightweight using system browser), Wails (Go-based), and the Chromium Embedded Framework (CEF). == Comparison with Progressive Web Apps == While site-specific browsers and Progressive Web Apps (PWAs) share the goal of bringing web content to the desktop, they differ in several key aspects: == Applications == Site-specific browsers have become the foundation for many popular desktop applications: Communication and collaboration: Many modern communication tools are built as SSBs, including Slack, Discord, Microsoft Teams, and WhatsApp Desktop. These applications benefit from web-based development while providing desktop integration. Development tools: Visual Studio Code, used by 73.6% of developers according to Stack Overflow's 2024 survey, is built with Electron, as are Atom and GitHub Desktop. Productivity software: Applications like Notion, Obsidian, and various project management tools use SSB technology to provide consistent experiences across platforms. Security and Privacy: Web browsers can be modified to only have access to a single site, in order to protect the security and privacy of the user via compartmentalization == Security and performance == === Memory usage === Framework-based SSBs, particularly those using Electron, are known for high memory consumption. Studies show Electron applications typically use 120-300MB at baseline, with complex applications consuming significantly more. This is approximately 5-10 times more memory than equivalent native applications. === Security considerations === SSBs can provide security benefits through process isolation, where each application runs in its own sandboxed environment. However, bundling an entire browser engine also means each application must be updated independently to patch security vulnerabilities. Research presented at the Network and Distributed System Security (NDSS) Symposium has identified various security challenges specific to Electron applications. === Bundle sizes === The choice of framework significantly impacts application size: Electron applications: 85MB+ (includes full Chromium) Tauri applications: 2.5-10MB (uses system WebView) Browser-based SSBs: No additional download (uses existing browser) == Software == === Browser support === Most modern browsers provide some form of SSB functionality: Chromium-based browsers (Google Chrome, Microsoft Edge, Brave, Opera, Vivaldi): "Install as app" or "Create shortcut" feature Safari: "Add to Dock" feature in macOS Sonoma (2023) Firefox: Removed SSB support in December 2020 (version 85) GNOME Web: "Install Site as Web Application" feature === Standalone tools === ==== Active ==== WebCatalog (Windows, macOS, Linux) – Manages multiple SSBs with isolated storage Fluid (macOS) – Pioneering SSB creator for Mac Unite (macOS) – Creates SSBs with customization options Coherence X (macOS) – Advanced SSB creation tool Pake (cross-platform) – Open-source SSB creator Wavebox (cross-platform) – Workspace browser with SSB features ==== Discontinued ==== Mozilla Prism – Cross-platform SSB creator (discontinued 2011) Nativefier – Command-line SSB creator (discontinued 2023) Epichrome – macOS SSB creator (discontinued 2021) === Development frameworks === Electron – Most popular framework, bundles Chromium and Node.js Tauri – Rust-based framework using system WebView Chromium Embedded Framework (CEF) – C++ library for embedding Chromium Neutralino.js – Lightweight framework using system browser Wails – Go-based framework for web frontends

List of text mining software

Text mining computer programs are available from many commercial and open source companies and sources. == Commercial == Angoss – Angoss Text Analytics provides entity and theme extraction, topic categorization, sentiment analysis and document summarization capabilities via the embedded AUTINDEX – is a commercial text mining software package based on sophisticated linguistics by IAI (Institute for Applied Information Sciences), Saarbrücken. DigitalMR – social media listening & text+image analytics tool for market research. FICO Score – leading provider of analytics. General Sentiment – Social Intelligence platform that uses natural language processing to discover affinities between the fans of brands with the fans of traditional television shows in social media. Stand alone text analytics to capture social knowledge base on billions of topics stored to 2004. IBM LanguageWare – the IBM suite for text analytics (tools and Runtime). IBM SPSS – provider of Modeler Premium (previously called IBM SPSS Modeler and IBM SPSS Text Analytics), which contains advanced NLP-based text analysis capabilities (multi-lingual sentiment, event and fact extraction), that can be used in conjunction with Predictive Modeling. Text Analytics for Surveys provides the ability to categorize survey responses using NLP-based capabilities for further analysis or reporting. Inxight – provider of text analytics, search, and unstructured visualization technologies. (Inxight was bought by Business Objects that was bought by SAP AG in 2008). Language Computer Corporation – text extraction and analysis tools, available in multiple languages. Lexalytics – provider of a text analytics engine used in Social Media Monitoring, Voice of Customer, Survey Analysis, and other applications. Salience Engine. The software provides the unique capability of merging the output of unstructured, text-based analysis with structured data to provide additional predictive variables for improved predictive models and association analysis. Linguamatics – provider of natural language processing (NLP) based enterprise text mining and text analytics software, I2E, for high-value knowledge discovery and decision support. Mathematica – provides built in tools for text alignment, pattern matching, clustering and semantic analysis. See Wolfram Language, the programming language of Mathematica. MATLAB offers Text Analytics Toolbox for importing text data, converting it to numeric form for use in machine and deep learning, sentiment analysis and classification tasks. Medallia – offers one system of record for survey, social, text, written and online feedback. NetMiner – software for network analysis and text mining. Supports social media and bibliographic data collection, NLP for english and chinese, sentiment analysis, work co-occurrence network(text network analysis) and visualization. NetOwl – suite of multilingual text and entity analytics products, including entity extraction, link and event extraction, sentiment analysis, geotagging, name translation, name matching, and identity resolution, among others. PolyAnalyst - text analytics environment. PoolParty Semantic Suite - graph-based text mining platform. RapidMiner with its Text Processing Extension – data and text mining software. SAS – SAS Text Miner and Teragram; commercial text analytics, natural language processing, and taxonomy software used for Information Management. Sketch Engine – a corpus manager and analysis software which providing creating text corpora from uploaded texts or the Web including part-of-speech tagging and lemmatization or detecting a particular website. Sysomos – provider social media analytics software platform, including text analytics and sentiment analysis on online consumer conversations. WordStat – Content analysis and text mining add-on module of QDA Miner for analyzing large amounts of text data. == Open source == Carrot2 – text and search results clustering framework. GATE – general Architecture for Text Engineering, an open-source toolbox for natural language processing and language engineering. Gensim – large-scale topic modelling and extraction of semantic information from unstructured text (Python). KH Coder – for Quantitative Content Analysis or Text Mining The KNIME Text Processing extension. Natural Language Toolkit (NLTK) – a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. OpenNLP – natural language processing. Orange with its text mining add-on. The PLOS Text Mining Collection. The programming language R provides a framework for text mining applications in the package tm. The Natural Language Processing task view contains tm and other text mining library packages. spaCy – open-source Natural Language Processing library for Python Stanbol – an open source text mining engine targeted at semantic content management. Voyant Tools – a web-based text analysis environment, created as a scholarly project.

Randomized weighted majority algorithm

The randomized weighted majority algorithm is an algorithm in machine learning theory for aggregating expert predictions to a series of decision problems. It is a simple and effective method based on weighted voting which improves on the mistake bound of the deterministic weighted majority algorithm. In fact, in the limit, its prediction rate can be arbitrarily close to that of the best-predicting expert. == Example == Imagine that every morning before the stock market opens, we get a prediction from each of our "experts" about whether the stock market will go up or down. Our goal is to somehow combine this set of predictions into a single prediction that we then use to make a buy or sell decision for the day. The principal challenge is that we do not know which experts will give better or worse predictions. The RWMA gives us a way to do this combination such that our prediction record will be nearly as good as that of the single expert which, in hindsight, gave the most accurate predictions. == Motivation == In machine learning, the weighted majority algorithm (WMA) is a deterministic meta-learning algorithm for aggregating expert predictions. In pseudocode, the WMA is as follows: initialize all experts to weight 1 for each round: add each expert's weight to the option they predicted predict the option with the largest weighted sum multiply the weights of all experts who predicted wrongly by 1 2 {\displaystyle {\frac {1}{2}}} Suppose there are n {\displaystyle n} experts and the best expert makes m {\displaystyle m} mistakes. Then, the weighted majority algorithm (WMA) makes at most 2.4 ( log 2 ⁡ n + m ) {\displaystyle 2.4(\log _{2}n+m)} mistakes. This bound is highly problematic in the case of highly error-prone experts. Suppose, for example, the best expert makes a mistake 20% of the time; that is, in N = 100 {\displaystyle N=100} rounds using n = 10 {\displaystyle n=10} experts, the best expert makes m = 20 {\displaystyle m=20} mistakes. Then, the weighted majority algorithm only guarantees an upper bound of 2.4 ( log 2 ⁡ 10 + 20 ) ≈ 56 {\displaystyle 2.4(\log _{2}10+20)\approx 56} mistakes. As this is a known limitation of the weighted majority algorithm, various strategies have been explored in order to improve the dependence on m {\displaystyle m} . In particular, we can do better by introducing randomization. Drawing inspiration from the Multiplicative Weights Update Method algorithm, we will probabilistically make predictions based on how the experts have performed in the past. Similarly to the WMA, every time an expert makes a wrong prediction, we will decrement their weight. Mirroring the MWUM, we will then use the weights to make a probability distribution over the actions and draw our action from this distribution (instead of deterministically picking the majority vote as the WMA does). == Randomized weighted majority algorithm (RWMA) == The randomized weighted majority algorithm is an attempt to improve the dependence of the mistake bound of the WMA on m {\displaystyle m} . Instead of predicting based on majority vote, the weights, are used as probabilities for choosing the experts in each round and are updated over time (hence the name randomized weighted majority). Precisely, if w i {\displaystyle w_{i}} is the weight of expert i {\displaystyle i} , let W = ∑ i w i {\displaystyle W=\sum _{i}w_{i}} . We will follow expert i {\displaystyle i} with probability w i W {\displaystyle {\frac {w_{i}}{W}}} . This results in the following algorithm: initialize all experts to weight 1. for each round: add all experts' weights together to obtain the total weight W {\displaystyle W} choose expert i {\displaystyle i} randomly with probability w i W {\displaystyle {\frac {w_{i}}{W}}} predict as the chosen expert predicts multiply the weights of all experts who predicted wrongly by β {\displaystyle \beta } The goal is to bound the worst-case expected number of mistakes, assuming that the adversary has to select one of the answers as correct before we make our coin toss. This is a reasonable assumption in, for instance, the stock market example provided above: the variance of a stock price should not depend on the opinions of experts that influence private buy or sell decisions, so we can treat the price change as if it was decided before the experts gave their recommendations for the day. The randomized algorithm is better in the worst case than the deterministic algorithm (weighted majority algorithm): in the latter, the worst case was when the weights were split 50/50. But in the randomized version, since the weights are used as probabilities, there would still be a 50/50 chance of getting it right. In addition, generalizing to multiplying the weights of the incorrect experts by β < 1 {\displaystyle \beta <1} instead of strictly 1 2 {\displaystyle {\frac {1}{2}}} allows us to trade off between dependence on m {\displaystyle m} and log 2 ⁡ n {\displaystyle \log _{2}n} . This trade-off will be quantified in the analysis section. == Analysis == Let W t {\displaystyle W_{t}} denote the total weight of all experts at round t {\displaystyle t} . Also let F t {\displaystyle F_{t}} denote the fraction of weight placed on experts which predict the wrong answer at round t {\displaystyle t} . Finally, let N {\displaystyle N} be the total number of rounds in the process. By definition, F t {\displaystyle F_{t}} is the probability that the algorithm makes a mistake on round t {\displaystyle t} . It follows from the linearity of expectation that if M {\displaystyle M} denotes the total number of mistakes made during the entire process, E [ M ] = ∑ t = 1 N F t {\displaystyle E[M]=\sum _{t=1}^{N}F_{t}} . After round t {\displaystyle t} , the total weight is decreased by ( 1 − β ) F t W t {\displaystyle \ (1-\beta )F_{t}W_{t}} , since all weights corresponding to a wrong answer are multiplied by β < 1 {\displaystyle \ \beta <1} . It then follows that W t + 1 = W t ( 1 − ( 1 − β ) F t ) {\displaystyle W_{t+1}=W_{t}(1-(1-\beta )F_{t})} . By telescoping, since W 1 = n {\displaystyle W_{1}=n} , it follows that the total weight after the process concludes is On the other hand, suppose that m {\displaystyle \ m} is the number of mistakes made by the best-performing expert. At the end, this expert has weight β m {\displaystyle \ \beta ^{m}} . It follows, then, that the total weight is at least this much; in other words, W ≥ β m {\displaystyle \ W\geq \beta ^{m}} . This inequality and the above result imply Taking the natural logarithm of both sides yields Now, the Taylor series of the natural logarithm is In particular, it follows that ln ⁡ ( 1 − ( 1 − β ) F t ) < − ( 1 − β ) F t {\displaystyle \ \ln(1-(1-\beta )F_{t})<-(1-\beta )F_{t}} . Thus, Recalling that E [ M ] = ∑ t = 1 N F t {\displaystyle E[M]=\sum _{t=1}^{N}F_{t}} and rearranging, it follows that Now, as β → 1 {\displaystyle \beta \to 1} from below, the first constant tends to 1 {\displaystyle 1} ; however, the second constant tends to + ∞ {\displaystyle +\infty } . To quantify this tradeoff, define ε = 1 − β {\displaystyle \varepsilon =1-\beta } to be the penalty associated with getting a prediction wrong. Then, again applying the Taylor series of the natural logarithm, It then follows that the mistake bound, for small ε {\displaystyle \varepsilon } , can be written in the form ( 1 + ϵ 2 + O ( ε 2 ) ) m + ϵ − 1 ln ⁡ ( n ) {\displaystyle \ \left(1+{\frac {\epsilon }{2}}+O(\varepsilon ^{2})\right)m+\epsilon ^{-1}\ln(n)} . In English, the less that we penalize experts for their mistakes, the more that additional experts will lead to initial mistakes but the closer we get to capturing the predictive accuracy of the best expert as time goes on. In particular, given a sufficiently low value of ε {\displaystyle \varepsilon } and enough rounds, the randomized weighted majority algorithm can get arbitrarily close to the correct prediction rate of the best expert. In particular, as long as m {\displaystyle m} is sufficiently large compared to ln ⁡ ( n ) {\displaystyle \ln(n)} (so that their ratio is sufficiently small), we can assign we can obtain an upper bound on the number of mistakes equal to This implies that the "regret bound" on the algorithm (that is, how much worse it performs than the best expert) is sublinear, at O ( m ln ⁡ ( n ) ) {\displaystyle O({\sqrt {m\ln(n)}})} . == Revisiting the motivation == Recall that the motivation for the randomized weighted majority algorithm was given by an example where the best expert makes a mistake 20% of the time. Precisely, in N = 100 {\displaystyle N=100} rounds, with n = 10 {\displaystyle n=10} experts, where the best expert makes m = 20 {\displaystyle m=20} mistakes, the deterministic weighted majority algorithm only guarantees an upper bound of 2.4 ( log 2 ⁡ 10 + 20 ) ≈ 56 {\displaystyle 2.4(\log _{2}10+20)\approx 56} . By the analysis above, it follows that minimizing the number of worst-case expected mistakes is equivalent to minimizing the fun

Handwriting recognition

Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning (optical character recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most possible words. == Offline recognition == Offline handwriting recognition involves the automatic conversion of text in an image into letter codes that are usable within computer and text-processing applications. The data obtained by this form is regarded as a static representation of handwriting. Offline handwriting recognition is comparatively difficult, as different people have different handwriting styles. And, as of today, OCR engines are primarily focused on machine printed text and ICR for hand "printed" (written in capital letters) text. === Traditional techniques === ==== Character extraction ==== Offline character recognition often involves scanning a form or document. This means the individual characters contained in the scanned image will need to be extracted. Tools exist that are capable of performing this step. However, there are several common imperfections in this step. The most common is when characters that are connected are returned as a single sub-image containing both characters. This causes a major problem in the recognition stage. Yet many algorithms are available that reduce the risk of connected characters. ==== Character recognition ==== After individual characters have been extracted, a recognition engine is used to identify the corresponding computer character. Several different recognition techniques are currently available. ===== Feature extraction ===== Feature extraction works in a similar fashion to neural network recognizers. However, programmers must manually determine the properties they feel are important. This approach gives the recognizer more control over the properties used in identification. Yet any system using this approach requires substantially more development time than a neural network because the properties are not learned automatically. === Modern techniques === Where traditional techniques focus on segmenting individual characters for recognition, modern techniques focus on recognizing all the characters in a segmented line of text. Particularly they focus on machine learning techniques that are able to learn visual features, avoiding the limiting feature engineering previously used. State-of-the-art methods use convolutional networks to extract visual features over several overlapping windows of a text line image which a recurrent neural network uses to produce character probabilities. == Online recognition == Online handwriting recognition involves the automatic conversion of text as it is written on a special digitizer or PDA, where a sensor picks up the pen-tip movements as well as pen-up/pen-down switching. This kind of data is known as digital ink and can be regarded as a digital representation of handwriting. The obtained signal is converted into letter codes that are usable within computer and text-processing applications. The elements of an online handwriting recognition interface typically include: a pen or stylus for the user to write with a touch sensitive surface, which may be integrated with, or adjacent to, an output display. a software application which interprets the movements of the stylus across the writing surface, translating the resulting strokes into digital text. The process of online handwriting recognition can be broken down into a few general steps: preprocessing, feature extraction and classification The purpose of preprocessing is to discard irrelevant information in the input data, that can negatively affect the recognition. This concerns speed and accuracy. Preprocessing usually consists of binarization, normalization, sampling, smoothing and denoising. The second step is feature extraction. Out of the two- or higher-dimensional vector field received from the preprocessing algorithms, higher-dimensional data is extracted. The purpose of this step is to highlight important information for the recognition model. This data may include information like pen pressure, velocity or the changes of writing direction. The last big step is classification. In this step, various models are used to map the extracted features to different classes and thus identifying the characters or words the features represent. === Hardware === Commercial products incorporating handwriting recognition as a replacement for keyboard input were introduced in the early 1980s. Examples include handwriting terminals such as the Pencept Penpad and the Inforite point-of-sale terminal. With the advent of the large consumer market for personal computers, several commercial products were introduced to replace the keyboard and mouse on a personal computer with a single pointing/handwriting system, such as those from Pencept, CIC and others. The first commercially available tablet-type portable computer was the Write-Top from Linus Technologies, released in July 1988. Its operating system was based on MS-DOS. In the early 1990s, hardware makers including NCR, IBM and EO released tablet computers running the PenPoint operating system developed by GO Corp. PenPoint used handwriting recognition and gestures throughout and provided the facilities to third-party software. IBM's tablet computer was the first to use the ThinkPad name and used IBM's handwriting recognition. This recognition system was later ported to Microsoft Windows for Pen Computing, and IBM's Pen for OS/2. None of these were commercially successful. Advancements in electronics allowed the computing power necessary for handwriting recognition to fit into a smaller form factor than tablet computers, and handwriting recognition is often used as an input method for hand-held PDAs. The first PDA to provide written input was the Apple Newton, which exposed the public to the advantage of a streamlined user interface. However, the device was not a commercial success, owing to the unreliability of the software, which tried to learn a user's writing patterns. By the time of the release of the Newton OS 2.0, wherein the handwriting recognition was greatly improved, including unique features still not found in current recognition systems such as modeless error correction, the largely negative first impression had been made. After discontinuation of Apple Newton, the feature was incorporated in Mac OS X 10.2 and later as Inkwell. Palm later launched a successful series of PDAs based on the Graffiti recognition system. Graffiti improved usability by defining a set of "unistrokes", or one-stroke forms, for each character. This narrowed the possibility for erroneous input, although memorization of the stroke patterns did increase the learning curve for the user. The Graffiti handwriting recognition was found to infringe on a patent held by Xerox, and Palm replaced Graffiti with a licensed version of the CIC handwriting recognition which, while also supporting unistroke forms, pre-dated the Xerox patent. The court finding of infringement was reversed on appeal, and then reversed again on a later appeal. The parties involved subsequently negotiated a settlement concerning this and other patents. A Tablet PC is a notebook computer with a digitizer tablet and a stylus, which allows a user to handwrite text on the unit's screen. The operating system recognizes the handwriting and converts it into text. Windows Vista and Windows 7 include personalization features that learn a user's writing patterns or vocabulary for English, Japanese, Chinese Traditional, Chinese Simplified and Korean. The features include a "personalization wizard" that prompts for samples of a user's handwriting and uses them to retrain the system for higher accuracy recognition. This system is distinct from the less advanced handwriting recognition system employed in its Windows Mobile OS for PDAs. Although handwriting recognition is an input form that the public has become accustomed to, it has not achieved widespread use in either desktop computers or laptops. It is still generally accepted that keyboard input is both faster and more reliable. As of 2006, many PDAs offer handwriting input, sometimes even accepting natural cursive handwriting, but accuracy is still a problem, and some people still find even a simple on-screen keyboard more efficient. === Software === Early software could understand print handwriting where the characters were separated; however, cursive handwriting

Artificial intelligence of things

Artificial Intelligence of Things (AIoT) is the combination of artificial intelligence (AI) technologies with the Internet of things (IoT) infrastructure to create systems capable of sensing, learning, and acting on data without continuous human intervention. While IoT focuses on connectivity and sensor data collection, AI enables IoT devices to analyse data in real time and produce actionable outputs, including automated decisions at the edge. == Applications == === Manufacturing and predictive maintenance === Manufacturing accounts for the largest share of AIoT adoption by industry vertical. A common application is predictive maintenance, where sensors measuring vibration, temperature, current draw, and acoustic emissions feed machine learning models trained to detect signatures that precede equipment failure. These systems can flag developing faults weeks or months in advance, and in more advanced deployments can autonomously adjust machine parameters such as motor speed or cooling cycles to delay or prevent failure. === Other industries === In healthcare, AIoT enables remote patient monitoring through wearable devices that collect vital signs and apply AI models to detect anomalies or predict deterioration. In logistics, GPS and telematics sensors combined with AI models support real-time route optimisation, vehicle maintenance prediction, and fuel cost forecasting. Smart building systems use occupancy, temperature, and energy sensors with AI to dynamically adjust HVAC and lighting, reducing energy consumption. == Architecture == AIoT systems typically operate across three layers: a device layer of sensors and actuators that collect data, a connectivity layer that transmits data via protocols such as MQTT or HTTP, and a compute layer where AI models process the data either in the cloud or at the edge. The trend toward edge-based processing, where inference runs on low-cost processors near the data source rather than in a centralised cloud, has accelerated as hardware costs have fallen and applications increasingly require sub-second response times. == Market == Market sizing estimates for AIoT vary significantly depending on scope and definition. Fortune Business Insights valued the AIoT market at USD 35.65 billion in 2023, projecting growth to USD 253.86 billion by 2030 at a compound annual growth rate of 32.4%. Grand View Research estimated the broader market at USD 171.4 billion in 2024 with a CAGR of 31.7% through 2030, reflecting a wider definition that includes AI-integrated hardware components. North America accounted for approximately 40% of global market share in 2024, with the Asia-Pacific region projected as the fastest-growing market.

Semidefinite embedding

Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data. It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality, as it leverages the Kernel trick to non-linearly map the original data into an inner-product space. == Algorithm == MVU creates a mapping from the high dimensional input vectors to some low dimensional Euclidean vector space in the following steps: A neighbourhood graph is created. Each input is connected with its k-nearest input vectors (according to Euclidean distance metric) and all k-nearest neighbors are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite programming aims to find an inner product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph while preserving the nearest neighbors distances. The low-dimensional embedding is finally obtained by application of multidimensional scaling on the learned inner product matrix. The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich. == Optimization formulation == Let X {\displaystyle X\,\!} be the original input and Y {\displaystyle Y\,\!} be the embedding. If i , j {\displaystyle i,j\,\!} are two neighbors, then the local isometry constraint that needs to be satisfied is: | X i − X j | 2 = | Y i − Y j | 2 {\displaystyle |X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}\,\!} Let G , K {\displaystyle G,K\,\!} be the Gram matrices of X {\displaystyle X\,\!} and Y {\displaystyle Y\,\!} (i.e.: G i j = X i ⋅ X j , K i j = Y i ⋅ Y j {\displaystyle G_{ij}=X_{i}\cdot X_{j},K_{ij}=Y_{i}\cdot Y_{j}\,\!} ). We can express the above constraint for every neighbor points i , j {\displaystyle i,j\,\!} in term of G , K {\displaystyle G,K\,\!} : G i i + G j j − G i j − G j i = K i i + K j j − K i j − K j i {\displaystyle G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}\,\!} In addition, we also want to constrain the embedding Y {\displaystyle Y\,\!} to center at the origin: 0 = | ∑ i Y i | 2 ⇔ ( ∑ i Y i ) ⋅ ( ∑ i Y i ) ⇔ ∑ i , j Y i ⋅ Y j ⇔ ∑ i , j K i j {\displaystyle 0=|\sum _{i}Y_{i}|^{2}\Leftrightarrow (\sum _{i}Y_{i})\cdot (\sum _{i}Y_{i})\Leftrightarrow \sum _{i,j}Y_{i}\cdot Y_{j}\Leftrightarrow \sum _{i,j}K_{ij}} As described above, except the distances of neighbor points are preserved, the algorithm aims to maximize the pairwise distance of every pair of points. The objective function to be maximized is: T ( Y ) = 1 2 N ∑ i , j | Y i − Y j | 2 {\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}} Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible and therefore "unfold" the manifold. The local isometry constraint Let τ = m a x { η i j | Y i − Y j | 2 } {\displaystyle \tau =max\{\eta _{ij}|Y_{i}-Y_{j}|^{2}\}\,\!} where η i j := { 1 if i is a neighbour of j 0 otherwise . {\displaystyle \eta _{ij}:={\begin{cases}1&{\mbox{if}}\ i{\mbox{ is a neighbour of }}j\\0&{\mbox{otherwise}}.\end{cases}}} prevents the objective function from diverging (going to infinity). Since the graph has N points, the distance between any two points | Y i − Y j | 2 ≤ N τ {\displaystyle |Y_{i}-Y_{j}|^{2}\leq N\tau \,\!} . We can then bound the objective function as follows: T ( Y ) = 1 2 N ∑ i , j | Y i − Y j | 2 ≤ 1 2 N ∑ i , j ( N τ ) 2 = N 3 τ 2 2 {\displaystyle T(Y)={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\leq {\dfrac {1}{2N}}\sum _{i,j}(N\tau )^{2}={\dfrac {N^{3}\tau ^{2}}{2}}\,\!} The objective function can be rewritten purely in the form of the Gram matrix: T ( Y ) = 1 2 N ∑ i , j | Y i − Y j | 2 = 1 2 N ∑ i , j ( Y i 2 + Y j 2 − Y i ⋅ Y j − Y j ⋅ Y i ) = 1 2 N ( ∑ i , j Y i 2 + ∑ i , j Y j 2 − ∑ i , j Y i ⋅ Y j − ∑ i , j Y j ⋅ Y i ) = 1 2 N ( ∑ i , j Y i 2 + ∑ i , j Y j 2 − 0 − 0 ) = 1 N ( ∑ i Y i 2 ) = 1 N ( T r ( K ) ) {\displaystyle {\begin{aligned}T(Y)&{}={\dfrac {1}{2N}}\sum _{i,j}|Y_{i}-Y_{j}|^{2}\\&{}={\dfrac {1}{2N}}\sum _{i,j}(Y_{i}^{2}+Y_{j}^{2}-Y_{i}\cdot Y_{j}-Y_{j}\cdot Y_{i})\\&{}={\dfrac {1}{2N}}(\sum _{i,j}Y_{i}^{2}+\sum _{i,j}Y_{j}^{2}-\sum _{i,j}Y_{i}\cdot Y_{j}-\sum _{i,j}Y_{j}\cdot Y_{i})\\&{}={\dfrac {1}{2N}}(\sum _{i,j}Y_{i}^{2}+\sum _{i,j}Y_{j}^{2}-0-0)\\&{}={\dfrac {1}{N}}(\sum _{i}Y_{i}^{2})={\dfrac {1}{N}}(Tr(K))\\\end{aligned}}\,\!} Finally, the optimization can be formulated as: Maximize T r ( K ) subject to K ⪰ 0 , ∑ i j K i j = 0 and G i i + G j j − G i j − G j i = K i i + K j j − K i j − K j i , ∀ i , j where η i j = 1 , {\displaystyle {\begin{aligned}&{\text{Maximize}}&&Tr(\mathbf {K} )\\&{\text{subject to}}&&\mathbf {K} \succeq 0,\sum _{ij}\mathbf {K} _{ij}=0\\&{\text{and}}&&G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji},\forall i,j{\mbox{ where }}\eta _{ij}=1,\end{aligned}}} After the Gram matrix K {\displaystyle K\,\!} is learned by semidefinite programming, the output Y {\displaystyle Y\,\!} can be obtained via Cholesky decomposition. In particular, the Gram matrix can be written as K i j = ∑ α = 1 N ( λ α V α i V α j ) {\displaystyle K_{ij}=\sum _{\alpha =1}^{N}(\lambda _{\alpha }V_{\alpha i}V_{\alpha j})\,\!} where V α i {\displaystyle V_{\alpha i}\,\!} is the i-th element of eigenvector V α {\displaystyle V_{\alpha }\,\!} of the eigenvalue λ α {\displaystyle \lambda _{\alpha }\,\!} . It follows that the α {\displaystyle \alpha \,\!} -th element of the output Y i {\displaystyle Y_{i}\,\!} is λ α V α i {\displaystyle {\sqrt {\lambda _{\alpha }}}V_{\alpha i}\,\!} .