Self-verifying finite automaton

Self-verifying finite automaton

In automata theory, a self-verifying finite automaton (SVFA) is a special kind of a nondeterministic finite automaton (NFA) with a symmetric kind of nondeterminism introduced by Hromkovič and Schnitger. Generally, in self-verifying nondeterminism, each computation path is concluded with any of the three possible answers: yes, no, and I do not know. For each input string, no two paths may give contradictory answers, namely both answers yes and no on the same input are not possible. At least one path must give answer yes or no, and if it is yes then the string is considered accepted. SVFA accept the same class of languages as deterministic finite automata (DFA) and NFA but have different state complexity. == Formal definition == An SVFA is represented formally by a 6-tuple, A=(Q, Σ, Δ, q0, Fa, Fr) such that (Q, Σ, Δ, q0, Fa) is an NFA, and Fa, Fr are disjoint subsets of Q. For each word w = a1a2 … an, a computation is a sequence of states r0,r1, …, rn, in Q with the following conditions: r0 = q0 ri+1 ∈ Δ(ri, ai+1), for i = 0, …, n−1. If rn ∈ Fa then the computation is accepting, and if rn ∈ Fr then the computation is rejecting. There is a requirement that for each w there is at least one accepting computation or at least one rejecting computation but not both. == Results == Each DFA is a SVFA, but not vice versa. Jirásková and Pighizzini proved that for every SVFA of n states, there exists an equivalent DFA of g ( n ) = Θ ( 3 n / 3 ) {\displaystyle g(n)=\Theta (3^{n/3})} states. Furthermore, for each positive integer n, there exists an n-state SVFA such that the minimal equivalent DFA has exactly g ( n ) {\displaystyle g(n)} states. Other results on the state complexity of SVFA were obtained by Jirásková and her colleagues.

Database virtualization

Database virtualization is the decoupling of the database layer, which lies between the storage and application layers within the application stack. Virtualization of the database layer enables a shift away from the physical, toward the logical or virtual. Virtualization enables compute and storage resources to be pooled and allocated on demand. This enables both the sharing of single server resources for multi-tenancy, as well as the pooling of server resources into a single logical database or cluster. In both cases, database virtualization provides increased flexibility, more granular and efficient allocation of pooled resources, and more scalable computing. == Virtual data partitioning == The act of partitioning data stores as a database grows has been in use for several decades. There are two primary ways that data has been partitioned inside legacy data management systems: Shared-data databases: an architecture that assumes all database cluster nodes share a single partition. Inter-node communications are used to synchronize update activities performed by different nodes on the cluster. Shared-data data management systems are limited to single-digit node clusters. Shared-nothing databases: an architecture in which all data is segregated to internally managed partitions with clear, well-defined data location boundaries. Shared-nothing databases require manual partition management. In virtual partitioning, logical data is abstracted from physical data by autonomously creating and managing large numbers of data partitions (100s to 1000s). Because they are autonomously maintained, the resources required to manage the partitions are minimal. This kind of massive partitioning results in: Partitions that are small, efficiently managed, and load-balanced. Systems that do not require re-partitioning events to define additional partitions, even when the hardware is changed. “Shared-data” and “shared-nothing” architectures allow scalability through multiple data partitions and cross-partition querying and transaction processing without full partition scanning. == Horizontal data partitioning == Partitioning database sources from consumers is a fundamental concept. With greater numbers of database sources, inserting a horizontal data virtualization layer between the sources and consumers helps address this complexity. Rick van der Lans, the author of multiple books on SQL and relational databases, has defined data virtualization as "the process of offering data consumers a data access interface that hides the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology." == Advantages == Added flexibility and agility for existing computing infrastructure. Enhanced database performance. Pooling and sharing computing resources, either splitting them (multi-tenancy) or combining them (clustering). Simplification of administration and management. Increased fault tolerance.

Local tangent space alignment

Local tangent space alignment (LTSA) is a method for manifold learning, which can efficiently learn a nonlinear embedding into low-dimensional coordinates from high-dimensional data, and can also reconstruct high-dimensional coordinates from embedding coordinates. It is based on the intuition that when a manifold is correctly unfolded, all of the tangent hyperplanes to the manifold will become aligned. It begins by computing the k-nearest neighbors of every point. It computes the tangent space at every point by computing the d-first principal components in each local neighborhood. It then optimizes to find an embedding that aligns the tangent spaces, but it ignores the label information conveyed by data samples, and thus can not be used for classification directly.

Optical character recognition

Optical character recognition (OCR) or optical character reader is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example: from a television broadcast). Widely used as a form of data entry from printed paper data records – whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printed data, or any suitable documentation – it is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as cognitive computing, machine translation, (extracted) text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. Early versions needed to be trained with images of each character, and worked on one font at a time. Advanced systems capable of producing a high degree of accuracy for most fonts are now common, and with support for a variety of image file format inputs. Some systems are capable of reproducing formatted output that closely approximates the original page including images, columns, and other non-textual components. == History == Early optical character recognition may be traced to technologies involving telegraphy and creating reading devices for the blind. In 1914, Emanuel Goldberg developed a machine that read characters and converted them into standard telegraph code. Concurrently, Edmund Fournier d'Albe developed the Optophone, a handheld scanner that when moved across a printed page, produced tones that corresponded to specific letters or characters. In the late 1920s and into the 1930s, Emanuel Goldberg developed what he called a "Statistical Machine" for searching microfilm archives using an optical code recognition system. In 1931, he was granted US Patent number 1,838,389 for the invention. The patent was acquired by IBM. === Visually impaired users === In 1974, Ray Kurzweil started the company Kurzweil Computer Products, Inc. and continued development of omni-font OCR, which could recognize text printed in virtually any font. (Kurzweil is often credited with inventing omni-font OCR, but it was in use by companies, including CompuScan, in the late 1960s and 1970s.) Kurzweil used the technology to create a reading machine for blind people to have a computer read text to them out loud. The device included a CCD-type flatbed scanner and a text-to-speech synthesizer. On January 13, 1976, the finished product was unveiled during a widely reported news conference headed by Kurzweil and the leaders of the National Federation of the Blind. In 1978, Kurzweil Computer Products began selling a commercial version of the optical character recognition computer program. LexisNexis was one of the first customers, and bought the program to upload legal paper and news documents onto its nascent online databases. Two years later, Kurzweil sold his company to Xerox, which eventually spun it off as Scansoft, which merged with Nuance Communications. In the 2000s, OCR was made available online as a service (WebOCR), in a cloud computing environment, and in mobile applications like real-time translation of foreign-language signs on a smartphone. With the advent of smartphones and smartglasses, OCR can be used in internet connected mobile device applications that extract text captured using the device's camera. These devices that do not have built-in OCR functionality will typically use an OCR API to extract the text from the image file captured by the device. The OCR API returns the extracted text, along with information about the location of the detected text in the original image back to the device app for further processing (such as text-to-speech) or display. Various commercial and open source OCR systems are available for most common writing systems, including Latin, Cyrillic, Arabic, Hebrew, Indic, Bengali (Bangla), Devanagari, Tamil, Chinese, Japanese, and Korean characters. == Applications == OCR engines have been developed into software applications specializing in various subjects such as receipts, invoices, checks, and legal billing documents. The software can be used for: Entering data for business documents, e.g. checks, passports, invoices, bank statements and receipts Automatic number-plate recognition Passport recognition and information extraction in airports Automatically extracting key information from insurance documents Traffic-sign recognition Extracting business card information into a contact list Creating textual versions of printed documents, e.g. book scanning for Project Gutenberg Making electronic images of printed documents searchable, e.g. Google Books Converting handwriting in real-time to control a computer (pen computing) Defeating or testing the robustness of CAPTCHA anti-bot systems, though these are specifically designed to prevent OCR. Assistive technology for blind and visually impaired users Writing instructions for vehicles by identifying CAD images in a database that are appropriate to the vehicle design as it changes in real time Making scanned documents searchable by converting them to PDFs == Types == Optical character recognition (OCR) – targets typewritten text, one glyph or character at a time. Optical word recognition – targets typewritten text, one word at a time (for languages that use a space as a word divider). Usually just called "OCR". Intelligent character recognition (ICR) – also targets handwritten printscript or cursive text one glyph or character at a time, usually involving machine learning. Intelligent word recognition (IWR) – also targets handwritten printscript or cursive text, one word at a time. This is especially useful for languages where glyphs are not separated in cursive script. OCR is generally an offline process, which analyses a static document. There are cloud based services which provide an online OCR API service. Handwriting movement analysis can be used as input to handwriting recognition. Instead of merely using the shapes of glyphs and words, this technique is able to capture motion, such as the order in which segments are drawn, the direction, and the pattern of putting the pen down and lifting it. This additional information can make the process more accurate. This technology is also known as "online character recognition", "dynamic character recognition", "real-time character recognition", and "intelligent character recognition". == Techniques == === Pre-processing === OCR software often pre-processes images to improve the chances of successful recognition. Techniques include: De-skewing – if the document was not aligned properly when scanned, it may need to be tilted a few degrees clockwise or counterclockwise in order to make lines of text perfectly horizontal or vertical. Despeckling – removal of positive and negative spots, smoothing edges Binarization – conversion of an image from color or greyscale to black-and-white (called a binary image because there are two colors). The task is performed as a simple way of separating the text (or any other desired image component) from the background. The task of binarization is necessary since most commercial recognition algorithms work only on binary images, as it is simpler to do so. In addition, the effectiveness of binarization influences to a significant extent the quality of character recognition, and careful decisions are made in the choice of the binarization employed for a given input image type; since the quality of the method used to obtain the binary result depends on the type of image (scanned document, scene text image, degraded historical document, etc.). Line removal – Cleaning up non-glyph boxes and lines Layout analysis or zoning – Identification of columns, paragraphs, captions, etc. as distinct blocks. Especially important in multi-column layouts and tables. Line and word detection – Establishment of a baseline for word and character shapes, separating words as necessary. Script recognition – In multilingual documents, the script may change at the level of the words and hence, identification of the script is necessary, before the right OCR can be invoked to handle the specific script. Character isolation or segmentation – For per-character OCR, multiple characters that are connected due to image artifacts must be separated; single characters that are broken into multiple pieces due to artifacts must be connected. Normalization of aspect ratio and scale Segmentation of fixed-pitch fonts is accomplished relatively simply by aligning the image to a uniform grid based on where vertical grid lines will least often intersect black areas. For proportional fonts, more sophisticated techniques are needed because whitespace bet

Multi-surface method

The multi-surface method (MSM) is a form of decision making using the concept of piecewise-linear separability of datasets to categorize data. == Introduction == Two datasets are linearly separable if their convex hulls do not intersect. The method may be formulated as a feedforward neural network with weights that are trained via linear programming. Comparisons between neural networks trained with the MSM versus backpropagation show MSM is better able to classify data. The decision problem associated linear program for the MSM is NP-complete. == Mathematical formulation == Given two finite disjoint point sets A , B ∈ R n {\displaystyle {\mathcal {A,B}}\in \mathbb {R} ^{n}} , find a discriminant, f : R n → R {\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} } such that f ( A ) > 0 , f ( B ) ≤ 0 {\displaystyle f({\mathcal {A}})>0,f({\mathcal {B}})\leq 0} . If the intersection of convex hulls of the two sets is the empty set, then it is possible to use a single linear program to obtain a linear discriminant of the form, f ( x ) = c x + γ {\displaystyle f(x)=cx+\gamma } . Usually, in real applications, the sets' convex hulls do intersect, and a (often non-convex) piecewise-linear discriminant can be used, through the use of several linear programs.

Viaweb

Viaweb was a web-based application that allowed users to build and host their own online stores with little technical expertise using a web browser. The company was started in July 1995 by Paul Graham, Robert Morris (using the pseudonym "John McArtyem"), and Trevor Blackwell. Graham claims Viaweb was the first application service provider. Viaweb was also unusual for being partially written in the Lisp programming language. The software was originally called Webgen, but another company was using the same name, so the company renamed it to Viaweb, "because it worked via the Web". In 1998, Yahoo! Inc. bought Viaweb for 455,000 shares of Yahoo! capital stock, valued at about $49 million, and renamed it Yahoo! Store. Viaweb's example has been influential in Silicon Valley's entrepreneurial culture, largely due to Graham's widely read essays and his subsequent career as a successful venture capitalist.

Mathematics of neural networks in machine learning

An artificial neural network (ANN) or neural network combines biological principles with advanced statistics to solve problems in domains such as pattern recognition and game-play. ANNs adopt the basic model of neuron analogues connected to each other in a variety of ways. == Structure == === Neuron === A neuron with label j {\displaystyle j} receiving an input p j ( t ) {\displaystyle p_{j}(t)} from predecessor neurons consists of the following components: an activation a j ( t ) {\displaystyle a_{j}(t)} , the neuron's state, depending on a discrete time parameter, an optional threshold θ j {\displaystyle \theta _{j}} , which stays fixed unless changed by learning, an activation function f {\displaystyle f} that computes the new activation at a given time t + 1 {\displaystyle t+1} from a j ( t ) {\displaystyle a_{j}(t)} , θ j {\displaystyle \theta _{j}} and the net input p j ( t ) {\displaystyle p_{j}(t)} giving rise to the relation a j ( t + 1 ) = f ( a j ( t ) , p j ( t ) , θ j ) , {\displaystyle a_{j}(t+1)=f(a_{j}(t),p_{j}(t),\theta _{j}),} and an output function f out {\displaystyle f_{\text{out}}} computing the output from the activation o j ( t ) = f out ( a j ( t ) ) . {\displaystyle o_{j}(t)=f_{\text{out}}(a_{j}(t)).} Often the output function is simply the identity function. An input neuron has no predecessor but serves as input interface for the whole network. Similarly an output neuron has no successor and thus serves as output interface of the whole network. === Propagation function === The propagation function computes the input p j ( t ) {\displaystyle p_{j}(t)} to the neuron j {\displaystyle j} from the outputs o i ( t ) {\displaystyle o_{i}(t)} and typically has the form p j ( t ) = ∑ i o i ( t ) w i j . {\displaystyle p_{j}(t)=\sum _{i}o_{i}(t)w_{ij}.} === Bias === A bias term can be added, changing the form to the following: p j ( t ) = ∑ i o i ( t ) w i j + w 0 j , {\displaystyle p_{j}(t)=\sum _{i}o_{i}(t)w_{ij}+w_{0j},} where w 0 j {\displaystyle w_{0j}} is a bias. == Neural networks as functions == Neural network models can be viewed as defining a function that takes an input (observation) and produces an output (decision) f : X → Y {\displaystyle \textstyle f:X\rightarrow Y} or a distribution over X {\displaystyle \textstyle X} or both X {\displaystyle \textstyle X} and Y {\displaystyle \textstyle Y} . Sometimes models are intimately associated with a particular learning rule. A common use of the phrase "ANN model" is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons, number of layers or their connectivity). Mathematically, a neuron's network function f ( x ) {\displaystyle \textstyle f(x)} is defined as a composition of other functions g i ( x ) {\displaystyle \textstyle g_{i}(x)} , that can further be decomposed into other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions. A widely used type of composition is the nonlinear weighted sum, where f ( x ) = K ( ∑ i w i g i ( x ) ) {\displaystyle \textstyle f(x)=K\left(\sum _{i}w_{i}g_{i}(x)\right)} , where K {\displaystyle \textstyle K} (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent, sigmoid function, softmax function, or rectifier function. The important characteristic of the activation function is that it provides a smooth transition as input values change, i.e. a small change in input produces a small change in output. The following refers to a collection of functions g i {\displaystyle \textstyle g_{i}} as a vector g = ( g 1 , g 2 , … , g n ) {\displaystyle \textstyle g=(g_{1},g_{2},\ldots ,g_{n})} . This figure depicts such a decomposition of f {\displaystyle \textstyle f} , with dependencies between variables indicated by arrows. These can be interpreted in two ways. The first view is the functional view: the input x {\displaystyle \textstyle x} is transformed into a 3-dimensional vector h {\displaystyle \textstyle h} , which is then transformed into a 2-dimensional vector g {\displaystyle \textstyle g} , which is finally transformed into f {\displaystyle \textstyle f} . This view is most commonly encountered in the context of optimization. The second view is the probabilistic view: the random variable F = f ( G ) {\displaystyle \textstyle F=f(G)} depends upon the random variable G = g ( H ) {\displaystyle \textstyle G=g(H)} , which depends upon H = h ( X ) {\displaystyle \textstyle H=h(X)} , which depends upon the random variable X {\displaystyle \textstyle X} . This view is most commonly encountered in the context of graphical models. The two views are largely equivalent. In either case, for this particular architecture, the components of individual layers are independent of each other (e.g., the components of g {\displaystyle \textstyle g} are independent of each other given their input h {\displaystyle \textstyle h} ). This naturally enables a degree of parallelism in the implementation. Networks such as the previous one are commonly called feedforward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are commonly depicted in the manner shown at the top of the figure, where f {\displaystyle \textstyle f} is shown as dependent upon itself. However, an implied temporal dependence is not shown. == Backpropagation == Backpropagation training algorithms fall into three categories: steepest descent (with variable learning rate and momentum, resilient backpropagation); quasi-Newton (Broyden–Fletcher–Goldfarb–Shanno, one step secant); Levenberg–Marquardt and conjugate gradient (Fletcher–Reeves update, Polak–Ribiére update, Powell–Beale restart, scaled conjugate gradient). === Algorithm === Let N {\displaystyle N} be a network with e {\displaystyle e} connections, m {\displaystyle m} inputs and n {\displaystyle n} outputs. Below, x 1 , x 2 , … {\displaystyle x_{1},x_{2},\dots } denote vectors in R m {\displaystyle \mathbb {R} ^{m}} , y 1 , y 2 , … {\displaystyle y_{1},y_{2},\dots } vectors in R n {\displaystyle \mathbb {R} ^{n}} , and w 0 , w 1 , w 2 , … {\displaystyle w_{0},w_{1},w_{2},\ldots } vectors in R e {\displaystyle \mathbb {R} ^{e}} . These are called inputs, outputs and weights, respectively. The network corresponds to a function y = f N ( w , x ) {\displaystyle y=f_{N}(w,x)} which, given a weight w {\displaystyle w} , maps an input x {\displaystyle x} to an output y {\displaystyle y} . In supervised learning, a sequence of training examples ( x 1 , y 1 ) , … , ( x p , y p ) {\displaystyle (x_{1},y_{1}),\dots ,(x_{p},y_{p})} produces a sequence of weights w 0 , w 1 , … , w p {\displaystyle w_{0},w_{1},\dots ,w_{p}} starting from some initial weight w 0 {\displaystyle w_{0}} , usually chosen at random. These weights are computed in turn: first compute w i {\displaystyle w_{i}} using only ( x i , y i , w i − 1 ) {\displaystyle (x_{i},y_{i},w_{i-1})} for i = 1 , … , p {\displaystyle i=1,\dots ,p} . The output of the algorithm is then w p {\displaystyle w_{p}} , giving a new function x ↦ f N ( w p , x ) {\displaystyle x\mapsto f_{N}(w_{p},x)} . The computation is the same in each step, hence only the case i = 1 {\displaystyle i=1} is described. w 1 {\displaystyle w_{1}} is calculated from ( x 1 , y 1 , w 0 ) {\displaystyle (x_{1},y_{1},w_{0})} by considering a variable weight w {\displaystyle w} and applying gradient descent to the function w ↦ E ( f N ( w , x 1 ) , y 1 ) {\displaystyle w\mapsto E(f_{N}(w,x_{1}),y_{1})} to find a local minimum, starting at w = w 0 {\displaystyle w=w_{0}} . This makes w 1 {\displaystyle w_{1}} the minimizing weight found by gradient descent. == Learning pseudocode == To implement the algorithm above, explicit formulas are required for the gradient of the function w ↦ E ( f N ( w , x ) , y ) {\displaystyle w\mapsto E(f_{N}(w,x),y)} where the function is E ( y , y ′ ) = | y − y ′ | 2 {\displaystyle E(y,y')=|y-y'|^{2}} . The learning algorithm can be divided into two phases: propagation and weight update. === Propagation === Propagation involves the following steps: Propagation forward through the network to generate the output value(s) Calculation of the cost (error term) Propagation of the output activations back through the network using the training pattern target to generate the deltas (the difference between the targeted and actual output values) of all output and hidden neurons. === Weight update === For each weight: Multiply the weight's output delta and input activation to find the gradient of the weight. Subtract the ratio (percentage) of the weight's gradient from the weight. The learning rate is the ratio (percentage) that influences the speed and quality of learning. The greater the ratio, the faster the neuron trains, but the lower the ratio, the more accurat