AI Data Center

AI Data Center — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Learning to rank

    Learning to rank

    Learning to rank (LTR) or machine-learned ranking (MLR) is the application of machine learning, often supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval and recommender systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data. == Applications == === In information retrieval === Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and online advertising. A possible architecture of a machine-learned search engine is shown in the accompanying figure. Training data consists of queries and documents matching them together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine relevance of each result. It is not feasible to check the relevance of all documents, and so typically a technique called pooling is used — only the top few documents, retrieved by some existing ranking models are checked. This technique may introduce selection bias. Alternatively, training data may be derived automatically by analyzing clickthrough logs (i.e. search results which got clicks from users), query chains, or such search engines' features as Google's (since-replaced) SearchWiki. Clickthrough logs can be biased by the tendency of users to click on the top search results on the assumption that they are already well-ranked. Training data is used by a learning algorithm to produce a ranking model which computes the relevance of documents for actual queries. Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, and so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, Boolean model, weighted AND, or BM25. This phase is called top- k {\displaystyle k} document retrieval and many heuristics were proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents. === In other areas === Learning to rank algorithms have been applied in areas other than information retrieval: In machine translation for ranking a set of hypothesized translations; In computational biology for ranking candidate 3-D structures in protein structure prediction problems; In recommender systems for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article. == Feature vectors == For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called feature vectors. Such an approach is sometimes called bag of features and is analogous to the bag of words model and vector space model used in information retrieval for representation of documents. Components of such vectors are called features, factors or ranking signals. They may be divided into three groups (features from document retrieval are shown as examples): Query-independent or static features — those features, which depend only on the document, but not on the query. For example, PageRank or document's length. Such features can be precomputed in off-line mode during indexing. They may be used to compute document's static quality score (or static rank), which is often used to speed up search query evaluation. Query-dependent or dynamic features — those features, which depend both on the contents of the document and the query, such as TF-IDF score or other non-machine-learned ranking functions. Query-level features or query features, which depend only on the query. For example, the number of words in a query. Some examples of features, which were used in the well-known LETOR dataset: TF, TF-IDF, BM25, and language modeling scores of document's zones (title, body, anchors text, URL) for a given query; Lengths and IDF sums of document's zones; Document's PageRank, HITS ranks and their variants. Selecting and designing good features is an important area in machine learning, which is called feature engineering. == Evaluation measures == There are several measures (metrics) which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics. Examples of ranking quality measures: Mean average precision (MAP); DCG and NDCG; Precision@n, NDCG@n, where "@n" denotes that the metrics are evaluated only on top n documents; Mean reciprocal rank; Kendall's tau; Spearman's rho. DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics such as MAP, MRR and precision, are defined only for binary judgments. Recently, there have been proposed several new evaluation metrics which claim to model user's satisfaction with search results better than the DCG metric: Expected reciprocal rank (ERR); Yandex's pfound. Both of these metrics are based on the assumption that the user is more likely to stop looking at search results after examining a more relevant document, than after a less relevant document. == Approaches == Learning to Rank approaches are often categorized using one of three approaches: pointwise (where individual documents are ranked), pairwise (where pairs of documents are ranked into a relative order), and listwise (where an entire list of documents are ordered). Tie-Yan Liu of Microsoft Research Asia has analyzed existing algorithms for learning to rank problems in his book Learning to Rank for Information Retrieval. He categorized them into three groups by their input spaces, output spaces, hypothesis spaces (the core function of the model) and loss functions: the pointwise, pairwise, and listwise approach. In practice, listwise approaches often outperform pairwise approaches and pointwise approaches. This statement was further supported by a large scale experiment on the performance of different learning-to-rank methods on a large collection of benchmark data sets. In this section, without further notice, x {\displaystyle x} denotes an object to be evaluated, for example, a document or an image, f ( x ) {\displaystyle f(x)} denotes a single-value hypothesis, h ( ⋅ ) {\displaystyle h(\cdot )} denotes a bi-variate or multi-variate function and L ( ⋅ ) {\displaystyle L(\cdot )} denotes the loss function. === Pointwise approach === In this case, it is assumed that each query-document pair in the training data has a numerical or ordinal score. Then the learning-to-rank problem can be approximated by a regression problem — given a single query-document pair, predict its score. Formally speaking, the pointwise approach aims at learning a function f ( x ) {\displaystyle f(x)} predicting the real-value or ordinal score of a document x {\displaystyle x} using the loss function L ( f ; x j , y j ) {\displaystyle L(f;x_{j},y_{j})} . A number of existing supervised machine learning algorithms can be readily used for this purpose. Ordinal regression and classification algorithms can also be used in pointwise approach when they are used to predict the score of a single query-document pair, and it takes a small, finite number of values. === Pairwise approach === In this case, the learning-to-rank problem is approximated by a classification problem — learning a binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} that can tell which document is better in a given pair of documents. The classifier shall take two documents as its input and the goal is to minimize a loss function L ( h ; x u , x v , y u , v ) {\displaystyle L(h;x_{u},x_{v},y_{u,v})} . The loss function typically reflects the number and magnitude of inversions in the induced ranking. In many cases, the binary classifier h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} is implemented with a scoring function f ( x ) {\displaystyle f(x)} . As an example, RankNet adapts a probability model and defines h ( x u , x v ) {\displaystyle h(x_{u},x_{v})} as the estimated probability of the document x u {\displaystyle x_{u}} has higher quality than x v {\displaystyle x_{v}} : P u , v ( f ) = CDF ( f ( x u ) − f ( x v ) ) , {\displaystyle P_{u,v}(f)={\text{CDF}

    Read more →
  • Emotion-sensitive software

    Emotion-sensitive software

    Emotion-sensitive software (ESS) is software specifically designed to target and monitor emotional response in a human being. Some software measures anger by comparing the pitch of a voice to a regular, or calm, pitch. Another approach is the measurement of physical appearance. If a camera or similar recording device picks up a certain amount of red pigmentation in the skin the system can be alerted that this person is angered. The competitive landscape in the Electronic Surveillance Software (ESS) industry is marked by a high level of secrecy regarding the operational details of these software systems. Many producers deliberately withhold information about the inner workings of their ESS products, a strategy that serves dual purposes: firstly, it intensifies competition among companies in the sector, as each strives to maintain a unique edge without revealing trade secrets that could be leveraged by competitors; secondly, this secrecy acts as a deterrent against individuals or entities who might try to circumvent the surveillance mechanisms. One application of ESS was developed by University of Notre Dame Assistant Professor of Psychology Sidney D'Mello, Art Graesser from the University of Memphis and a colleague from Massachusetts Institute of Technology. They used the technology to create an electronic tutor that could assess a student's level of boredom and frustration based on facial expression and body language, and react accordingly.

    Read more →
  • Encyclopaedistics

    Encyclopaedistics

    Encyclopaedistics or encyclopaedics as a discipline, is the academic scholarship of encyclopedias as sources of encyclopedic knowledge and cultural objects as well; in this sense, this discipline is also known as "encyclopaedia studies" and can be termed as "theoretical encyclopaediography" by analogy with theoretical lexicography. Encyclopaedistics as a practical activity (profession or business) also called "encyclopaedic practice" or "encyclopedism" is the process of assembling encyclopaedias available to the public for sale or for free (encyclopaedia publishing or practical encyclopediography). In this sense, it is the art or craft of writing, compiling, and editing the paper or online encyclopedias. As a practical activity, encyclopaedistics originated in the Middle Ages in connection with the development of compendiums based on alphabetical structuring (e.g. first edition of Polyanthea by Dominicus Nanus Mirabellius). Encyclopaedistics is often defined as "the art and science of selecting and disseminating the information most significant to mankind". == Field of study == Encyclopaedistics is a specialized aspect of information science and communication science. At the same time, encyclopaedistics is also considered as one of scholarly disciplines which are seen as auxiliary for historical research (auxiliary sciences of history) . Third, encyclopaedics is a domain of philosophy (Romanticism). This term associated with German philosophers of the 18th century, such as Novalis, Friedrich Schlegel, who sought to create a "Scientific Bible" - both real and ideal book as the quintessence of human education (enlightenment). In any case, the most popular topics in encyclopaedia studies refferd the history of organization of encyclopaedic knowledge, encyclopaedic knowledge determination and selection, glossary composition, current state of development of encyclopaedic activity, features of making encyclopaedias and encyclopaedic articles, usage, role and significance of encyclopaedias, typology of encyclopaedic literature, encyclopaedists and encyclopaedic schools, opposition of classical encyclopaedias and Wikipedia as well as paper encyclopaedias and online encyclopaedias, case experience in building encyclopedias etc. In general, scholarly studies contribute to appearance of successful well-crafted encyclopaedias with high-quality articles. == Contemporary encyclopaedic practice == Today, academic institutions, universities, and publishing companies worldwide are engaged in encyclopaedic activity building national, multinational (universal), regional and subject-specific encyclopaedias, or doing studies related encyclopaedias. The development of national encyclopaedias is one of the prerogatives of the European Parliament in the policy of protection of accurate and verified information and in the fight against mis- and disinformation as well as in the policy of protecting, promoting and projecting Europe's values and interests in the world.

    Read more →
  • Documentation science

    Documentation science

    Documentation science is the study of the recording and retrieval of information. It includes methods for storing, retrieving, and sharing of information captured on physical as well as digital documents. This field is closely linked to the fields of library science and information science but has its own theories and practices. The term documentation science was coined by Belgian lawyer and peace activist Paul Otlet. He is considered to be the forefather of information science. He along with Henri La Fontaine laid the foundations of documentation science as a field of study. Professionals in this field are called documentalists. Over the years, documentation science has grown to become a large and important field of study. Evolving from traditional practices like archiving and retrieval to modern theories about the nature of documents, novel methods for organizing digital information, and applications in libraries, research, healthcare, business, and technology and more. This field continues to evolve in the digital age. == Developments in documentation science == 1895: The International Institute of Bibliography (originally Institut International de Bibliographie, IIB) was established on 12 September 1895, in Brussels, Belgium by Paul Otlet and Henri La Fontaine. It aimed to catalog all recorded knowledge using a universal classification system now known as the Universal Decimal Classification (UDC). 1931: International Institute of Bibliography (originally Institut International de Bibliographie, IIB) was renamed The International Institute for Documentation, (Institut International de Documentation, IID). 1934: Paul Otlet envisioned a “radiated library,” a global network of interconnected documents accessible from anywhere via telecommunication. This early idea is now seen as a forerunner of the internet. 1937: American Documentation Institute was founded (1968 nameshift to American Society for Information Science). 1951: Suzanne Briet published Qu'est-ce que la documentation? where she proposed that “a document is evidence in support of a fact,” expanding the definition to include objects such as animals in zoos when they are part of a scientific study. This was a significant theoretical shift in defining documents. 1965-1990: Documentation departments were established, for example, large research libraries, online computer retrieval systems and more. The persons doing the searches were called documentalists. But with the appearance of first CD-ROM databases in the mid-1980s and later the internet in 1990s, these intermediary searches decreased and most such departments closed or merged with other departments. 1996: "Dokvit", Documentation Studies, was established in 1996 at the University of Tromsø in Norway. 2001: The Document Academy was established. It is an international network that celebrates documentation. It was conducted by The Program of Documentation Studies, University of Tromsø, Norway and The School of Information Management and Systems, UC Berkeley. 2003: The first Document Research Conference (DOCAM), a series of conferences made by the Document Academy. DOCAM '03 (2003) was held 13–15 August 2003 at The School of Information Management and Systems (SIMS) at the University of California, Berkeley. 2007: Michael Buckland, Ronald Day, and Birger Hjørland expanded the theoretical foundations of documentation science. They researched and explored documents to be social artifacts, the role of ideology in classification, and how documents influenced knowledge systems. 2010s: The concept of post-documentation or “documentality” began in the 2010s, which focused on how digital traces (e.g., tweets, logs) function as documents without traditional physical form. This led to new thinking in document theory. 2016–present: The Document Academy's DOCAM conferences have continued, offering ongoing developments in the theory and practice of documentation. Themes include affect, memory, activism, and born-digital records. 2017: The journal Information Research published special issues addressing “document theory,” including views on documentation in virtual environments and digital archives. 2020–present: The growth of research data management (RDM) and open science has made documentation practices central to data sharing, metadata standards, and reproducibility in scientific work. == Theoretical foundations == Documentation science has some deep theories that explain what a document is, how people use documents, and how they are organized. These concepts were introduced by scholars who have not only studied libraries, but also philosophy, language, and social sciences. Suzanne Briet described a document as “any material form of evidence” that is made to be used as proof or to share information. An antelope in a zoo, for example, can be a document because it is being studied, classified, and described. Documents are not just things or materials but are also shaped by society. Michael Buckland noted that documents have meaning only when people agree they are useful or valid as information. He explained a document becomes a document when someone decides to use it as evidence. Ronald Day wrote about how documentation is not neutral, it can be influenced by power, ideology, and politics. He claimed that classification systems, like how libraries organize books, are not just technical tools. They also show what kinds of knowledge are seen as more important than others. In recent years, new theories have been introduced, like “documentality” by Maurizio Ferraris. He proposed that a document does not have to be a paper or file, it can also be something digital like a tweet, a database entry, or a log file, as long as it leaves a trace that can be looked at later. This theory helps explain modern digital documents. == Methodologies and practice == Documentation science includes many methods that help people collect, organize, store, and find information. These practices are used in libraries, archives, research labs, companies, and now also in online systems. === Collecting and creating documents === In the past, documentation work included gathering books, articles, reports, and other printed materials. People created records of these materials manually, using catalog cards, indexes, or bibliographies. Paul Otlet’s work with the Universal Bibliographic Repertory is one example. He created millions of card entries to organize knowledge from around the world. Today, documents are not only created by humans. Computers and machines also generate documents, like log files, metadata, and sensor data. These need new tools and methods for collection and management. === Organizing information === Organizing documents has always been a foundational element of documentation science. Methods like classification (dividing things into groups) and indexing (making lists of topics or keywords) help individuals find what they need. A widely used system is the Universal Decimal Classification (UDC) developed by Otlet and La Fontaine. Another is the Library of Congress Classification (LCC) used in the majority of U.S. libraries. Indexing can be performed by humans or by software programs that read the text and add tags to documents. Metadata is also used to describe documents. Metadata is “data about data” like the title, author, date, and subject of a document. Standards like Dublin Core are used in digital libraries to keep metadata consistent. === Retrieval and access === One of the main objectives of documentation is helping users find the right document. This is called information retrieval. In the past, this meant using catalog drawers or printed indexes. Today, people use search engines, databases, and digital libraries. Modern retrieval tools use Boolean logic, ranking algorithms, and sometimes machine learning to show the most useful results first. This is part of what is studied in both documentation science and information retrieval. === Preservation and archiving === Documents require long-term storage. This is called preservation of documents. Printed documents can be damaged by light, pests, or even time on the other hand digital documents can be deemed worthless if formats become outdated or storage facilities fail. Archivists use methods like migration, which includes moving files to new formats, and emulation, which replicates obsolete systems, to preserve materials. These methods and tools are ever changing as new technologies develop. But the main objective of documentation has remained the same, which is to keep information safe, organized, and easy to find. == Documentation in the digital age == With the expansion of the internet, computers, and cloud storage, documents are no longer just books, papers, or reports. They can now be emails, tweets, videos, websites, databases, or even log files created by machines. === Born-digital documents === Many documents today are created directly in digital form. These are called born-digit

    Read more →
  • Quantum image processing

    Quantum image processing

    Quantum image processing (QIMP) is using quantum computing or quantum information processing to create and work with quantum images. Due to some of the properties inherent to quantum computation, notably entanglement and parallelism, it is hoped that QIMP technologies will offer capabilities and performances that surpass their traditional equivalents, in terms of computing speed, security, and minimum storage requirements. == Background == A. Y. Vlasov's work in 1997 focused on using a quantum system to recognize orthogonal images. This was followed by efforts using quantum algorithms to search specific patterns in binary images and detect the posture of certain targets. Notably, more optics-based interpretations for quantum imaging were initially experimentally demonstrated in and formalized in after seven years. In 2003, Salvador Venegas-Andraca and S. Bose presented Qubit Lattice, the first published general model for storing, processing and retrieving images using quantum systems. Later on, in 2005, Latorre proposed another kind of representation, called the Real Ket, whose purpose was to encode quantum images as a basis for further applications in QIMP. Furthermore, in 2010 Venegas-Andraca and Ball presented a method for storing and retrieving binary geometrical shapes in quantum mechanical systems in which it is shown that maximally entangled qubits can be used to reconstruct images without using any additional information. Technically, these pioneering efforts with the subsequent studies related to them can be classified into three main groups: Quantum-assisted digital image processing (QDIP): These applications aim at improving digital or classical image processing tasks and applications. Optics-based quantum imaging (OQI) Classically inspired quantum image processing (QIMP) A survey of quantum image representation has been published in. Furthermore, the recently published book Quantum Image Processing provides a comprehensive introduction to quantum image processing, which focuses on extending conventional image processing tasks to the quantum computing frameworks. It summarizes the available quantum image representations and their operations, reviews the possible quantum image applications and their implementation, and discusses the open questions and future development trends. == Quantum image representations == There are various approaches for quantum image representation, that are usually based on the encoding of color information. A common representation is FRQI (Flexible Representation for Quantum Images), that captures the color and position at every pixel of the image, and defined as: | I ⟩ = 1 2 n ∑ i = 0 2 2 n − 1 | c i ⟩ ⊗ | i ⟩ {\displaystyle \vert I\rangle ={\frac {1}{2^{n}}}\sum _{i=0}^{2^{2n-1}}\vert c_{i}\rangle \otimes \vert i\rangle } where | i ⟩ {\textstyle |i\rangle } is the position and | c i ⟩ = c o s θ i | 0 ⟩ + s i n θ i | 1 ⟩ {\textstyle \vert c_{i}\rangle =cos\theta _{i}\vert 0\rangle +sin\theta _{i}\vert 1\rangle } the color with a vector of angles θ i ∈ [ 0 , π / 2 ] {\textstyle \theta _{i}\in \left[0,\pi /2\right]} . As it can be seen, | c i ⟩ {\textstyle \vert c_{i}\rangle } is a regular qubit state of the form | ψ ⟩ = α | 0 ⟩ + β | 1 ⟩ {\displaystyle \vert \psi \rangle =\alpha \vert 0\rangle +\beta \vert 1\rangle } , with basis states | 0 ⟩ = ( 1 0 ) {\textstyle \vert 0\rangle ={\begin{pmatrix}1\\0\end{pmatrix}}} and | 1 ⟩ = ( 0 1 ) {\textstyle \vert 1\rangle ={\begin{pmatrix}0\\1\end{pmatrix}}} , as well as amplitudes α {\textstyle \alpha } and β {\textstyle \beta } that satisfy | α | 2 + | β | 2 = 1 {\textstyle \left|\alpha \right|^{2}+\left|\beta \right|^{2}=1} . Another common representation is MCQI (Multi-Channel Representation for Quantum Images), that uses the RGB channels with quantum states and following FRQI definition: | I ⟩ = 1 2 n + 1 ∑ i = 0 2 2 n − 1 | C R G B i ⟩ ⊗ | i ⟩ {\displaystyle \vert I\rangle ={\frac {1}{2^{n+1}}}\sum _{i=0}^{2^{2n-1}}\vert C_{RGB}^{i}\rangle \otimes \vert i\rangle } | C R G B i ⟩ = cos ⁡ θ R i | 000 ⟩ + cos ⁡ θ G i | 001 ⟩ + cos ⁡ θ B i | 010 ⟩ + sin ⁡ θ R i | 100 ⟩ + sin ⁡ θ G i | 101 ⟩ + sin ⁡ θ B i | 110 ⟩ + cos ⁡ θ α | 011 ⟩ + sin ⁡ θ α | 111 ⟩ {\displaystyle {\begin{aligned}{\begin{aligned}\vert C_{RGB}^{i}\rangle &={\cos \theta _{R}^{i}\vert 000\rangle }+{\cos \theta _{G}^{i}\vert 001\rangle }+{\cos \theta _{B}^{i}\vert 010\rangle }\\&\quad +{\sin \theta _{R}^{i}\vert 100\rangle }+{\sin \theta _{G}^{i}\vert 101\rangle }+{\sin \theta _{B}^{i}\vert 110\rangle }\\&\quad +{\cos {\theta _{\alpha }}\vert 011\rangle }+{\sin \theta _{\alpha }\vert 111\rangle }\end{aligned}}\end{aligned}}} Departing from the angle-based approach of FRQI and MCQI, and using a qubit sequence, NEQR (Novel Enhanced Representation for Quantum Images) is another representation approach, that uses a function f ( y , x ) = C y x q − 1 C y x q − 2 … C y x 1 C y x 0 {\textstyle f\left(y,x\right)=C_{yx}^{q-1}C_{yx}^{q-2}\ldots C_{yx}^{1}C_{yx}^{0}} to encode color values for a 2 n × 2 n {\displaystyle 2^{n}\times 2^{n}} image: | I ⟩ = 1 2 n ∑ y = 0 2 n − 1 ∑ x = 0 2 n − 1 | f ( y , x ) ⟩ | y x ⟩ {\displaystyle \vert I\rangle ={\frac {1}{2^{n}}}\sum _{y=0}^{2^{n}-1}\sum _{x=0}^{2^{n}-1}\vert f\left(y,x\right)\rangle \vert yx\rangle } == Quantum image manipulations == A lot of the effort in QIMP has been focused on designing algorithms to manipulate the position and color information encoded using flexible representation of quantum images (FRQI) and its many variants. For instance, FRQI-based fast geometric transformations including (two-point) swapping, flip, (orthogonal) rotations and restricted geometric transformations to constrain these operations to a specified area of an image were initially proposed. Recently, NEQR-based quantum image translation to map the position of each picture element in an input image into a new position in an output image and quantum image scaling to resize a quantum image were discussed. While FRQI-based general form of color transformations were first proposed by means of the single qubit gates such as X, Z, and H gates. Later, Multi-Channel Quantum Image-based channel of interest (CoI) operator to entail shifting the grayscale value of the preselected color channel and the channel swapping (CS) operator to swap the grayscale values between two channels have been fully discussed. To illustrate the feasibility and capability of QIMP algorithms and application, researchers always prefer to simulate the digital image processing tasks on the basis of the QIRs that we already have. By using the basic quantum gates and the aforementioned operations, so far, researchers have contributed to quantum image feature extraction, quantum image segmentation, quantum image morphology, quantum image comparison, quantum image filtering, quantum image classification, quantum image stabilization, among others. In particular, QIMP-based security technologies have attracted extensive interest of researchers as presented in the ensuing discussions. Similarly, these advancements have led to many applications in the areas of watermarking, encryption, and steganography etc., which form the core security technologies highlighted in this area. In general, the work pursued by the researchers in this area are focused on expanding the applicability of QIMP to realize more classical-like digital image processing algorithms; propose technologies to physically realize the QIMP hardware; or simply to note the likely challenges that could impede the realization of some QIMP protocols. == Quantum image transform == By encoding and processing the image information in quantum-mechanical systems, a framework of quantum image processing is presented, where a pure quantum state encodes the image information: to encode the pixel values in the probability amplitudes and the pixel positions in the computational basis states. Given an image F = ( F i , j ) M × L {\displaystyle F=(F_{i,j})_{M\times L}} , where F i , j {\displaystyle F_{i,j}} represents the pixel value at position ( i , j ) {\displaystyle (i,j)} with i = 1 , … , M {\displaystyle i=1,\dots ,M} and j = 1 , … , L {\displaystyle j=1,\dots ,L} , a vector f → {\displaystyle {\vec {f}}} with M L {\displaystyle ML} elements can be formed by letting the first M {\displaystyle M} elements of f → {\displaystyle {\vec {f}}} be the first column of F {\displaystyle F} , the next M {\displaystyle M} elements the second column, etc. A large class of image operations is linear, e.g., unitary transformations, convolutions, and linear filtering. In the quantum computing, the linear transformation can be represented as | g ⟩ = U ^ | f ⟩ {\displaystyle |g\rangle ={\hat {U}}|f\rangle } with the input image state | f ⟩ {\displaystyle |f\rangle } and the output image state | g ⟩ {\displaystyle |g\rangle } . A unitary transformation can be implemented as a unitary evolution. Some basic and commonly used image transforms (e.g., the Fourier, Hadamard, an

    Read more →
  • Vinelink.com

    Vinelink.com

    Vinelink.com (VINE) is a national website in the United States that allows victims of crime, and the general public, to track the movements of prisoners held by the various states and territories. The first four letters in the websites name, "vine", are an acronym for "Victim Information and Notification Everyday". Vinelink.com displays information, based on the information provided by the various states' departments of correction and other law enforcement agencies, on whether an inmate is in custody, has been released, has been granted parole or probation, or has escaped from custody. In some cases, the website will reveal whether a defendant has been granted parole or probation, but then subsequently violated conditions of their release and become a fugitive. Information provided on Vinelink.com represents metadata, in that the website lists a defendant's custody status; but does not list what the individual is charged with, their criminal history, or the amount of their bail, if applicable. Internet users accessing the Vinelink.com website choose from a map of states and provinces within the United States where they wish to perform a search for an inmate. The user may then search for an individual using the inmate's or parolee's name, or by entering the inmate's specific department of corrections inmate number, if known. When the inmate's custody status changes, users who have registered to be notified of such changes will be notified via email, phone or both. This information is currently released upon request, without the website requesting reasons for the users search or requiring payment, as public records available to the general public. Inmate information is available for most states, and for Puerto Rico, on the website. The states of Arizona, Georgia, Massachusetts, Montana, New Hampshire and West Virginia provide very limited information on the site. In March of 2025, The Maine Sheriff's Association entered into a contract to pilot the use of the VINE system in three counties in the state as well as a regional jail, therefore making South Dakota the only state that does not participate in the VINE system to any degree. The website does not provide data on prisoners detained by the Federal Bureau of Prisons which has its own inmate locator web site nor for inmates of the U.S. military prisons.

    Read more →
  • Learning augmented algorithm

    Learning augmented algorithm

    A learning augmented algorithm (also called algorithm with predictions) is an algorithm that can make use of a prediction to improve its performance. Whereas in regular algorithms just the problem instance is inputted, learning augmented algorithms accept an extra parameter. This extra parameter often is a prediction of some property of the solution. This prediction is then used by the algorithm to improve its running time or the quality of its output. The most common application are online algorithms, where a prediction on the uncertain instance is provided. == Description == A learning augmented algorithm typically takes an input ( I , A ) {\displaystyle ({\mathcal {I}},{\mathcal {A}})} . Here I {\displaystyle {\mathcal {I}}} is a problem instance and A {\displaystyle {\mathcal {A}}} is the prediction. A prediction can be any object. Common are the following types: Prediction of an optimal solution. The prediction gives a solution to the problem or characterizes an optimal solution. Prediction of the input. This is mainly used for online problems. Prediction of algorithmic actions. A prediction tailored to a specific algorithm that suggests a specific algorithm execution. Learning augmented algorithms usually satisfy the following three properties: Consistency. A learning augmented algorithm is said to be consistent if the algorithm can be proven to have a good performance when it is provided with an accurate prediction. Smoothness. A learning augmented algorithm is called smooth if its performance can be bounded by a function of the quality of the prediction. Here, the quality can be measured in a problem specific way. This is also called the prediction error. Robustness. A learning augmented algorithm is called robust if its worst-case performance can be bounded even if the given prediction is inaccurate. Learning augmented algorithms generally do not prescribe how the prediction should be done. For this purpose machine learning can be used. == Applications == A few examples of problems where learning augmented algorithms have been applied are the following. === Online algorithms === The ski rental problem The weighted paging problem The set cover problem Nonclairvoyant scheduling The online bipartite matching problem === Warm starting === ==== Data structures ==== The binary search algorithm is an algorithm for finding elements of a sorted list x 1 , … , x n {\displaystyle x_{1},\ldots ,x_{n}} . It needs O ( log ⁡ ( n ) ) {\displaystyle O(\log(n))} steps to find an element with some known value y {\displaystyle y} in a list of length n {\displaystyle n} . With a prediction i {\displaystyle i} for the position of y {\displaystyle y} , the following learning augmented algorithm can be used. First, look at position i {\displaystyle i} in the list. If x i = y {\displaystyle x_{i}=y} , the element has been found. If x i < y {\displaystyle x_{i} y {\displaystyle x_{i}>y} , do the same as in the previous case, but instead consider i − 1 , i − 2 , i − 4 , … {\displaystyle i-1,i-2,i-4,\ldots } . The error is defined to be η = | i − i ∗ | {\displaystyle \eta =|i-i^{}|} , where i ∗ {\displaystyle i^{}} is the real index of y {\displaystyle y} . In the learning augmented algorithm, probing the positions i + 1 , i + 2 , i + 4 , … {\displaystyle i+1,i+2,i+4,\ldots } takes log 2 ⁡ ( η ) {\displaystyle \log _{2}(\eta )} steps. Then a binary search is performed on a list of size at most 2 η {\displaystyle 2\eta } , which takes log 2 ⁡ ( η ) {\displaystyle \log _{2}(\eta )} steps. This makes the total running time of the algorithm 2 log 2 ⁡ ( η ) {\displaystyle 2\log _{2}(\eta )} . So, when the error is small, the algorithm is faster than a normal binary search. This shows that the algorithm is consistent. Even in the worst case, the error will be at most n {\displaystyle n} . Then the algorithm takes at most O ( log ⁡ ( n ) ) {\displaystyle O(\log(n))} steps, so the algorithm is robust. ==== More examples ==== The maximum weight matching problem === Approximation algorithms === The maximum cut problem The vertex cover problem === Mechanism Design === The facility location problem

    Read more →
  • Upper ontology

    Upper ontology

    In information science, an upper ontology (also known as a top-level ontology, upper model, or foundation ontology) is an ontology (in the sense used in information science) that consists of very general terms (such as "object", "property", "relation") that are common across all domains. An important function of an upper ontology is to support broad semantic interoperability among a large number of domain-specific ontologies by providing a common starting point for the formulation of definitions. Terms in the domain ontology are ranked under the terms in the upper ontology, e.g., the upper ontology classes are superclasses or supersets of all the classes in the domain ontologies. A number of upper ontologies have been proposed, each with its own proponents. Library classification systems predate upper ontology systems. Though library classifications organize and categorize knowledge using general concepts that are the same across all knowledge domains, neither system is a replacement for the other. == Development == Any standard foundational ontology is likely to be contested among different groups, each with its own idea of "what exists". One factor exacerbating the failure to arrive at a common approach has been the lack of open-source applications that would permit the testing of different ontologies in the same computational environment. The differences have thus been debated largely on theoretical grounds, or are merely the result of personal preferences. Foundational ontologies can however be compared on the basis of adoption for the purposes of supporting interoperability across domain ontologies. No particular upper ontology has yet gained widespread acceptance as a de facto standard. Different organizations have attempted to define standards for specific domains. The 'Process Specification Language' (PSL) created by the National Institute of Standards and Technology (NIST) is one example. Another important factor leading to the absence of wide adoption of any existing upper ontology is the complexity. Some upper ontologies—Cyc is often cited as an example in this regard—are very large, ranging up to thousands of elements (classes, relations), with complex interactions among them and with a complexity similar to that of a human natural language, and the learning process can be even longer than for a natural language because of the unfamiliar format and logical rules. The motivation to overcome this learning barrier is largely absent because of the paucity of publicly accessible examples of use. As a result, those building domain ontologies for local applications tend to create the simplest possible domain-specific ontology, not related to any upper ontology. Such domain ontologies may function adequately for the local purpose, but they are very time-consuming to relate accurately to other domain ontologies. To solve this problem, some genuinely top level ontologies have been developed, which are deliberately designed to have minimal overlap with any domain ontologies. Examples are Basic Formal Ontology and the DOLCE (see below). === Arguments for the infeasibility of an upper ontology === Historically, many attempts in many societies have been made to impose or define a single set of concepts as more primal, basic, foundational, authoritative, true or rational than all others. A common objection to such attempts points out that humans lack the sort of transcendent perspective — or God's eye view — that would be required to achieve this goal. Humans are bound by language or culture, and so lack the sort of objective perspective from which to observe the whole terrain of concepts and derive any one standard. Thomasson, under the headline "1.5 Skepticism about Category Systems", wrote: "category systems, at least as traditionally presented, seem to presuppose that there is a unique true answer to the question of what categories of entity there are – indeed the discovery of this answer is the goal of most such inquiries into ontological categories. [...] But actual category systems offered vary so much that even a short survey of past category systems like that above can undermine the belief that such a unique, true and complete system of categories may be found. Given such a diversity of answers to the question of what the ontological categories are, by what criteria could we possibly choose among them to determine which is uniquely correct?" Another objection is the problem of formulating definitions. Top level ontologies are designed to maximize support for interoperability across a large number of terms. Such ontologies must therefore consist of terms expressing very general concepts, but such concepts are so basic to our understanding that there is no way in which they can be defined, since the very process of definition implies that a less basic (and less well understood) concept is defined in terms of concepts that are more basic and so (ideally) more well understood. Very general concepts can often only be elucidated, for example by means of examples, or paraphrase. There is no self-evident way of dividing the world up into concepts, and certainly no non-controversial one There is no neutral ground that can serve as a means of translating between specialized (or "lower" or "application-specific") ontologies Human language itself is already an arbitrary approximation of just one among many possible conceptual maps. To draw any necessary correlation between English words and any number of intellectual concepts, that we might like to represent in our ontologies, is just asking for trouble. (WordNet, for instance, is successful and useful, precisely because it does not pretend to be a general-purpose upper ontology; rather, it is a tool for semantic / syntactic / linguistic disambiguation, which is richly embedded in the particulars and peculiarities of the English language.) Any hierarchical or topological representation of concepts must begin from some ontological, epistemological, linguistic, cultural, and ultimately pragmatic perspective. Such pragmatism does not allow for the exclusion of politics between persons or groups, indeed it requires they be considered as perhaps more basic primitives than any that are represented. Those who doubt the feasibility of general purpose ontologies are more inclined to ask "what specific purpose do we have in mind for this conceptual map of entities and what practical difference will this ontology make?" This pragmatic philosophical position surrenders all hope of devising the encoded ontology version of "The world is everything that is the case." (Wittgenstein, Tractatus Logico-Philosophicus). Finally, there are objections similar to those against artificial intelligence. Technically, the complex concept acquisition and the social / linguistic interactions of human beings suggest any axiomatic foundation of "most basic" concepts must be cognitive biological or otherwise difficult to characterize since we don't have axioms for such systems. Ethically, any general-purpose ontology could quickly become an actual tyranny by recruiting adherents into a political program designed to propagate it and its funding means, and possibly defend it by violence. Historically, inconsistent and irrational belief systems have proven capable of commanding obedience to the detriment or harm of persons both inside and outside a society that accepts them. How much more harmful would a consistent rational one be, were it to contain even one or two basic assumptions incompatible with human life? === Arguments for the feasibility of an upper ontology === Many of those who doubt the possibility of developing wide agreement on a common upper ontology fall into one of two traps: they assert that there is no possibility of universal agreement on any conceptual scheme; but they argue that a practical common ontology does not need to have universal agreement, it only needs a large enough user community (as is the case for human languages) to make it profitable for developers to use it as a means to general interoperability, and for third-party developer to develop utilities to make it easier to use; and they point out that developers of data schemes find different representations congenial for their local purposes; but they do not demonstrate that these different representations are in fact logically inconsistent. In fact, different representations of assertions about the real world (though not philosophical models), if they accurately reflect the world, must be logically consistent, even if they focus on different aspects of the same physical object or phenomenon. If any two assertions about the real world are logically inconsistent, one or both must be wrong, and that is a topic for experimental investigation, not for ontological representation. In practice, representations of the real world are created as and known to be approximations to the basic reality, and their use is circumscribed by the limits of e

    Read more →
  • List of Ada software and tools

    List of Ada software and tools

    This is a list of software and programming tools for the Ada programming language, including IDEs, compilers, libraries, verification and debugging tools, numerical and scientific computing libraries, and related projects. == Compilers == GNAT — GCC Ada compiler and toolchain, maintained by AdaCore AdaCore GNAT Pro — commercial Ada compiler with advanced tooling for high-integrity and real-time systems Green Hills compiler for Ada — Ada compiler for embedded and safety-critical systems ObjectAda — Ada development environment for safety-critical and embedded systems == Integrated development environments (IDEs) and editors == GNAT Studio — IDE developed by AdaCore Emacs — supports Ada editing with Ada mode and syntax checking Eclipse — supports Ada through GNATbench plugin Visual Studio Code — Ada support via Ada Language Server extensions == Libraries and frameworks == See also: Ada Libraries on Wikibooks Ada.Calendar — date and time library Ada Web Services (AWS) — support for RESTful and SOAP web services Ada.Text_IO — standard library for text input/output Florist (POSIX Ada binding) – open-source implementation of the POSIX Ada bindings GNAT – Ada compiler part of GCC, which also provides an extensive runtime and library package hierarchy. GtkAda – Ada bindings for the GTK+ graphical user interface toolkit Matreshka – multipurpose Ada framework supporting Unicode, XML, JSON, and more. XML/Ada – XML and Unicode processing library == Real-time and embedded systems == Ada tasking — built-in concurrency support with tasks, protected objects, and rendezvous. Ada.Real_Time — real-time clocks, delays, and scheduling. ARINC 653 Ada profiles — for avionics real-time applications OpenMP Ada bindings — parallel programming for multi-core embedded systems Ravenscar profile — subset of Ada tasking for real-time and deterministic execution == Numerical and scientific computing == Ada.Numerics — libraries for numerical methods, linear algebra, and mathematical functions. SPARK math libraries — formal-methods-compliant numerical routines == Verification, debugging, and analysis == GNATprove — formal verification and static analysis tool for Ada and SPARK GNATstack — runtime stack analysis and checking GNATcoverage — code coverage measurement for Ada projects AdaControl — style checking and metrics for Ada == Testing frameworks == AUnit — unit testing framework for Ada GNATtest — automated testing framework for Ada == Documentation and code generation == GNATdoc — generates HTML documentation from Ada source code

    Read more →
  • Nike+iPod

    Nike+iPod

    The Nike+iPod Sport Kit is an activity tracker device, developed by Nike, Inc., which measures and records the distance and pace of a walk or run. The Nike+iPod consists of a small transmitter device attached to or embedded in a shoe, which communicates with either the Nike+ Sportband, or a receiver plugged into an iPod Nano. It can also work directly with a 2nd Generation iPod Touch (or higher), iPhone 3GS, iPhone 4, iPhone 4S, iPhone 5, The Nike+iPod was announced on May 23, 2006. On September 7, 2010, Nike released the Nike+ Running App (originally called Nike+ GPS) on the App Store, which used a tracking engine powered by MotionX that does not require the separate shoe sensor or pedometer. This application works using the accelerometer and GPS of the iPhone and the accelerometer of the iPod Touch, which does not have a GPS chip. Nike+Running is compatible with the iPhone 6 and iPhone 6 Plus down to iPhone 3GS and iPod touch. On June 21, 2012, Nike released Nike+ Running App for Android. The current app is compatible with all Android phones running 4.0.3 and up. == Overview == The sensor and iPod kit were revealed on May 20, 2006. The kit stores information such as the elapsed time of the workout, the distance traveled, pace, and calories burned by the individual. Nike+ was a collaboration between Nike and Apple; the platform consisted of an iPod, a wireless chip, Nike shoes that accepted the wireless chip, an iTunes membership, and a Nike+ online community. iPods using Nike iPod require a sensor and remote. The next upgraded product was the Sportband kit, which was announced in April 2008. The kit allows users to store run information without the iPod Nano. The Sportband consists of two parts: a rubber holding strap which is worn around the wrist, and a receiver which resembles a USB key-disk. The receiver displays information comparable to that of the iPod kit on the built-in display. After a run, the receiver can be plugged straight into a USB port and the software will upload the run information automatically to the Nike+ website. As of August 2008 "Nike+iPod for the Gym" launched, allowing users to record their cardio workouts directly to their iPods. No Sport kit or shoe sensor is required; all that is needed is a compatible iPod (1st–6th generation iPod Nano or 2nd/3rd gen iPod Touch) and an enabled piece of cardio equipment. As of March 2009, the seven largest commercial equipment providers were shipping enabled equipment (Life Fitness, Technogym, Precor USA, Star Trac, Cybex International, Matrix Fitness and Free Motion). The models of compatible cardio equipment include treadmills, stationary bicycles, stair climbers, ellipticals, and others such as Precor's Adaptive Motion Trainer. Once the user syncs an iPod with iTunes, the cardio workouts are automatically stored at Nikeplus.com, where each workout is visualized and tracked based on the number of calories burned. The calories are converted to "CardioMiles", at a ratio of 100:1, allowing cardio users to take full advantage of all the tools and features of Nikeplus.com, and allow them to engage in challenges with other runners, walkers and cardio users, using a common currency. With the release of the second-generation iPod Touch in 2008, Apple Inc. included a built-in ability to receive Nike+ signals, which allowed the iPod to connect directly to the wireless sensor thus eliminating the need for an external receiver to be connected. Apple also added this capability to the iPhone 3GS (released 2009), iPhone 4 (2010), and third-generation iPod Touch (2009). Those devices use their Broadcom Bluetooth chipset to receive the signals. On June 7, 2010, Polar and Nike introduced the Polar WearLink+ that works with Nike+. This new product works with the Nike+ SportBand and the fifth generation iPod nano in conjunction with the Nike+ iPod Sport Kit. Polar WearLink+ that works with Nike+ communicates directly with the fifth generation iPod nano and Nike+ SportBand using a proprietary digital protocol but it is dual-mode so it is also compatible with most Polar training computers (all those using 5 kHz analog transmission technology). Nike+ had 18 million global users as of April 2013. One year later, Nike updated the number of global users to 28 million. In iOS 6.1.2 (and possibly higher), a hole in the compatibility for the app has allowed jailbroken iPad users to use the native Nike + iPod iPhone and iPod app by moving the app bundle and setting permissions for the app. On April 30, 2018, Nike retired services for legacy Nike wearable devices, such as the Nike+ FuelBand and the Nike+ SportWatch GPS, and previous versions of apps, including Nike Run Club and Nike Training Club version 4.X and lower. Likewise, Nike no longer supported the Nike+ Connect software that transferred data to a NikePlus Profile or the Nike+ Fuel/FuelBand and Nike+ Move apps. == Sports kit equipment == The kit consists of two pieces: a piezoelectric sensor with a Nordic Semiconductor nRF2402 transmitter that is mounted under the inner sole of the shoe and a receiver that connects to the iPod. They communicate using a 2.4 GHz wireless radio and use Nordic Semiconductor's "ShockBurst" network protocol. The wireless data is encrypted in transit, but some uniquely identifying data is sent in the plain. The wireless protocol was reverse engineered and documented by Dmitry Grinberg in 2011. Nike recommends that the shoe be a Nike+ model with a special pocket in which to place the device. Nike has released the sensor for individual sale meaning that consumers no longer have to purchase the whole set (the iPod receiver and sensor). As the sensor battery cannot be replaced, a new one must be purchased every time the battery runs out. Aftermarket solutions are available to users who do not want to use shoes with built-in or hand-made pockets for the foot sensor, such as shoe pouches and containment devices designed to affix the sensor against the shoe laces. No matter how the sensor is integrated with the user's shoes, care must be taken that it is firmly fixed in place and will not jerk around while in use, which would degrade the accuracy. == Sports kit usage == The Sports Kit can be used to track running, which it refers to as "workouts". New workouts are started by plugging the receiving unit into the iPod, then navigating through the iPod menu system. The user chooses a goal for the workout, which might be to cover a specific distance, or burn a number of calories, or work out for a specified time. A workout can also be started without a goal, which is called a "Basic Workout". When the workout goal has been set, the receiver seeks the sensor, possibly asking the user to "walk around to activate [the] sensor". The user then must press the center button on the iPod to begin the workout. Audio feedback is provided in the user's choice of generic male or female voice by the iPod over the course of the workout, depending on the type of workout chosen. For goal-oriented workouts, the feedback will correspond to significant milestones toward the goal. In a distance workout, for example, the audio feedback will inform the user as each mile or kilometer has been completed, as well as the half-way point of the workout, and a countdown of four 100-meter increments at the end of the workout. The iPod's control wheel functions change slightly during a workout. The Pause button now not only pauses the music but also the workout. Similarly, the Menu button is used to access the controls to end the workout. The Forward and Back buttons are unchanged, performing audio track skip and reverse functions. The Center button has two functions: audio feedback about the current distance, time, and pace are provided when the button is tapped once, while if the button is held down the iPod skips to the "PowerSong" - an audio track chosen by the user, generally intended for motivation. In addition to the in-workout audio feedback, there are pre-recorded congratulations provided by Lance Armstrong, Tiger Woods, Joan Benoit Samuelson, and Paula Radcliffe whenever a user achieves a personal best (such as fastest mile, fastest 5K, fastest 10K, longest run yet) or reaches certain long-term milestones (such as 250 miles, 500 kilometers). This "celebrity feedback" is heard after the usual end-of-run statistics. While the Sports Kit can be used immediately after purchase, it will report more accurate results if it is calibrated before the first usage and then regularly afterwards. For calibration, the user finds a fixed known distance of at least 0.25 mile or 400 meters and then sets the Nike+ to calibration mode for the walk or run over that distance. When the walk or run is complete, the device calibrates itself and future workout reporting will reflect statistics closer to that individual user's workout style. Consumer Reports magazine tested the device and found it accurate as long as you keep an even pace. In workouts with varied pa

    Read more →
  • Reference data

    Reference data

    Reference data is data used to classify or categorize other data. Typically, they are static or slowly changing over time. Examples of reference data include: Units of measurement Country codes Corporate codes Fixed conversion rates e.g., weight, temperature, and length Calendar structure and constraints Reference data sets are sometimes alternatively referred to as a "controlled vocabulary" or "lookup" data. Reference data differs from master data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. A further difference between reference data and master data is that a change to the reference data values may require an associated change in business process to support the change, while a change in master data will always be managed as part of existing business processes. For example, adding a new customer or sales product is part of the standard business process. However, adding a new product classification (e.g. "restricted sales item") or a new customer type (e.g. "gold level customer") will result in a modification to the business processes to manage those items. == Externally-defined reference data == For most organisations, most or all reference data is defined and managed within that organisation. Some reference data, however, may be externally defined and managed, for example by standards organizations. An example of externally defined reference data is the set of country codes as defined in ISO 3166-1. == Reference data management == Curating and managing reference data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's reference data. Without consistency across business process or applications, for example, similar things may be described in quite different ways. Reference data gain in value when they are widely re-used and widely referenced. Examples of good practice in reference data management include: Formalize the reference data management Use external reference data as much as possible Govern the reference data specific to your enterprise Manage reference data at enterprise level Version control your reference data

    Read more →
  • Synthetic data

    Synthetic data

    Synthetic data are artificially generated data not produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models. Data generated by a computer simulation can be seen as synthetic data. This encompasses most applications of physical modeling, such as music synthesizers or flight simulators. The output of such systems approximates the real thing, but is fully algorithmically generated. Synthetic data is used in a variety of fields as a filter for information that would otherwise compromise the confidentiality of particular aspects of the data. In many sensitive applications, datasets theoretically exist but cannot be released to the general public; synthetic data sidesteps the privacy issues that arise from using real consumer information without permission or compensation. == Usefulness == Synthetic data is generated to meet specific needs or certain conditions that may not be found in the original, real data. One of the hurdles in applying up-to-date machine learning approaches for complex scientific tasks is the scarcity of labeled data, a gap effectively bridged by the use of synthetic data, which closely replicates real experimental data. This can be useful when designing many systems, from simulations based on theoretical value, to database processors, etc. This helps detect and solve unexpected issues such as information processing limitations. Synthetic data are often generated to represent the authentic data and allows a baseline to be set. Another benefit of synthetic data is to protect the privacy and confidentiality of authentic data, while still allowing for use in testing systems. Computer security experts claim generated synthetic data "... enables us to create realistic behavior profiles for users and attackers. The data is used to train the fraud detection system itself, thus creating the necessary adaptation of the system to a specific environment." In defense and military contexts, synthetic data is seen as a potentially valuable tool to develop and improve complex AI systems, particularly in contexts where high-quality real-world data is scarce. At the same time, synthetic data together with the testing approach can give the ability to model real-world scenarios. == History == Scientific modelling of physical systems has a long history that runs concurrent with the history of physics. For example, research into synthesis of audio and voice can be traced back to the 1930s and before, driven forward by the developments of the telephone and audio recording technologies. Digitization gave rise to software synthesizers from the 1970s onwards. In the context of privacy-preserving statistical analysis, in 1993, the idea of original fully synthetic data was created by Donald Rubin. Rubin originally designed this to synthesize the Decennial Census long form responses for the short form households. He then released samples that did not include any actual long form records - in this he preserved anonymity of the household. Later that year, the idea of original partially synthetic data was created by Little. Little used this idea to synthesize the sensitive values on the public use file. A 1993 work fitted a statistical model to 60,000 MNIST digits, then it was used to generate over 1 million examples. Those were used to train a LeNet-4 to reach state of the art performance. In 1994, Stephen Fienberg introduced 'critical refinement', in which a parametric posterior predictive distribution (instead of a Bayes bootstrap) is used to do the sampling. Later, other important contributors to the development of synthetic data generation were Trivellore Raghunathan, Jerry Reiter, Donald Rubin, John M. Abowd, and Jim Woodcock. Collectively they came up with a solution for how to treat partially synthetic data with missing data. Similarly, they developed the technique of Sequential Regression Multivariate Imputation. == Calculations == Researchers test the framework on synthetic data, which is "the only source of ground truth on which they can objectively assess the performance of their algorithms". Synthetic data can be generated through the use of random lines, having different orientations and starting positions. Datasets can get fairly complicated. A more complicated dataset can be generated by using a synthesizer build. To create a synthesizer build, first use the original data to create a model or equation that fits the data the best. This model or equation will be called a synthesizer build. This build can be used to generate more data. Constructing a synthesizer build involves constructing a statistical model. In a linear regression line example, the original data can be plotted, and a best fit linear line can be created from the data. This line is a synthesizer created from the original data. The next step will be generating more synthetic data from the synthesizer build or from this linear line equation. In this way, the new data can be used for studies and research, and it protects the confidentiality of the original data. David Jensen from the Knowledge Discovery Laboratory explains how to generate synthetic data: "Researchers frequently need to explore the effects of certain data characteristics on their data model." To help construct datasets exhibiting specific properties, such as auto-correlation or degree disparity, proximity can generate synthetic data having one of several types of graph structure: random graphs that are generated by some random process; lattice graphs having a ring structure; lattice graphs having a grid structure, etc. In all cases, the data generation process follows the same process: Generate the empty graph structure. Generate attribute values based on user-supplied prior probabilities. Since the attribute values of one object may depend on the attribute values of related objects, the attribute generation process assigns values collectively. == Applications == === Fraud detection and confidentiality systems === Testing and training fraud detection and confidentiality systems are devised using synthetic data. Specific algorithms and generators are designed to create realistic data, which then assists in teaching a system how to react to certain situations or criteria. For example, intrusion detection software is tested using synthetic data. This data is a representation of the authentic data and may include intrusion instances that are not found in the authentic data. The synthetic data allows the software to recognize these situations and react accordingly. If synthetic data was not used, the software would only be trained to react to the situations provided by the authentic data and it may not recognize another type of intrusion. === Scientific research === Researchers doing clinical trials or any other research may generate synthetic data to aid in creating a baseline for future studies and testing. Real data can contain information that researchers may not want released, so synthetic data is sometimes used to protect the privacy and confidentiality of a dataset. Using synthetic data reduces confidentiality and privacy issues since it holds no personal information and cannot be traced back to any individual. Beyond privacy protection, synthetic data is also being explored for methodological innovation in drug development. For instance, synthetic data may be used to construct synthetic control arms as an alternative to conventional external control arms based on real-world data (RWD) or randomized controlled trials (RCTs). Collectively, regulatory agencies such as the FDA and EMA appear to be at various stages of recognizing and integrating AI-generated synthetic data into their methodologies. While there is growing consensus on the potential of such data to support model development and the broader lifecycle of medicinal products, to date no drug or medical device has been approved using solely or predominantly synthetic data—particularly not as a comparator arm generated entirely via data-driven algorithms. The quality and statistical handling of synthetic data are expected to become more prominent in future regulatory discussions, particularly in contexts such as predictive modeling (e.g., digital twins), where innovative approaches have already been referenced. === Machine learning === Synthetic data is increasingly being used for machine learning applications: a model is trained on a synthetically generated dataset with the intention of transfer learning to real data. Efforts have been made to enable more data science experiments via the construction of general-purpose synthetic data generators, such as the Synthetic Data Vault. In general, synthetic data has several natural advantages: once the synthetic environment is ready, it is fast and cheap to produce as much data as needed; synthetic data can have perfectly accurate labels, including labeling that may be very expensive or impo

    Read more →
  • Thai QR Payment

    Thai QR Payment

    Thai QR Payment or PromptPay (พร้อมเพย์) is a real-time payment system in Thailand that allows money transfers through digital channels using identifiers linked to a bank account, including a mobile phone number, citizen identification number, tax identification number or bank account number. The system was introduced in 2016 as part of Thailand's national e-payment infrastructure and was developed under the National e-Payment Master Plan, a government programme intended to expand digital payment infrastructure and reduce the use of cash in everyday transactions. It is owned by National ITMX ltd and Bank of Thailand and developed by Vocalink, a group by Mastercard == History == PromptPay (originally AnyID) is one of the National e-Payment projects and policies by Thailand, to regulate and standardize electronic payments to follow the technologies with internet and smartphones that is expanding and bringing technology into Finance and Commerce. By 22 December 2015, The First Prayut cabinet have approved the project as a national infastructure PromptPay has also been used in cross-border payment linkages with other real-time payment systems in Southeast Asia. In April 2021, the Monetary Authority of Singapore and the Bank of Thailand launched a linkage between Singapore's PayNow and Thailand's PromptPay, allowing customers of participating banks to send money between the two countries using a mobile phone number. In June 2021, the central banks of Thailand and Malaysia launched a cross-border QR payment linkage between PromptPay and Malaysia's DuitNow system. == Services == PromptPay's Services have included Encrypted Transactions and Payment between Two Individuals (C2C) Government Infrastructure Payment Tax Returns Individual PromptPay e-Wallet Thai QR Payment Pay Alert e-Donation Cross Border QR Payment

    Read more →
  • Query rewriting

    Query rewriting

    Query rewriting is a typically automatic transformation that takes a set of database tables, views, and/or queries, usually indices, often gathered data and query statistics, and other metadata, and yields a set of different queries, which produce the same results but execute with better performance (for example, faster, or with lower memory use). Query rewriting can be based on relational algebra or an extension thereof (e.g. multiset relational algebra with sorting, aggregation and three-valued predicates i.e. NULLs as in the case of SQL). The equivalence rules of relational algebra are exploited, in other words, different query structures and orderings can be mathematically proven to yield the same result. For example, filtering on fields A and B, or cross joining R and S can be done in any order, but there can be a performance difference. Multiple operations may be combined, and operation orders may be altered. The result of query rewriting may not be at the same abstraction level or application programming interface (API) as the original set of queries (though often is). For example, the input queries may be in relational algebra or SQL, and the rewritten queries may be closer to the physical representation of the data, e.g. array operations. Query rewriting can also involve materialization of views and other subqueries; operations that may or may not be available to the API user. The query rewriting transformation can be aided by creating indices from which the optimizer can choose (some database systems create their own indexes if deemed useful), mandating the use of specific indices, creating materialized and/or denormalized views, or helping a database system gather statistics on the data and query use, as the optimality depends on patterns in data and typical query usage. Query rewriting may be rule based or optimizer based. Some sources discuss query rewriting as a distinct step prior to optimization, operating at the level of the user accessible algebra API (e.g. SQL). There are other, largely unrelated concepts also named similarly, for example, query rewriting by search engines.

    Read more →
  • Task Force on Process Mining

    Task Force on Process Mining

    The IEEE Task Force on Process Mining (TFPM) is a non-commercial association for process mining. The IEEE (Institute of Electrical and Electronics Engineers) Task Force on Process Mining was established in October 2009 as part of the IEEE Computational Intelligence Society at the Eindhoven University of Technology. The task force is supported by over 80 organizations and has around 750 members. The main goal of the task force is to promote the research, development, education, and understanding of process mining. == About == In 2012, the IEEE World Congress on Computational Intelligence/ IEEE Congress on Evolutionary Computation held a session on Process Mining. Process mining is a type of research that is a mix of computational intelligence and data mining, as well as process modeling and analysis. === Activities and organization === The Task Force on Process Mining has a Steering Committee and an Advisory Board. The Steering Committee, was chaired by Wil van der Aalst in its inception in 2009, defined 15 action lines. These include the organization of the annual International Process Mining Conference (ICPM) series, standardization efforts leading to the IEEE XES standard for storing and exchanging event data, and the Process Mining Manifesto which was translated into 16 languages. The Task Force on Process Mining also publishes a newsletter, provides data sets, organizes workshops and competitions, and connects researchers and practitioners. In 2016, the IEEE Standards Association published the IEEE Standard for Extensible Event Stream (XES), which is a widely accepted file format by the process mining community. As of 2023, Boudewijn van Dongen serves as chair of the Steering Committee. Wil van der Aalst and Moe Wynn both serve as vice-chair of the Steering Committee.

    Read more →