AI Data Poisoning

AI Data Poisoning — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Inferential theory of learning

    Inferential theory of learning

    Inferential Theory of Learning (ITL) is an area of machine learning which describes inferential processes performed by learning agents. ITL has been continuously developed by Ryszard S. Michalski, starting in the 1980s. The first known publication of ITL was in 1983. In the ITL learning process is viewed as a search (inference) through hypotheses space guided by a specific goal. The results of learning need to be stored. Stored information will later be used by the learner for future inferences. Inferences are split into multiple categories including conclusive, deduction, and induction. In order for an inference to be considered complete it was required that all categories must be taken into account. This is how the ITL varies from other machine learning theories like Computational Learning Theory and Statistical Learning Theory; which both use singular forms of inference. == Usage == The most relevant published usage of ITL was in scientific journal published in 2012 and used ITL as a way to describe how agent-based learning works. According to the journal "The Inferential Theory of Learning (ITL) provides an elegant way of describing learning processes by agents".

    Read more →
  • FAIR data

    FAIR data

    FAIR data is data which meets the 2016 FAIR principles of findability, accessibility, interoperability, and reusability (FAIR). The FAIR principles emphasize machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in the volume, complexity, and rate of production of data. The abbreviation FAIR/O data is sometimes used to indicate that the dataset or database in question complies with the FAIR principles and also carries an explicit data‑capable open license. == FAIR principles published by GO FAIR == Findable The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process. F1. (Meta)data are assigned a globally unique and persistent identifier F2. Data are described with rich metadata (defined by R1 below) F3. Metadata clearly and explicitly include the identifier of the data they describe F4. (Meta)data are registered or indexed in a searchable resource Accessible Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorisation. A1. (Meta)data are retrievable by their identifier using a standardised communications protocol A1.1 The protocol is open, free, and universally implementable A1.2 The protocol allows for an authentication and authorisation procedure, where necessary A2. Metadata are accessible, even when the data are no longer available Interoperable The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation I2. (Meta)data use vocabularies that follow FAIR principles I3. (Meta)data include qualified references to other (meta)data Reusable The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. R1. (Meta)data are richly described with a plurality of accurate and relevant attributes R1.1. (Meta)data are released with a clear and accessible data usage license R1.2. (Meta)data are associated with detailed provenance R1.3. (Meta)data meet domain-relevant community standards The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F4 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component). === Acceptance and implementation === Before FAIR, a 2007 OECD report was the most influential paper discussing similar ideas related to data accessibility. In January 2014, the Lorentz Centre at Leiden University hosted a workshop entitled "Jointly designing a data FAIRPORT" where the participants first formulated the FAIR principles. After further discussions, they were published in the March 2016 issue of Scientific Data. At the 2016 G20 Hangzhou summit, the G20 leaders issued a statement endorsing the application of FAIR principles to research. Also in 2016, a group of Australian organisations developed a Statement on FAIR Access to Australia's Research Outputs, which aimed to extend the principles to research outputs more generally. In 2017, Germany, Netherlands and France agreed to establish an international office to support the FAIR initiative, the GO FAIR International Support and Coordination Office. Other international organisations active in the research data ecosystem, such as CODATA or Research Data Alliance (RDA) also support FAIR implementations by their communities. FAIR principles implementation assessment is being explored by FAIR Data Maturity Model Working Group of RDA, CODATA's strategic Decadal Programme "Data for Planet: Making data work for cross-domain challenges" mentions FAIR data principles as a fundamental enabler of data driven science. The Association of European Research Libraries recommends the use of FAIR principles. A 2017 paper by advocates of FAIR data reported that awareness of the FAIR concept was increasing among various researchers and institutes, but also, understanding of the concept was becoming confused as different people apply their own differing perspectives to it. Guides on implementing FAIR data practices state that the cost of a data management plan in compliance with FAIR data practices should be 5% of the total research budget. In 2019 the Global Indigenous Data Alliance (GIDA) released the CARE Principles for Indigenous Data Governance as a complementary guide. The CARE principles extend principles outlined in FAIR data to include Collective benefit, Authority to control, Responsibility, and Ethics to ensure data guidelines address historical contexts and power differentials. The CARE Principles for Indigenous Data Governance were drafted at the International Data Week and Research Data Alliance Plenary co-hosted event, "Indigenous Data Sovereignty Principles for the Governance of Indigenous Data Workshop", held 8 November 2018, in Gaborone, Botswana. The lack of information on how to implement the guidelines have led to inconsistent interpretations of them. In January 2020, representatives of nine groups of universities around the world produced the Sorbonne declaration on research data rights, which included a commitment to FAIR data, and called on governments to provide support to enable it. In 2021, researchers identified the FAIR principles as a conceptual component of data catalog software tools, with the other components being metadata management, business context and data responsibility roles. In April 2022, Matthias Scheffler and colleagues argued in Nature that FAIR principles are "a must" so that data mining and artificial intelligence can extract useful scientific information from the data. There have been moves in the geosciences to establish FAIR data by use of decimal georeferencing However, making data (and research outcomes) FAIR is a challenging task, and it is challenging to assess the FAIRness. In 2020, the FAIR Data Maturity Model Working Group published a set of guidelines for assessing "FAIRness".

    Read more →
  • Known-item search

    Known-item search

    Known-item search is a specialization of information exploration which represents the activities carried out by searchers who have a particular item in mind. In the context of library catalogs, known‐item search means a search for an item for which the author or title is known. Although the concept of known-item search originated in library science, it is now applied in the context of web search and other online search activities. Known-item search is distinguished from exploratory search, in which a searcher is unfamiliar with the domain of their search goal, unsure about the ways to achieve their goal, and/or unsure about what their goal is.

    Read more →
  • NewSQL

    NewSQL

    NewSQL is a class of relational database management systems that seek to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system. Many enterprise systems that handle high-profile data (e.g., financial and order processing systems) are too large for conventional relational databases, but have transactional and consistency requirements that are not practical for NoSQL systems. The only options previously available for these organizations were to either purchase more powerful computers or to develop custom middleware that distributes requests over conventional DBMS. Both approaches feature high infrastructure costs and/or development costs. NewSQL systems attempt to reconcile the conflicts. == History == The term was first used by 451 Group analyst Matthew Aslett in a 2011 research paper discussing the rise of a new generation of database management systems. One of the first NewSQL systems was the H-Store parallel database system. == Applications == Typical applications are characterized by heavy OLTP transaction volumes. OLTP transactions; are short-lived (i.e., no user stalls) touch small amounts of data per transaction use indexed lookups (no table scans) have a small number of forms (a small number of queries with different arguments). However, some support hybrid transactional/analytical processing (HTAP) applications. Such systems improve performance and scalability by omitting heavyweight recovery or concurrency control. == List of NewSQL-databases == Apache Trafodion Clustrix CockroachDB Couchbase CrateDB Google Spanner MySQL Cluster NuoDB OceanBase Pivotal GemFire XD SequoiaDB SingleStore was formerly known as MemSQL. TIBCO Active Spaces TiDB TokuDB TransLattice Elastic Database VoltDB YDB YugabyteDB == Features == The two common distinguishing features of NewSQL database solutions are that they support online scalability of NoSQL databases and the relational data model (including ACID consistency) using SQL as their primary interface. NewSQL systems can be loosely grouped into three categories: === New architectures === NewSQL systems adopt various internal architectures. Some systems employ a cluster of shared-nothing nodes, in which each node manages a subset of the data. They include components such as distributed concurrency control, flow control, and distributed query processing. === SQL engines === The second category are optimized storage engines for SQL. These systems provide the same programming interface as SQL, but scale better than built-in engines. === Transparent sharding === These systems automatically split databases across multiple nodes using Raft or Paxos consensus algorithm.

    Read more →
  • Galaksija BASIC

    Galaksija BASIC

    Galaksija BASIC was the BASIC interpreter of the Galaksija build-it-yourself home computer from Yugoslavia. While being partially based on code taken from TRS-80 Level 1 BASIC, which the creator believed to have been a Microsoft BASIC, the extensive modifications of Galaksija BASIC—such as to include rudimentary array support, video generation code (as the CPU itself did it in absence of dedicated video circuitry) and generally improvements to the programming language—is said to have left not much more than flow-control and floating point code remaining from the original. The core implementation of the interpreter was fully contained in the 4 KiB ROM "A" or "1". The computer's original mainboard had a reserved slot for an extension ROM "B" or "2" that added more commands and features such as a built-in Zilog Z80 assembler. == ROM "A"/"1" symbols and keywords == The core implementation, in ROM "A" or "1", contained 3 special symbols and 32 keywords: ! begins a comment (equivalent of standard BASIC REM command) # Equivalent of standard BASIC DATA statement & prefix for hex numbers ARR$(n) Allocates an array of strings, like DIM, but can allocate only array with name A$ BYTE serves as PEEK when used as a function (e.g. PRINT BYTE(11123)) and POKE when used as a command (e.g. BYTE 11123,123). CALL n Calls BASIC subroutine as GOSUB in most other BASICs (e.g. CALL 100+4X) CHR$(n) converts an ASCII numeric code into a corresponding character (string) DOT x, y draws (command) or inspects (function) a pixel at given coordinates (0<=x<=63, 0<=y<=47). DOT displays the clock or time controlled by content of Y$ variable. Not in standard ROM EDIT n causes specified program line to be edited ELSE standard part of IF-ELSE construct (Galaksija did not use THEN) EQ compare alphanumeric values X$ and Y$ FOR standard FOR loop GOTO standard GOTO command HOME equivalent of standard BASIC CLS command - clears the screen HOME n protects n characters from the top of the screen from being scrolled away IF standard part of IF-ELSE construct (Galaksija did not use THEN) INPUT user entry of variable INT(n) a function that returns the greatest integer value equal to or lesser than n KEY(n) test whether a particular keyboard key is pressed LIST lists the program. Optional numeric argument specifies the first line number to begin listing with. MEM returns memory consumption data (need details here) NEW clears the current BASIC program NEW n clears BASIC program and moves beginning of BASIC area NEXT standard terminator of FOR loop OLD loads a program from tape OLD n loads program to different address PTR Returns address of the variable PRINT Printing numeric or string expression. RETURN Return from BASIC subroutine RND function (takes no arguments) that returns a random number between 0 and 1. RUN runs (executes) BASIC program. Optional numeric argument specifies the line number to begin execution with. SAVE saves a program to tape. Optional two arguments specify memory range to be saved (need details here). STEP standard part of FOR loop STOP stops execution of BASIC program TAKE replacement for READ and RESTORE. If the parameter is variable name, acts as READ, if it is number, acts as RESTORE UNDOT x, y "undraws" (resets) at specified coordinates (see DOT) UNDOT Stops the clock, not part of ROM USR Calls machine code subroutine WORD Double byte PEEK and POKE == ROM "B"/"2" additional symbols and keywords == The extended BASIC features, in ROM "B" or "2", contained one extra reserved symbol and 22 extra keywords: % /LABEL ABS(x) ARCTG(x) COS(x) COSD(x) DEL DUMP EXP(x) INP(x) LDUMP LLIST LN(x) LPRINT OUT PI POW(x,y) REN SIN(x), SIND(x) SQR(x) TG(x) TGD(x)

    Read more →
  • Object storage

    Object storage

    Object storage (also known as object-based storage or blob storage) is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, like interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity. Object storage systems allow retention of massive amounts of unstructured data in which data is written once and read once (or many times). Object storage is used for purposes such as storing objects like videos and photos on Facebook, songs on Spotify, or files in online collaboration services, such as Dropbox. One of the limitations with object storage is that it is not intended for transactional data, as object storage was not designed to replace NAS file access and sharing; it does not support the locking and sharing mechanisms needed to maintain a single, accurately updated version of a file. == History == === Origins === Jim Starkey coined the term blob working at Digital Equipment Corporation to refer to opaque data entities. The terminology was adopted for Rdb/VMS. Blob is often humorously explained to be an abbreviation for binary large object. According to Starkey, this backronym arose when Terry McKiever, working in marketing at Apollo Computer felt that the term needed to be an abbreviation. McKiever began using the expansion basic large object. This was later eclipsed by the retroactive explanation of blobs as binary large objects. According to Starkey, "Blob don't stand for nothin'." Rejecting the acronym, he explained his motivation behind the coinage, saying, "A blob is the thing that ate Cincinnatti [sic], Cleveland, or whatever", referring to the 1958 science fiction film The Blob. In 1995, research led by Garth Gibson on Network-Attached Secure Disks first promoted the concept of splitting less common operations, like namespace manipulations, from common operations, like reads and writes, to optimize the performance and scale of both. In the same year, a Belgian company – FilePool – was established to build the basis for archiving functions. Object storage was proposed at Gibson's Carnegie Mellon University lab as a research project in 1996. Another key concept was abstracting the writes and reads of data to more flexible data containers (objects). Fine grained access control through object storage architecture was further described by one of the NASD team, Howard Gobioff, who later was one of the inventors of the Google File System. Other related work includes the Coda filesystem project at Carnegie Mellon, which started in 1987, and spawned the Lustre file system. There is also the OceanStore project at UC Berkeley, which started in 1999 and the Logistical Networking project at the University of Tennessee Knoxville, which started in 1998. In 1999, Gibson founded Panasas to commercialize the concepts developed by the NASD team. === Development === Seagate Technology played a central role in the development of object storage. According to the Storage Networking Industry Association (SNIA), "Object storage originated in the late 1990s: Seagate specifications from 1999 Introduced some of the first commands and how operating system effectively removed from consumption of the storage." A preliminary version of the "OBJECT BASED STORAGE DEVICES Command Set Proposal" dated 10/25/1999 was submitted by Seagate as edited by Seagate's Dave Anderson and was the product of work by the National Storage Industry Consortium (NSIC) including contributions by Carnegie Mellon University, Seagate, IBM, Quantum, and StorageTek. This paper was proposed to INCITS T-10 (International Committee for Information Technology Standards) with a goal to form a committee and design a specification based on the SCSI interface protocol. This defined objects as abstracted data, with unique identifiers and metadata, how objects related to file systems, along with many other innovative concepts. Anderson presented many of these ideas at the SNIA conference in October 1999. The presentation revealed an IP Agreement that had been signed in February 1997 between the original collaborators (with Seagate represented by Anderson and Chris Malakapalli) and covered the benefits of object storage, scalable computing, platform independence, and storage management. == Architecture == === Abstraction of storage === One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of blocks or (exclusively) files. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage also allows the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions. === Inclusion of rich custom metadata within the object === Object storage explicitly separates file metadata from data to support additional capabilities. As opposed to fixed metadata in file systems (filename, creation date, type, etc.), object storage provides for full function, custom, object-level metadata in order to: Capture application-specific or user-specific information for better indexing purposes Support data-management policies (e.g. a policy to drive object movement from one storage tier to another) Centralize management of storage across many individual nodes and clusters Optimize metadata storage (e.g. encapsulated, database or key value storage) and caching/indexing (when authoritative metadata is encapsulated with the metadata inside the object) independently from the data storage (e.g. unstructured binary storage) Additionally, in some object-based file-system implementations: The file system clients only contact metadata servers once when the file is opened and then get content directly via object-storage servers (vs. block-based file systems which would require constant metadata access) Data objects can be configured on a per-file basis to allow adaptive stripe width, even across multiple object-storage servers, supporting optimizations in bandwidth and I/O Object-based storage devices (OSD) as well as some software implementations (e.g., DataCore Swarm) manage metadata and data at the storage device level: Instead of providing a block-oriented interface that reads and writes fixed sized blocks of data, data is organized into flexible-sized data containers, called objects Each object has both data (an uninterpreted sequence of bytes) and metadata (an extensible set of attributes describing the object); physically encapsulating both together benefits recoverability. The command interface includes commands to create and delete objects, write bytes and read bytes to and from individual objects, and to set and get attributes on objects Security mechanisms provide per-object and per-command access control === Programmatic data management === Object storage provides programmatic interfaces to allow applications to manipulate data. At the base level, this includes Create, read, update and delete (CRUD) functions for basic read, write and delete operations. Some object storage implementations go further, supporting additional functionality like object/file versioning, object replication, life-cycle management and movement of objects between different tiers and types of storage. Most API implementations are REST-based, allowing the use of many standard HTTP calls. == Implementation == === Cloud storage === The vast majority of cloud storage available in the market leverages an object-storage architecture. Some notable examples are Amazon S3, which debuted in March 2006, Microsoft Azure Blob Storage, IBM Cloud Object Storage, Rackspace Cloud Files (whose code was donated in 2010 to Openstack project and released as OpenStack Swift), and Google Cloud Storage released in May 2010. === Object-based file systems === Some distributed file systems use an object-based architecture, where file metadata is stored in metadata servers and file data is stored i

    Read more →
  • Information flow

    Information flow

    In discourse-based grammatical theory, information flow is any tracking of referential information by speakers. Information may be new, i.e., just introduced into the conversation; given, i.e., already active in the speakers' consciousness; or old, i.e., no longer active. The various types of activation, and how these are defined, are model-dependent. Information flow affects grammatical structures such as: Word order (topic, focus, and afterthought constructions). Active, passive, or middle voice. Choice of deixis, such as articles; "medial" deictics such as Spanish ese and Japanese sore are generally determined by the familiarity of a referent rather than by physical distance. Overtness of information, such as whether an argument of a verb is indicated by a lexical noun phrase, a pronoun, or not mentioned at all. Clefting: Splitting a single clause into two clauses, each with its own verb, e.g. ‘The chicken turtles tasted like chicken.’ becomes ‘It was the chicken turtle | that tasted like chicken.’ In this case, clefting is used to shift the focus of the sentence to the subject, the chicken turtle. Front focus: Placing at the start (front) of a sentence information that would normally occur later in the sentence, to give it extra prominence. For example, in pop culture, Yoda's speech often utilizes such syntactic construction, such as when he says 'much to learn you still have' to Luke Skywalker. End focus (or end weight): Given or familiar information followed by new information. This gives prominence to the final part of the sentences and can enable suspense to build, e.g. ‘Through the door came a gigantic wolf’.(Umer Prince)

    Read more →
  • Single source of truth

    Single source of truth

    In information science and information technology, single source of truth (SSOT) architecture, or single point of truth (SPOT) architecture, for information systems is the practice of structuring information models and associated data schemas such that every data element is mastered (or edited) in only one place, providing data normalization to a canonical form (for example, in database normalization or content transclusion). There are several scenarios with respect to copies and updates: The master data is never copied and instead only references to it are made; this means that all reads and updates go directly to the SSOT. The master data is copied but the copies are only read and only the master data is updated; if requests to read data are only made on copies, this is an instance of CQRS. The master data is copied and the copies are updated; this needs a reconciliation mechanism when there are concurrent updates. Updates on copies can be thrown out whenever a concurrent update is made on the master, so they are not considered fully committed until propagated to the master. (many blockchains work that way.) Concurrent updates are merged. (if an automatic merge fails, it could fall back on another strategy, which could be the previous strategy or something else like manual intervention, which most source version control systems do.) The advantages of SSOT architectures include easier prevention of mistaken inconsistencies (such as a duplicate value/copy somewhere being forgotten), and greatly simplified version control. Without a SSOT, dealing with inconsistencies implies either complex and error-prone consensus algorithms, or using a simpler architecture that's liable to lose data in the face of inconsistency (the latter may seem unacceptable but it is sometimes a very good choice; it is how most blockchains operate: a transaction is actually final only if it was included in the next block that is mined). Ideally, SSOT systems provide data that are authentic (and authenticatable), relevant, and referable. Deployment of an SSOT architecture is becoming increasingly important in enterprise settings where incorrectly linked duplicate or de-normalized data elements (a direct consequence of intentional or unintentional denormalization of any explicit data model) pose a risk for retrieval of outdated, and therefore incorrect, information. Common examples (i.e., example classes of implementation) are as follows: In electronic health records (EHRs), it is imperative to accurately validate patient identity against a single referential repository, which serves as the SSOT. Duplicate representations of data within the enterprise would be implemented by the use of pointers rather than duplicate database tables, rows, or cells. This ensures that data updates to elements in the authoritative location are comprehensively distributed to all federated database constituencies in the larger overall enterprise architecture. EHRs are an excellent class for exemplifying how SSOT architecture is both poignantly necessary and challenging to achieve: it is challenging because inter-organization health information exchange is inherently a cybersecurity competence hurdle, and nonetheless it is necessary, to prevent medical errors, to prevent the wasted costs of inefficiency (such as duplicated work or rework), and to make the primary care and medical home concepts feasible (to achieve competent care transitions). Single-source publishing as a general principle or ideal in content management relies on having SSOTs, via transclusion or (otherwise, at least) substitution. Substitution happens via libraries of objects that can be propagated as static copies which are later refreshed when necessary (that is, when refreshing of the copy-paste or import is triggered by a larger updating event). Component content management systems are a class of content management systems that aim to provide competence on this level. == Implementation == === Ontologic interactions === An acknowledged prerequisite (of the notion that any given single source of truth can exist) is that it depends on the ontologic condition that no more than a single truth (about any particular fact or idea) exists, an assertion that is ontologic in both the IT sense and the general sense of that word. In many instances, this presents no problem (for example, within particular namespaces, or even across them, as long as naming collisions or broader name conflicts are adequately handled). The broadest contexts (and thus thorniest, regarding ontologic discrepancies) require adequate epistemic regime comparison and reconciliation (or at least negotiation or transactional exchanges). An archetypal example of this class of reconciliation is that two theological seminary libraries, from two different religions (X and Y), could exchange information with an SSOT architecture, but the unification of truth would reside on the level of the statement that "religion X asserts that God is purple whereas religion Y asserts that God is green", rather than on the level of "God is purple" or "God is green". === Architectures or architectural features === An ideal implementation of SSOT is rarely possible in most enterprises. This is because many organisations have multiple information systems, each of which needs access to data relating to the same entities (e.g., customer). Often these systems are purchased as commercial off-the-shelf products from vendors and cannot be modified in trivial ways. Each of these various systems therefore needs to store its own version of common data or entities, and therefore each system must retain its own copy of a record (hence immediately violating the SSOT approach defined above). For example, an enterprise resource planning (ERP) system (such as SAP or Oracle e-Business Suite) may store a customer record; the customer relationship management (CRM) system also needs a copy of the customer record (or part of it) and the warehouse dispatch system might also need a copy of some or all of the customer data (e.g., shipping address). In cases where vendors do not support such modifications, it is not always possible to replace these records with pointers to the SSOT. For organisations (with more than one information system) wishing to implement a Single Source of Truth (without modifying all but one master system to store pointers to other systems for all entities), some supporting architectures are: Master data management (MDM) Event store and event sourcing (ES) ==== Master data management (MDM) ==== A master data management system typically serves as the source of truth for an organization's metadata, helping to ensure accuracy and consistency throughout that organizations multiple data sources. Typically the MDM acts as a hub for multiple systems, many of which could allow (be the source of truth for) updates to different aspects of information on a given entity. For example, the CRM system may be the "source of truth" for most aspects of the customer, and is updated by a call centre operator. However, a customer may (for example) also update their address via a customer service web site, with a different back-end database from the CRM system. The MDM application receives updates from multiple sources, acts as a broker to determine which updates are to be regarded as authoritative (the golden record) and then syndicates this updated data to all subscribing systems. The MDM application normally requires an ESB to syndicate its data to multiple subscribing systems. ==== Event store and event sourcing (ES) ==== In event oriented architectures, it has become increasingly common to find an implementation of the Event Sourcing pattern which stores the system state as an ordered sequence of state changes. To do this, you need an Event Store, a particular type of database designed to hold all the events that change the state of the system. The event store in an Event Sourcing + Command Query Responsibility Separation + Domain Driven Design + Messaging architecture is in fact a "single source of truth", with the additional advantage that it can also act as an Enterprise Service Bus as it can listen directly to the event store for status changes as everything passes by. In addition, by saving all the events, it also plays the role of Data Warehouse. One last advantage is that through this system the Shared Database pattern can be implemented, another technique not mentioned to obtain a single source of truth. ==== Data warehouse (DW) ==== While the primary purpose of a data warehouse is to support reporting and analysis of data that has been combined from multiple sources, the fact that such data has been combined (according to business logic embedded in the data transformation and integration processes) means that the data warehouse is often used as a de facto SSOT. Generally, however, the data available from the data warehouse are not used to update other systems; rather the DW becomes

    Read more →
  • Natural language processing

    Natural language processing

    Natural language processing (NLP) is the processing of natural language information by a computer. NLP is a subfield of computer science and is closely associated with artificial intelligence. NLP is also related to information retrieval, knowledge representation, computational linguistics, and linguistics more broadly. Major processing tasks in an NLP system include: speech recognition, text classification, natural language understanding, and natural language generation. == History == Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence," which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language. === Symbolic NLP (1950s – early 1990s) === The premise of symbolic NLP is often illustrated using John Searle's Chinese room thought experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts. 1950s: The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years of research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted in America (though some research continued elsewhere, such as Japan and Europe) until the late 1980s when the first statistical machine translation systems were developed. 1960s: Some notably successful natural language processing systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies, and ELIZA, a simulation of Rogerian psychotherapy, written by Joseph Weizenbaum between 1964 and 1966. Despite using minimal information about human thought or emotion, ELIZA was able to produce interactions that appeared human-like. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". Ross Quillian's successful work on natural language was demonstrated with a vocabulary of only twenty words, because that was all that would fit in a computer memory at the time. 1970s: During the 1970s, many programmers began to write "conceptual ontologies", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, the first chatterbots were written (e.g., PARRY). 1980s: The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of HPSG as a computational operationalization of generative grammar), morphology (e.g., two-level morphology), semantics (e.g., Lesk algorithm), reference (e.g., within Centering Theory) and other areas of natural language understanding (e.g., in the Rhetorical Structure Theory). Other lines of research were continued, e.g., the development of chatterbots with Racter and Jabberwacky. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period. === Statistical NLP (1990s–present) === Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This shift was influenced by increasing computational power (see Moore's law) and a decline in the dominance of Chomskyan linguistic theories (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. 1990s: Many of the notable early successes in statistical methods in NLP occurred in the field of machine translation, due especially to work at IBM Research, such as IBM alignment models. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, many systems relied on corpora that were specifically developed for the tasks they were designed to perform. This reliance has been a major limitation to their broader effectiveness and continues to affect similar systems. Consequently, significant research has focused on methods for learning effectively from limited amounts of data. 2000s: With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, large quantities of non-annotated data are available (including, among other things, the entire content of the World Wide Web), which can often make up for the worse efficiency if the algorithm used has a low enough time complexity to be practical. 2003: word n-gram model, at the time the best statistical algorithm, is outperformed by a multi-layer perceptron (with a single hidden layer and context length of several words, trained on up to 14 million words, by Bengio et al.) 2010: Tomáš Mikolov (then a PhD student at Brno University of Technology) with co-authors applied a simple recurrent neural network with a single hidden layer to language modeling, and in the following years he went on to develop Word2vec. In the 2010s, representation learning and deep neural network-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. This shift gained momentum due to results showing that such techniques can achieve state-of-the-art results in many natural language tasks, e.g., in language modeling and parsing. This is increasingly important in medicine and healthcare, where NLP helps analyze notes and text in electronic health records that would otherwise be inaccessible for study when seeking to improve care or protect patient privacy. == Approaches: Symbolic, statistical, neural networks == Symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular: such as by writing grammars or devising heuristic rules for stemming. Machine learning approaches, which include both statistical and neural networks, on the other hand, have many advantages over the symbolic approach: both statistical and neural network methods tend to focus more on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for both rare and common cases equally. language models, produced by either statistical or neural networks methods, are more robust to both unfamiliar (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) in comparison to the rule-based systems, which are also more costly to produce. the larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems that can gain accuracy only by increasing the amount and complexity of the rules leading to intractability problems. Rule-based systems are commonly used: when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages such as provided by the Apertium system, for preprocessing in NLP pipelines, e.g., tokenization, or for post-processing and transforming the output of NLP pipelines, e.g., for knowledge extraction from syntactic parses. === Statistical approach === In the late 1980s and mid-1990s, the statistical approach ended a peri

    Read more →
  • Agentic commerce

    Agentic commerce

    Agentic commerce (also referred to as agent-based commerce) describes an emerging form of e-commerce in which autonomous artificial intelligence (AI) agents independently execute purchasing and payment processes on behalf of users or organizations. Unlike conventional digital commerce systems, which require direct human interaction at key decision points, agentic commerce systems are designed to search for products or services, evaluate options, make purchasing decisions, and complete payments without real-time human involvement. An emerging development within the broader fields of e-commerce, fintech, and artificial intelligence; agentic commerce combines advances in generative AI, autonomous agents, application programming interfaces (APIs), and digital payment infrastructures to direct transactions with no direct human interaction. == Characteristics == A defining feature of agentic commerce is the delegation of end-to-end commercial activities to software agents. These agents typically operate according to predefined user preferences, rules, or constraints, such as price limits, quality criteria, delivery times, or preferred payment methods. Based on these parameters, an agent can autonomously perform tasks including product discovery, price comparison, contract selection, order placement, and payment execution. In contrast to decision-support systems, which provide recommendations to human users, agentic commerce systems are designed to act independently. Human involvement may be limited to initial configuration, periodic supervision, or exception handling. == Comparison with traditional and AI-assisted commerce == Traditional e-commerce requires users to manually browse products, select offers, and authorize payments. Generative AI systems used in commerce commonly assist users by answering questions or suggesting options, and do not complete transactions autonomously. Agentic commerce differs in that decision-making authority is partially or fully transferred to AI agents. As a result, the conventional customer journey, characterized by conscious decision points, may be replaced by continuous, automated micro-decisions performed by software. == Applications and business use cases == Potential applications of agentic commerce include recurring purchases, subscription management, business-to-business procurement, inventory replenishment, and price monitoring. In such contexts, transactions are often predictable and standardized, making them suitable for automation. From a business perspective, agentic commerce systems may be used to optimize supply chains, manage inventory levels, negotiate prices algorithmically, or execute transactions across multiple platforms. Enterprises adopting the new technology include retailers Walmart, Home Depot, Wayfair and Urban Outfitters, and ad tech DSPs, including Google Ads, Amazon, and Yahoo. Chinese tech firms are using apps to provide full-service shopping and payment tools. These includes Alibaba, Tencent, and ByteDance who are currently developing AI powered shopping apps. The Qwen AI chatbot allows users to complete transactions directly within its interface. US firms are still leading in developing AI models but integration is slower due to privacy restrictions. == Payments and technical infrastructure == Agentic commerce relies on digital payment systems capable of supporting automated, machine-initiated transactions, including API-based payment processing, tokenization, real-time authorization, and continuous risk monitoring. Typical user interfaces, such as shopping carts, may be replaced by backend integrations between AI agents, merchants, and payment service providers. For example, Iike 2025, Alibaba launched Alipay AI Pay, which grew and began operating as an application for different retailers. In December 2025, Alipay teamed up with Rokid to enable developers to integrate AI payments into AI agents on Rokid's Lingzhu platform. In January 2025, Alipay unveiled the Agentic Commerce Trust Protocol in partnership with Alibaba's consumer AI applications, such as the Qwen App and Taobao Instant Commerce. Qwen adopted the platform first, connecting it to Taobao Instant Commerce and Alipay AI Pay. Users could use Qwen's agentic feature to place food and drink orders within the application instead of having to click outside to an external browser. For merchants, participation in agentic commerce may require products and services to be presented in structured, machine-readable formats to ensure discoverability and interoperability with autonomous agents. == Universal Commerce Protocol (UCP) == In January 2026, Google announced the Universal Commerce Protocol (UCP), an open-source web standard intended to enable interoperability between AI agents and retail systems across the shopping journey, from discovery and checkout to post-purchase support. UCP makes use of REST, JSON-RPC transports, and support for Agent Payments Protocol (AP2), Agent2Agent (A2A), and Model Context Protocol (MCP). == Legal, regulatory, and security considerations == The use of autonomous agents in commerce raises legal and regulatory questions, particularly regarding authorization, liability, consumer protection, and fraud prevention. Existing payment and contract frameworks are generally based on human decision-makers, and their applicability to autonomous agents remains an area of active discussion. Open issues include responsibility for unauthorized or erroneous transactions, mechanisms for dispute resolution, standards for agent authentication, and compliance with data protection and financial regulations. Continuous, automated transaction patterns may also require new approaches to security and risk assessment. Traditional fraud models centered on identity verification may be insufficient for agentic commerce, and that merchants may need intent-based detection methods using machine learning and behavioral analysis to distinguish legitimate AI agents from malicious automation. === Governance frameworks === The deployment of autonomous AI agents in commercial environments has prompted the development of dedicated governance frameworks. These aim to define operational boundaries, decision authority, oversight mechanisms, and accountability structures for agentic systems. The Agentic Commerce Framework (ACF), created in 2025 by Vincent Dorange, is a governance standard that structures the deployment of autonomous AI agents around four founding principles (Decision Sovereignty, Governance by Design, Ultimate Human Control, Traceable Accountability), four operational layers, and 18 governance KPIs. In January 2026, Singapore's Infocomm Media Development Authority (IMDA) published the Model AI Governance Framework for Agentic AI, extending its existing AI governance guidelines to address agent-specific risks including delegation chains and multi-agent coordination. The Cloud Security Alliance (CSA) has also proposed an Agentic Trust Framework applying zero-trust principles to AI agent governance. == Ecosystem and implementation == The adoption of agentic commerce typically requires changes in commerce architecture, data modeling, identity and permissions, and API-based orchestration of checkout and post-purchase workflows. Management consultancies have identified agentic commerce as a structural evolution of digital commerce, emphasizing the role of AI-driven agents in automating discovery, decision-making, and transaction processes across commerce systems. McKinsey & Company has described agentic commerce as a significant shift in how consumers interact with brands and how enterprises design their commerce operating models. In Europe, this ecosystem also includes digital commerce consultancies specializing in the adoption of agentic commerce. Consulting firms such as Horrea support brands in understanding and implementing the technological and organizational shifts associated with agentic commerce. == Market development and outlook == Agentic commerce is generally regarded as an early-stage development. Industry analysts have projected that AI-driven agents could account for a small but growing share of digital payment transactions within the coming years. Due to the scale of global digital commerce, even limited adoption could represent substantial transaction volumes. Analysts expect that by 2029, AI agents could handle between 1% and 4% of all digital payment transactions. With a projected total transaction volume of over $36 trillion a year, even a small share translates into a market worth up to $1.47 trillion. According to a McKinsey study from October 2025, agentic commerce projects that by 2030, the U.S. business-to-consumer retail market alone could see up to $1 trillion in revenue orchestrated through agentic commerce. On a global scale, the opportunity could range from $3 trillion to $5 trillion. Early experiments and pilot projects have demonstrated both the potential and current limitations of the

    Read more →
  • Artificial intelligence industry in China

    Artificial intelligence industry in China

    The roots of the development of artificial intelligence in the People's Republic of China started in the late 1970s following Deng Xiaoping's reform and opening up emphasizing science and technology as the country's primary productive force. The initial stages of China's AI development were slow and encountered significant challenges due to lack of resources and talent. At the beginning China was behind most Western countries in terms of AI development. A majority of the research was led by scientists who had received higher education abroad. Since 2006, the Chinese government has steadily developed a national agenda for artificial intelligence development and emerged as one of the leading nations in artificial intelligence research and development. In 2016, the Chinese Communist Party (CCP) released its 13th Five-Year Plan in which it aimed to become a global AI leader by 2030. As of 2025, China is considered to be a world leader in AI technology along with the United States. The State Council has a list of "national AI teams" including fifteen China-based companies, including Baidu, Tencent, Alibaba, SenseTime, and iFlytek. Each company should lead the development of a designated specialized AI sector in China, such as facial recognition, software/hardware, and speech recognition. China's rapid AI development has significantly impacted Chinese society in many areas, including the socio-economic, military, intelligence, and political spheres. Agriculture, transportation, accommodation and food services, and manufacturing are the top industries that would be the most impacted by further AI deployment. The private sector, university laboratories, and the military are working collaboratively in many aspects as there are few current existing boundaries. In 2021, China published the Data Security Law of the People's Republic of China, its first national law addressing AI-related ethical concerns. In October 2022, the United States federal government announced a series of export controls and trade restrictions intended to restrict China's access to advanced computer chips for AI applications. In 2023, the Cyberspace Administration of China issued guidelines requiring that AI content upholds the ideology of the CCP including Core Socialist Values, avoids discrimination, respects intellectual property rights, and safeguards user data. In 2025, the Chinese government issued a document regarding training data, requiring companies to use as little as data deemed "unsafe" as possible, as well as requiring companies to test models regularly. Concerns have been raised about the effects of the Chinese government's censorship regime on the development of generative artificial intelligence and long-term talent acquisition with state of the country's demographics. Others have noted that official notions of AI safety require following the priorities of the CCP and are antithetical to standards in democratic societies and raised concerns about the extension of China's system of mass surveillance and censorship abroad. == History == The Chinese term for artificial intelligence (réngōngzhìnéng 人工智能) connotes "humanmade" intelligence. The term developed as mid-20th century localisation of the Japanese term jinko chino. The research and development of artificial intelligence in China started in the 1980s, with the announcement by Deng Xiaoping of the importance of science and technology for China's economic growth. === Late 1970s to early 2010s === Chinese artificial intelligence research and development began in late 1970s after Deng Xiaoping's reform and opening up. China's first national conference on AI occurred in 1979. Academic journals in the late 1970s began publishing literature reviews of Western research on AI topics. In the 1980s, a group of Chinese scientists launched AI research led by Qian Xuesen and Wu Wenjun. However, during the time, China's society still had a generally conservative view towards AI. In the early 1980s, Science Press published translated versions of Western textbooks such as Patrick Winston's Artificial Intelligence and Nils John Nilsson's Principles of Artificial Intelligence. In 1980, a journal of the Chinese Academy of Sciences convened its first annual National Symposium on Artificial Intelligence, which included national and international scholars like Herbert A. Simon. The Chinese Association for Artificial Intelligence (CAAI) was founded in September 1981 and was authorized by the Ministry of Civil Affairs. CAAI has continued to be the largest AI association in China as of 2025. In 1982, CAAI began publishing the Artificial Intelligence Journal, which published early AI research by Chinese academics. In the 1980s, Chinese research on AI was influenced by the field of cybernetics, particularly the work of Norbert Weiner and his text Cybernetics: Or Control and Communication in the Animal and the Machine. Chinese researchers at the time sought to situate AI as part of a broader "Intelligence Science" field which would include disciplines like mathematics, computer science, cognitive science, social sciences, and philosophy. In 1987, Tsinghua University began a research publication on AI. Beginning in 1993, smart automation and intelligence have been part of China's national technology plan. Since the 2000s, the Chinese government has further expanded its research and development funds for AI and the number of government-sponsored research projects has dramatically increased. In 2006, China announced a policy priority for the development of artificial intelligence, which was included in the National Medium and Long Term Plan for the Development of Science and Technology (2006–2020), released by the State Council. In the same year, artificial intelligence was also mentioned in the 11th Five-Year Plan. In 2011, the Association for the Advancement of Artificial Intelligence (AAAI) established a branch in Beijing, China. At same year, the Wu Wenjun Artificial Intelligence Science and Technology Award was founded in honor of Chinese mathematician Wu Wenjun, and it became the highest award for Chinese achievements in the field of artificial intelligence. The first award ceremony was held on May 14, 2012. In 2013, the International Joint Conferences on Artificial Intelligence (IJCAI) was held in Beijing, marking the first time the conference was held in China. This event coincided with the Chinese government's announcement of the "Chinese Intelligence Year," a significant milestone in China's development of artificial intelligence. === Late 2010s to early 2020s === AI became a major issue of commercial, public, and political focus in China in the latter half of the 2010s. Various interpretations of the primary cause for this increased focus exist, with some analyses focusing on the 2016 Go match between Google's AlphaGo and Lee Sedol, others emphasising the U.S. increasing trade restrictions on China's technology industries and the desire to achieve national technological self-sufficiency. The State Council of China issued "A Next Generation Artificial Intelligence Development Plan" (State Council Document [2017] No. 35) on 20 July 2017. In the document, the CCP Central Committee and the State Council urged governing bodies in China to promote the development of artificial intelligence. Specifically, the plan described AI as a strategic technology that has become a "focus of international competition".:2 The document urged significant investment in a number of strategic areas related to AI and called for close cooperation between the state and private sectors. It set the goal of China becoming the preeminent country for AI research and application by 2030. During the general secretaryship of Xi Jinping, artificial intelligence has been a focus of the CCP's military-civil fusion efforts. On the occasion of Xi's speech at the first plenary meeting of the Central Military-Civil Fusion Development Committee (CMCFDC), scholars from the National Defense University wrote in the PLA Daily that the "transferability of social resources" between economic and military ends is an essential component to being a great power. During the Two Sessions 2017,"artificial intelligence plus" was proposed to be elevated to a strategic level. The same year witnessed the emergence of multiple application-level usages in the medical field according to reports. In 2018, Xinhua News Agency, in partnership with Tencent's subsidiary Sogou, launched its first artificial intelligence-generated news anchor. In 2018, the State Council budgeted $2.1 billion for an AI industrial park in Mentougou district. In order to achieve this the State Council stated the need for massive talent acquisition, theoretical and practical developments, as well as public and private investments. Some of the stated motivations that the State Council gave for pursuing its AI strategy include the potential of artificial intelligence for industrial transformation, better social

    Read more →
  • Emotion recognition

    Emotion recognition

    Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables. == Human == Humans show a great deal of variability in their abilities to recognize emotion. A key point to keep in mind when learning about automated emotion recognition is that there are several sources of "ground truth", or truth about what the real emotion is. Suppose we are trying to recognize the emotions of Alex. One source is "what would most people say that Alex is feeling?" In this case, the 'truth' may not correspond to what Alex feels, but may correspond to what most people would say it looks like Alex feels. For example, Alex may actually feel sad, but he puts on a big smile and then most people say he looks happy. If an automated method achieves the same results as a group of observers it may be considered accurate, even if it does not actually measure what Alex truly feels. Another source of 'truth' is to ask Alex what he truly feels. This works if Alex has a good sense of his internal state, and wants to tell you what it is, and is capable of putting it accurately into words or a number. However, some people are alexithymic and do not have a good sense of their internal feelings, or they are not able to communicate them accurately with words and numbers. In general, getting to the truth of what emotion is actually present can take some work, can vary depending on the criteria that are selected, and will usually involve maintaining some level of uncertainty. == Automatic == Decades of scientific research have been conducted developing and evaluating methods for automated emotion recognition. There is now an extensive literature proposing and evaluating hundreds of different kinds of methods, leveraging techniques from multiple areas, such as signal processing, machine learning, computer vision, and speech processing. Different methodologies and techniques may be employed to interpret emotion such as Bayesian networks. , Gaussian Mixture models and Hidden Markov Models and deep neural networks. === Approaches === The accuracy of emotion recognition is usually improved when it combines the analysis of human expressions from multimodal forms such as texts, physiology, audio, or video. Different emotion types are detected through the integration of information from facial expressions, body movement and gestures, and speech. The technology is said to contribute in the emergence of the so-called emotional or emotive Internet. The existing approaches in emotion recognition to classify certain emotion types can be generally classified into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. ==== Knowledge-based techniques ==== Knowledge-based techniques (sometimes referred to as lexicon-based techniques), utilize domain knowledge and the semantic and syntactic characteristics of text and potentially spoken language in order to detect certain emotion types. In this approach, it is common to use knowledge-based resources during the emotion classification process such as WordNet, SenticNet, ConceptNet, and EmotiNet, to name a few. One of the advantages of this approach is the accessibility and economy brought about by the large availability of such knowledge-based resources. A limitation of this technique on the other hand, is its inability to handle concept nuances and complex linguistic rules. Knowledge-based techniques can be mainly classified into two categories: dictionary-based and corpus-based approaches. Dictionary-based approaches find opinion or emotion seed words in a dictionary and search for their synonyms and antonyms to expand the initial list of opinions or emotions. Corpus-based approaches on the other hand, start with a seed list of opinion or emotion words, and expand the database by finding other words with context-specific characteristics in a large corpus. While corpus-based approaches take into account context, their performance still vary in different domains since a word in one domain can have a different orientation in another domain. ==== Statistical methods ==== Statistical methods commonly involve the use of different supervised machine learning algorithms in which a large set of annotated data is fed into the algorithms for the system to learn and predict the appropriate emotion types. Machine learning algorithms generally provide more reasonable classification accuracy compared to other approaches, but one of the challenges in achieving good results in the classification process, is the need to have a sufficiently large training set. Some of the most commonly used machine learning algorithms include Support Vector Machines (SVM), Naive Bayes, and Maximum Entropy. Deep learning, which is under the unsupervised family of machine learning, is also widely employed in emotion recognition. Well-known deep learning algorithms include different architectures of Artificial Neural Network (ANN) such as Convolutional Neural Network (CNN), Long Short-term Memory (LSTM), and Extreme Learning Machine (ELM). The popularity of deep learning approaches in the domain of emotion recognition may be mainly attributed to its success in related applications such as in computer vision, speech recognition, and Natural Language Processing (NLP). ==== Hybrid approaches ==== Hybrid approaches in emotion recognition are essentially a combination of knowledge-based techniques and statistical methods, which exploit complementary characteristics from both techniques. Some of the works that have applied an ensemble of knowledge-driven linguistic elements and statistical methods include sentic computing and iFeel, both of which have adopted the concept-level knowledge-based resource SenticNet. The role of such knowledge-based resources in the implementation of hybrid approaches is highly important in the emotion classification process. Since hybrid techniques gain from the benefits offered by both knowledge-based and statistical approaches, they tend to have better classification performance as opposed to employing knowledge-based or statistical methods independently. A downside of using hybrid techniques however, is the computational complexity during the classification process. === Datasets === Data is an integral part of the existing approaches in emotion recognition and in most cases it is a challenge to obtain annotated data that is necessary to train machine learning algorithms. For the task of classifying different emotion types from multimodal sources in the form of texts, audio, videos or physiological signals, the following datasets are available: HUMAINE: provides natural clips with emotion words and context labels in multiple modalities Belfast database: provides clips with a wide range of emotions from TV programs and interview recordings SEMAINE: provides audiovisual recordings between a person and a virtual agent and contains emotion annotations such as angry, happy, fear, disgust, sadness, contempt, and amusement IEMOCAP: provides recordings of dyadic sessions between actors and contains emotion annotations such as happiness, anger, sadness, frustration, and neutral state eNTERFACE: provides audiovisual recordings of subjects from seven nationalities and contains emotion annotations such as happiness, anger, sadness, surprise, disgust, and fear DEAP: provides electroencephalography (EEG), electrocardiography (ECG), and face video recordings, as well as emotion annotations in terms of valence, arousal, and dominance of people watching film clips DREAMER: provides electroencephalography (EEG) and electrocardiography (ECG) recordings, as well as emotion annotations in terms of valence, dominance of people watching film clips MELD: is a multiparty conversational dataset where each utterance is labeled with emotion and sentiment. MELD provides conversations in video format and hence suitable for multimodal emotion recognition and sentiment analysis. MELD is useful for multimodal sentiment analysis and emotion recognition, dialogue systems and emotion recognition in conversations. MuSe: provides audiovisual recordings of natural interactions between a person and an object. It has discrete and continuous emotion annotations in terms of valence, arousal and trustworthiness as well as speech topics useful for multimodal sentiment analysis and emotion recognition. UIT-VSMEC: is a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese

    Read more →
  • Virtual intelligence

    Virtual intelligence

    Virtual intelligence (VI) is the term given to artificial intelligence that exists within a virtual world. Many virtual worlds have options for persistent avatars that provide information, training, role-playing, and social interactions. The immersion in virtual worlds provides a platform for VI beyond the traditional paradigm of past user interfaces (UIs). What Alan Turing established as a benchmark for telling the difference between human and computerized intelligence was devoid of visual influences. With today's VI bots, virtual intelligence has evolved past the constraints of past testing into a new level of the machine's ability to demonstrate intelligence. The immersive features of these environments provide nonverbal elements that affect the realism provided by virtually intelligent agents. Virtual intelligence is the intersection of these two technologies: Virtual environments: Immersive 3D spaces provide for collaboration, simulations, and role-playing interactions for training. Many of these virtual environments are currently being used for government and academic projects, including Second Life, VastPark, Olive, OpenSim, Outerra, Oracle's Open Wonderland, Duke University's Open Cobalt, and many others. Some of the commercial virtual worlds are also taking this technology into new directions, including the high-definition virtual world Blue Mars. Artificial intelligence (AI): AI is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. VI is a type of AI that operates within virtual environments to simulate human-like interactions and responses. == Applications == Cutlass Bomb Disposal Robot: Northrop Grumman developed a virtual training opportunity because of the prohibitive real-world cost and dangers associated with bomb disposal. By replicating a complicated system without having to learn advanced code, the virtual robot has no risk of damage, trainee safety hazards, or accessibility constraints. MyCyberTwin: NASA is among the companies that have used the MyCyberTwin AI technologies. They used it for the Phoenix rover in the virtual world Second Life. Their MyCyberTwin used a programmed profile to relay information about what the Phoenix rover was doing and its purpose. Second China: The University of Florida developed the "Second China" project as an immersive training experience for learning how to interact with the culture and language in a foreign country. Students are immersed in an environment that provides role-playing challenges coupled with language and cultural sensitivities magnified during country-level diplomatic missions or during times of potential conflict or regional destabilization. The virtual training provides participants with opportunities to access information, take part in guided learning scenarios, communicate, collaborate, and role-play. While China was the country for the prototype, this model can be modified for use with any culture to help better understand social and cultural interactions and see how other people think and what their actions imply. Duke School of Nursing Training Simulation: Extreme Reality developed virtual training to test critical thinking with a nurse performing trained procedures to identify critical data to make decisions and performing the correct steps for intervention. Bots are programmed to respond to the nurse's actions as the patient with their conditions improving if the nurse performs the correct actions.

    Read more →
  • Run-to-completion scheduling

    Run-to-completion scheduling

    Run-to-completion scheduling or nonpreemptive scheduling is a scheduling model in which each task runs until it either finishes, or explicitly yields control back to the scheduler. Run-to-completion systems typically have an event queue which is serviced either in strict order of admission by an event loop, or by an admission scheduler which is capable of scheduling events out of order, based on other constraints such as deadlines. Some preemptive multitasking scheduling systems behave as run-to-completion schedulers in regard to scheduling tasks at one particular process priority level, at the same time as those processes still preempt other lower priority tasks and are themselves preempted by higher priority tasks.

    Read more →
  • The Algorithm Auction

    The Algorithm Auction

    The Algorithm Auction is the world's first auction of computer algorithms. Created by Ruse Laboratories, the initial auction featured seven lots and was held at the Cooper Hewitt, Smithsonian Design Museum on March 27, 2015. Five lots were physical representations of famous code or algorithms, including a signed, handwritten copy of the original Hello, World! C program by its creator Brian Kernighan on dot-matrix printer paper, a printed copy of 5,000 lines of Assembly code comprising the earliest known version of Turtle Graphics, signed by its creator Hal Abelson, a necktie containing the six-line qrpff algorithm capable of decrypting content on a commercially produced DVD video disc, and a pair of drawings representing OkCupid's original Compatibility Calculation algorithm, signed by the company founders. The qrpff lot sold for $2,500. Two other lots were “living algorithms,” including a set of JavaScript tools for building applications that are accessible to the visually impaired and the other is for a program that converts lines of software code into music. Winning bidders received, along with artifacts related to the algorithms, a full intellectual property license to use, modify, or open-source the code. All lots were sold, with Hello World receiving the most bids. Exhibited alongside the auction lots were a facsimile of the Plimpton 322 tablet on loan from Columbia University, and Nigella, an art-world facing computer virus named after Nigella Lawson and created by cypherpunk and hacktivist Richard Jones. Sebastian Chan, Director of Digital & Emerging Media at the Cooper–Hewitt, attended the event remotely from Milan, Italy via a Beam Pro telepresence robot. == Effects == Following the auction, the Museum of Modern Art held a salon titled The Way of the Algorithm highlighting algorithms as "a ubiquitous and indispensable component of our lives."

    Read more →