Virtual intelligence

Virtual intelligence

Virtual intelligence (VI) is the term given to artificial intelligence that exists within a virtual world. Many virtual worlds have options for persistent avatars that provide information, training, role-playing, and social interactions. The immersion in virtual worlds provides a platform for VI beyond the traditional paradigm of past user interfaces (UIs). What Alan Turing established as a benchmark for telling the difference between human and computerized intelligence was devoid of visual influences. With today's VI bots, virtual intelligence has evolved past the constraints of past testing into a new level of the machine's ability to demonstrate intelligence. The immersive features of these environments provide nonverbal elements that affect the realism provided by virtually intelligent agents. Virtual intelligence is the intersection of these two technologies: Virtual environments: Immersive 3D spaces provide for collaboration, simulations, and role-playing interactions for training. Many of these virtual environments are currently being used for government and academic projects, including Second Life, VastPark, Olive, OpenSim, Outerra, Oracle's Open Wonderland, Duke University's Open Cobalt, and many others. Some of the commercial virtual worlds are also taking this technology into new directions, including the high-definition virtual world Blue Mars. Artificial intelligence (AI): AI is a branch of computer science that aims to create intelligent machines capable of performing tasks that typically require human intelligence. VI is a type of AI that operates within virtual environments to simulate human-like interactions and responses. == Applications == Cutlass Bomb Disposal Robot: Northrop Grumman developed a virtual training opportunity because of the prohibitive real-world cost and dangers associated with bomb disposal. By replicating a complicated system without having to learn advanced code, the virtual robot has no risk of damage, trainee safety hazards, or accessibility constraints. MyCyberTwin: NASA is among the companies that have used the MyCyberTwin AI technologies. They used it for the Phoenix rover in the virtual world Second Life. Their MyCyberTwin used a programmed profile to relay information about what the Phoenix rover was doing and its purpose. Second China: The University of Florida developed the "Second China" project as an immersive training experience for learning how to interact with the culture and language in a foreign country. Students are immersed in an environment that provides role-playing challenges coupled with language and cultural sensitivities magnified during country-level diplomatic missions or during times of potential conflict or regional destabilization. The virtual training provides participants with opportunities to access information, take part in guided learning scenarios, communicate, collaborate, and role-play. While China was the country for the prototype, this model can be modified for use with any culture to help better understand social and cultural interactions and see how other people think and what their actions imply. Duke School of Nursing Training Simulation: Extreme Reality developed virtual training to test critical thinking with a nurse performing trained procedures to identify critical data to make decisions and performing the correct steps for intervention. Bots are programmed to respond to the nurse's actions as the patient with their conditions improving if the nurse performs the correct actions.

Princh

Princh is a Danish software company, which is headquartered in Aarhus, Denmark. Founded in 2015, Princh develops cloud printing and electronic payment products. The company is headquartered in the city of Aarhus. While utilizing a smartphone or web app, users can locate a nearby printer to their current location, get directions to access said printer, and/or authorize a print and pay for the print job in question. The product is available as a native mobile apps for Android and iOS, as well as on web and desktop products for businesses and libraries. The app connects a network of printer owners and users around the world. Princh supports an array of printable files. == History == The company was founded in 2015. The company is currently based in the southern part of Aarhus. The Princh printing service was officially launched on June 23, 2015. Currently, Princh is available as a service in a multitude of locations such as print shops, libraries, hotels, or universities. Princh is a popular printing and payment product among libraries and can among other places be found in Denmark, Sweden, Norway, Germany, United Kingdom, United States, and Canada. == How it works == With the Princh app, users will be able to locate their nearest printer. Once the user is at the printer, the user chooses the document to be printed out and shares it with the Princh app. The user then selects the desired nearby printer entering the printer ID number or scanning the QR-code located on top of the printer, pays electronically and the print job is processed by the printer. Printer owners get access to a personal control panel where they can set printing prices and monitor all Princh activity for their business. == Notes and references ==

InfiniBand

InfiniBand (IB) is a computer networking standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. It is designed to be scalable and uses a switched fabric network topology. Between 2014 and June 2016, it was the most commonly used interconnect in the TOP500 list of supercomputers. Mellanox (acquired by Nvidia) manufactures InfiniBand host bus adapters and network switches, which are used by large computer system and database vendors in their product lines. As a computer cluster interconnect, IB competes with Ethernet, Fibre Channel, and Intel Omni-Path. The technology is promoted by the InfiniBand Trade Association. == History == InfiniBand originated in 1999 from the merger of two competing designs: Future I/O and Next Generation I/O (NGIO). NGIO was led by Intel, with a specification released in 1998, and joined by Sun Microsystems and Dell. Future I/O was backed by Compaq, IBM, and Hewlett-Packard. This led to the formation of the InfiniBand Trade Association (IBTA), which included both sets of hardware vendors as well as software vendors such as Microsoft. At the time it was thought some of the more powerful computers were approaching the interconnect bottleneck of the PCI bus, in spite of upgrades like PCI-X. Version 1.0 of the InfiniBand Architecture Specification was released in 2000. Initially the IBTA vision for IB was simultaneously a replacement for PCI in I/O, Ethernet in the machine room, cluster interconnect and Fibre Channel. IBTA also envisaged decomposing server hardware on an IB fabric. Mellanox had been founded in 1999 to develop NGIO technology, but by 2001 shipped an InfiniBand product line called InfiniBridge at 10 Gbit/second speeds. Following the burst of the dot-com bubble there was hesitation in the industry to invest in such a far-reaching technology jump. By 2002, Intel announced that instead of shipping IB integrated circuits ("chips"), it would focus on developing PCI Express, and Microsoft discontinued IB development in favor of extending Ethernet. Sun Microsystems and Hitachi continued to support IB. In 2003, the System X supercomputer built at Virginia Tech used InfiniBand in what was estimated to be the third largest computer in the world at the time. The OpenIB Alliance (later renamed OpenFabrics Alliance) was founded in 2004 to develop an open set of software for the Linux kernel. By February, 2005, the support was accepted into the 2.6.11 Linux kernel. In November 2005 storage devices finally were released using InfiniBand from vendors such as Engenio. Cisco, desiring to keep technology superior to Ethernet off the market, adopted a "buy to kill" strategy. Cisco successfully killed InfiniBand switching companies such as Topspin via acquisition. Of the top 500 supercomputers in 2009, Gigabit Ethernet was the internal interconnect technology in 259 installations, compared with 181 using InfiniBand. In 2010, market leaders Mellanox and Voltaire merged, leaving just one other IB vendor, QLogic, primarily a Fibre Channel vendor. At the 2011 International Supercomputing Conference, links running at about 56 gigabits per second (known as FDR, see below), were announced and demonstrated by connecting booths in the trade show. In 2012, Intel acquired QLogic's InfiniBand technology, leaving only one independent supplier. By 2014, InfiniBand was the most popular internal connection technology for supercomputers, although within two years, 10 Gigabit Ethernet started displacing it. In 2016, it was reported that Oracle Corporation (an investor in Mellanox) might engineer its own InfiniBand hardware. In 2019 Nvidia acquired Mellanox, the last independent supplier of InfiniBand products. == Specification == Specifications are published by the InfiniBand trade association. === Performance === Original names for speeds were single-data rate (SDR), double-data rate (DDR) and quad-data rate (QDR) as given below. Subsequently, other three-letter initialisms were added for even higher data rates. Notes Each link is duplex. Links can be aggregated: most systems use a 4 link/lane connector (QSFP). HDR often makes use of 2x links (aka HDR100, 100 Gb link using 2 lanes of HDR, while still using a QSFP connector). NDR introduced OSFP connectors which host one or two links at 2x (NDR200) or 4x (NDR400). They are not logically configured as a single 8x link, even when connecting switches together with an OSFP cable. InfiniBand provides remote direct memory access (RDMA) capabilities for low CPU overhead. === Topology === InfiniBand uses a switched fabric topology, as opposed to early shared medium Ethernet. All transmissions begin or end at a channel adapter. Each processor contains a host channel adapter (HCA) and each peripheral has a target channel adapter (TCA). These adapters can also exchange information for security or quality of service (QoS). === Messages === InfiniBand transmits data in packets of up to 4 KB that are taken together to form a message. A message can be: a remote direct memory access read or write a channel send or receive a transaction-based operation (that can be reversed) a multicast transmission an atomic operation === Physical interconnection === In addition to a board form factor connection, it can use both active and passive copper (up to 10 meters) and optical fiber cable (up to 10 km). QSFP connectors are used. The InfiniBand Association also specified the CXP connector system for speeds up to 120 Gbit/s over copper, active optical cables, and optical transceivers using parallel multi-mode fiber cables with 24-fiber MPO connectors. === Software interfaces === Mellanox operating system support is available for Solaris, FreeBSD, Red Hat Enterprise Linux, SUSE Linux Enterprise Server (SLES), Windows, HP-UX, VMware ESX, and AIX. InfiniBand has no specific standard application programming interface (API). The standard only lists a set of verbs such as ibv_open_device or ibv_post_send, which are abstract representations of functions or methods that must exist. The syntax of these functions is left to the vendors. Sometimes for reference this is called the verbs API. The de facto standard software is developed by OpenFabrics Alliance and called the Open Fabrics Enterprise Distribution (OFED). It is released under two licenses GPL2 or BSD license for Linux and FreeBSD, and as Mellanox OFED for Windows (product names: WinOF / WinOF-2; attributed as host controller driver for matching specific ConnectX 3 to 5 devices) under a choice of BSD license for Windows. It has been adopted by most of the InfiniBand vendors, for Linux, FreeBSD, and Microsoft Windows. IBM refers to a software library called libibverbs, for its AIX operating system, as well as "AIX InfiniBand verbs". The Linux kernel support was integrated in 2005 into the kernel version 2.6.11. === Ethernet over InfiniBand === Ethernet over InfiniBand, abbreviated to EoIB, is an Ethernet implementation over the InfiniBand protocol and connector technology. EoIB enables multiple Ethernet bandwidths varying on the InfiniBand (IB) version. Ethernet's implementation of the Internet Protocol Suite, usually referred to as TCP/IP, is different in some details compared to the direct InfiniBand protocol in IP over IB (IPoIB).

Backdoor (computing)

A backdoor is a typically covert method of bypassing normal authentication or encryption in a computer, product, embedded device (e.g. a home router), or its embodiment (e.g. part of a cryptosystem, algorithm, chipset, or even a "homunculus computer"—a tiny computer-within-a-computer such as that found in Intel's AMT technology). Backdoors are most often used for securing remote access to a computer, or obtaining access to plaintext in cryptosystems. From there it may be used to gain access to privileged information like passwords, corrupt or delete data on hard drives, or transfer information within compromised networks. In the United States, the 1994 Communications Assistance for Law Enforcement Act forces internet providers to provide backdoors for government authorities. In 2024, the U.S. government realized that China had been tapping communications in the U.S. using that infrastructure for months, or perhaps longer; China recorded presidential candidate campaign office phone calls—including employees of the then-vice president of the nation, and of the candidates themselves. A backdoor may take the form of a hidden part of a program, a separate program (e.g. Back Orifice may subvert the system through a rootkit), code in the firmware of the hardware, or parts of an operating system such as Windows, for example, device drivers. Trojan horses can be used to create vulnerabilities in a device. A Trojan horse may appear to be an entirely legitimate program, but when executed, it triggers an activity that may install a backdoor. Although some are secretly installed, other backdoors are deliberate and widely known. These kinds of backdoors have "legitimate" uses such as providing the manufacturer with a way to restore user passwords. Many systems that store information within the cloud fail to create accurate security measures. If many systems are connected within the cloud, hackers can gain access to all other platforms through the most vulnerable system. Default passwords (or other default credentials) can function as backdoors if they are not changed by the user. Some debugging features can also act as backdoors if they are not removed in the release version. In 1993, the United States government attempted to deploy an encryption system, the Clipper chip, with an explicit backdoor for law enforcement and national security access. The chip was unsuccessful. Recent proposals to counter backdoors include creating a database of backdoors' triggers and then using neural networks to detect them. == Overview == The threat of backdoors surfaced when multiuser and networked operating systems became widely adopted. Petersen and Turn discussed computer subversion in a paper published in the proceedings of the 1967 AFIPS Conference. They noted a class of active infiltration attacks that use "trapdoor" entry points into the system to bypass security facilities and permit direct access to data. The use of the word trapdoor here clearly coincides with more recent definitions of a backdoor. However, since the advent of public key cryptography the term trapdoor has acquired a different meaning (see: Trapdoor function), and thus the term "backdoor" is now preferred, only after the term trapdoor went out of use. More generally, such security breaches were discussed at length in a RAND Corporation task force report published under DARPA sponsorship by J.P. Anderson and D.J. Edwards in 1970. While initially targeting the computer vision domain, backdoor attacks have expanded to encompass various other domains, including text, audio, ML-based computer-aided design, and ML-based wireless signal classification. Additionally, vulnerabilities in backdoors have been demonstrated in deep generative models, reinforcement learning (e.g., AI GO), and deep graph models. These broad-ranging potential risks have prompted concerns from national security agencies regarding their potentially disastrous consequences. A backdoor in a login system might take the form of a hard coded user and password combination which gives access to the system. An example of this sort of backdoor was used as a plot device in the 1983 film WarGames, in which the architect of the "WOPR" computer system had inserted a hardcoded password-less account which gave the user access to the system, and to undocumented parts of the system (in particular, a video game-like simulation mode and direct interaction with the artificial intelligence). Although the number of backdoors in systems using proprietary software (software whose source code is not publicly available) is not widely credited, they are nevertheless frequently exposed. Programmers have even succeeded in secretly installing large amounts of benign code as Easter eggs in programs, although such cases may involve official forbearance, if not actual permission. == Examples == === Worms === Many computer worms, such as Sobig and Mydoom, install a backdoor on the affected computer (generally a PC on broadband running Microsoft Windows and Microsoft Outlook). Such backdoors appear to be installed so that spammers can send junk e-mail from the infected machines. Others, such as the Sony/BMG rootkit, placed secretly on millions of music CDs through late 2005, are intended as DRM measures—and, in that case, as data-gathering agents, since both surreptitious programs they installed routinely contacted central servers. A sophisticated attempt to plant a backdoor in the Linux kernel, exposed in November 2003, added a small and subtle code change by subverting the revision control system. In this case, a two-line change appeared to check root access permissions of a caller to the sys_wait4 function, but because it used assignment = instead of equality checking ==, it actually granted permissions to the system. This difference is easily overlooked, and could even be interpreted as an accidental typographical error, rather than an intentional attack. In January 2014, a backdoor was discovered in certain Samsung Android products, like the Galaxy devices. The Samsung proprietary Android versions are fitted with a backdoor that provides remote access to the data stored on the device. In particular, the Samsung Android software that is in charge of handling the communications with the modem, using the Samsung IPC protocol, implements a class of requests known as remote file server (RFS) commands, that allows the backdoor operator to perform via modem remote I/O operations on the device hard disk or other storage. As the modem is running Samsung proprietary Android software, it is likely that it offers over-the-air remote control that could then be used to issue the RFS commands and thus to access the file system on the device. === Object code backdoors === Harder to detect backdoors involve modifying object code, rather than source code—object code is much harder to inspect, as it is designed to be machine-readable, not human-readable. These backdoors can be inserted either directly in the on-disk object code, or inserted at some point during compilation, assembly linking, or loading—in the latter case the backdoor never appears on disk, only in memory. Object code backdoors are difficult to detect by inspection of the object code, but are easily detected by simply checking for changes (differences), notably in length or in checksum, and in some cases can be detected or analyzed by disassembling the object code. Further, object code backdoors can be removed (assuming source code is available) by simply recompiling from source on a trusted system. Thus for such backdoors to avoid detection, all extant copies of a binary must be subverted, and any validation checksums must also be compromised, and source must be unavailable, to prevent recompilation. Alternatively, these other tools (length checks, diff, checksumming, disassemblers) can themselves be compromised to conceal the backdoor, for example detecting that the subverted binary is being checksummed and returning the expected value, not the actual value. To conceal these further subversions, the tools must also conceal the changes in themselves—for example, a subverted checksummer must also detect if it is checksumming itself (or other subverted tools) and return false values. This leads to extensive changes in the system and tools being needed to conceal a single change. As object code can be regenerated by recompiling (reassembling, relinking) the original source code, making a persistent object code backdoor (without modifying source code) requires subverting the compiler itself—so that when it detects that it is compiling the program under attack it inserts the backdoor—or alternatively the assembler, linker, or loader. As this requires subverting the compiler, this in turn can be fixed by recompiling the compiler, removing the backdoor insertion code. This defense can in turn be subverted by putting a source meta-backdoor in the compiler, so that when it detects that it is compiling itself

Data grid

A data grid is an architecture or set of services that allows users to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present it to users upon request. The data in a data grid can be located at a single site or multiple sites where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain and the security restrictions placed on the original data for who may access it must be equally applied to the replicas. Specifically developed data grid middleware is what handles the integration between users and the data they request by controlling access while making it available as efficiently as possible. == Middleware == Middleware provides all the services and applications necessary for efficient management of datasets and files within the data grid while providing users quick access to the datasets and files. There is a number of concepts and tools that must be available to make a data grid operationally viable. However, at the same time not all data grids require the same capabilities and services because of differences in access requirements, security and location of resources in comparison to users. In any case, most data grids will have similar middleware services that provide for a universal name space, data transport service, data access service, data replication and resource management service. When taken together, they are key to the data grids functional capabilities. === Universal namespace === Since sources of data within the data grid will consist of data from multiple separate systems and networks using different file naming conventions, it would be difficult for a user to locate data within the data grid and know they retrieved what they needed based solely on existing physical file names (PFNs). A universal or unified name space makes it possible to create logical file names (LFNs) that can be referenced within the data grid that map to PFNs. When an LFN is requested or queried, all matching PFNs are returned to include possible replicas of the requested data. The end user can then choose from the returned results the most appropriate replica to use. This service is usually provided as part of a management system known as a Storage Resource Broker (SRB). Information about the locations of files and mappings between the LFNs and PFNs may be stored in a metadata or replica catalogue. The replica catalogue would contain information about LFNs that map to multiple replica PFNs. === Data transport service === Another middleware service is that of providing for data transport or data transfer. Data transport will encompass multiple functions that are not just limited to the transfer of bits, to include such items as fault tolerance and data access. Fault tolerance can be achieved in a data grid by providing mechanisms that ensures data transfer will resume after each interruption until all requested data is received. There are multiple possible methods that might be used to include starting the entire transmission over from the beginning of the data to resuming from where the transfer was interrupted. As an example, GridFTP provides for fault tolerance by sending data from the last acknowledged byte without starting the entire transfer from the beginning. The data transport service also provides for the low-level access and connections between hosts for file transfer. The data transport service may use any number of modes to implement the transfer to include parallel data transfer where two or more data streams are used over the same channel or striped data transfer where two or more steams access different blocks of the file for simultaneous transfer to also using the underlying built-in capabilities of the network hardware or specifically developed protocols to support faster transfer speeds. The data transport service might optionally include a network overlay function to facilitate the routing and transfer of data as well as file I/O functions that allow users to see remote files as if they were local to their system. The data transport service hides the complexity of access and transfer between the different systems to the user so it appears as one unified data source. === Data access service === Data access services work hand in hand with the data transfer service to provide security, access controls and management of any data transfers within the data grid. Security services provide mechanisms for authentication of users to ensure they are properly identified. Common forms of security for authentication can include the use of passwords or Kerberos (protocol). Authorization services are the mechanisms that control what the user is able to access after being identified through authentication. Common forms of authorization mechanisms can be as simple as file permissions. However, need for more stringent controlled access to data is done using Access Control Lists (ACLs), Role-Based Access Control (RBAC) and Tasked-Based Authorization Controls (TBAC). These types of controls can be used to provide granular access to files to include limits on access times, duration of access to granular controls that determine which files can be read or written to. The final data access service that might be present to protect the confidentiality of the data transport is encryption. The most common form of encryption for this task has been the use of SSL while in transport. While all of these access services operate within the data grid, access services within the various administrative domains that host the datasets will still stay in place to enforce access rules. The data grid access services must be in step with the administrative domains access services for this to work. === Data replication service === To meet the needs for scalability, fast access and user collaboration, most data grids support replication of datasets to points within the distributed storage architecture. The use of replicas allows multiple users faster access to datasets and the preservation of bandwidth since replicas can often be placed strategically close to or within sites where users need them. However, replication of datasets and creation of replicas is bound by the availability of storage within sites and bandwidth between sites. The replication and creation of replica datasets is controlled by a replica management system. The replica management system determines user needs for replicas based on input requests and creates them based on availability of storage and bandwidth. All replicas are then cataloged or added to a directory based on the data grid as to their location for query by users. In order to perform the tasks undertaken by the replica management system, it needs to be able to manage the underlying storage infrastructure. The data management system will also ensure the timely updates of changes to replicas are propagated to all nodes. ==== Replication update strategy ==== There are a number of ways the replication management system can handle the updates of replicas. The updates may be designed around a centralized model where a single master replica updates all others, or a decentralized model, where all peers update each other. The topology of node placement may also influence the updates of replicas. If a hierarchy topology is used then updates would flow in a tree like structure through specific paths. In a flat topology it is entirely a matter of the peer relationships between nodes as to how updates take place. In a hybrid topology consisting of both flat and hierarchy topologies updates may take place through specific paths and between peers. ==== Replication placement strategy ==== There are a number of ways the replication management system can handle the creation and placement of replicas to best serve the user community. If the storage architecture supports replica placement with sufficient site storage, then it becomes a matter of the needs of the users who access the datasets and a strategy for placement of replicas. There have been numerous strategies proposed and tested on how to best manage replica placement of datasets within the data grid to meet user requirements. There is not one universal strategy that fits every requirement the best. It is a matter of the type of data grid and user community requirements for access that will determine the best strategy to use. Replicas can even be created where the files are encrypted for confidentiality that would be useful in a research project dealing with medical files. The following section contains several strategies for replica placement. ===== Dynamic replication ===== Dynam

And–or tree

An and–or tree is a graphical representation of the reduction of problems (or goals) to conjunctions and disjunctions of subproblems (or subgoals). == Example == The and–or tree: represents the search space for solving the problem P, using the goal-reduction methods: P if Q and R P if S Q if T Q if U == Definitions == Given an initial problem P0 and set of problem solving methods of the form: P if P1 and … and Pn the associated and–or tree is a set of labelled nodes such that: The root of the tree is a node labelled by P0. For every node N labelled by a problem or sub-problem P and for every method of the form P if P1 and ... and Pn, there exists a set of children nodes N1, ..., Nn of the node N, such that each node Ni is labelled by Pi. The nodes are conjoined by an arc, to distinguish them from children of N that might be associated with other methods. A node N, labelled by a problem P, is a success node if there is a method of the form P if nothing (i.e., P is a "fact"). The node is a failure node if there is no method for solving P. If all of the children of a node N, conjoined by the same arc, are success nodes, then the node N is also a success node. Otherwise the node is a failure node. == Search strategies == An and–or tree specifies only the search space for solving a problem. Different search strategies for searching the space are possible. These include searching the tree depth-first, breadth-first, or best-first using some measure of desirability of solutions. The search strategy can be sequential, searching or generating one node at a time, or parallel, searching or generating several nodes in parallel. == Relationship with logic programming == The methods used for generating and–or trees are propositional logic programs (without variables). In the case of logic programs containing variables, the solutions of conjoint sub-problems must be compatible. Subject to this complication, sequential and parallel search strategies for and–or trees provide a computational model for executing logic programs. == Relationship with two-player games == And–or trees can also be used to represent the search spaces for two-person games. The root node of such a tree represents the problem of one of the players winning the game, starting from the initial state of the game. Given a node N, labelled by the problem P of the player winning the game from a particular state of play, there exists a single set of conjoint children nodes, corresponding to all of the opponents responding moves. For each of these children nodes, there exists a set of non-conjoint children nodes, corresponding to all of the player's defending moves. For solving game trees with proof-number search family of algorithms, game trees are to be mapped to and–or trees. MAX-nodes (i.e. maximizing player to move) are represented as OR nodes, MIN-nodes map to AND nodes. The mapping is possible, when the search is done with only a binary goal, which usually is "player to move wins the game".

Data definition specification

In computing, a data definition specification (DDS) is a guideline to ensure comprehensive and consistent data definition. It represents the attributes required to quantify data definition. A comprehensive data definition specification encompasses enterprise data, the hierarchy of data management, prescribed guidance enforcement and criteria to determine compliance. == Overview == A data definition specification may be developed for any organization or specialized field, improving the quality of its products through consistency and transparency. It eliminates redundancy (since all contributing areas are referencing the same specification) and provides standardization and degrees of compliance, making it easier and more efficient to create, modify, verify, analyze and share information across the enterprise. To understand how a data definition specification works in an enterprise, we must look at the elements of a DDS. Writing data definitions, defining business terms (or rules) in the context of a particular environment, provides structure for an organization's data architecture. In developing these definitions, the words used must be traceable to clearly defined data. A data definition specification may be used in the following activities: Business intelligence Business process modeling Business rules management Data analysis and modeling Information architecture Metadata modeling Data mastering Report generation == Criteria == A data definition specification requires data definitions to be: Atomic – singular, describing only one concept. Commonly used and ambiguous terms should be defined. While a term refers to one concept, several words may be used in a term: File – A concept identifiable with one word File extension – A concept identifiable with more than one word Traceable – Mapped to a specific data element. In business, a term may be traced to an entity (for example, a customer) or an attribute (such as a customer's name). A term may be a value in a data set (such as gender), or designate the data set itself. Traceability indicates relationships in the data hierarchy. Consistent - Used in a standard syntax; if used in a specific context, the context is noted Accurate - Precise, correct and unambiguous, stating what the term is and is not Clear - Readily understood by the reader Complete - With the term, its description and contextual references Concise - To avoid circular references == Applications == === Enterprise data === A data definition specification was produced by the Open Mobile Alliance to document charging data. The document, the centralized catalog of data elements defined for interfaces, specifies the mapping of these data elements to protocol fields in the interfaces. Created for the exchange of financial data, Market Data Definition Language (MDDL) is an XML specification designed to enable the interchange of information necessary to account, to analyze, and to trade financial instruments of the world's markets. It defines an XML-based interchange format and common data dictionary on the fields needed to describe: (1) financial instruments, (2) corporate events affecting value and tradability, and (3) market-related, economic and industrial indicators. The principal function of MDDL is to allow entities to exchange market data by standardizing formats and definitions. MDDL provides a common format for market data so that it can be efficiently passed from one processing system to another and provides a common understanding of market data content by standardizing terminology and by normalizing the relationships of various data elements to one another ... From the user perspective, the goal of MDDL is to enable users to integrate data from multiple sources by standardizing both the input feeds used for data warehousing (i.e., define what's being provided by vendors) and the output methods by which client applications request the data (i.e., ensure compatibility on how to get data in and out of applications)." === Clinical submissions === The Clinical Data Interchange Standards Consortium, a global, multidisciplinary, non-profit organization, has established standards to support the acquisition, exchange, submission and archiving of clinical research data and metadata. CDISC standards are vendor-neutral, platform-independent and freely available from the CDISC website. The Case Report Tabulation Data Definition Specification (define.xml) draft version 2.0, the oldest data definition specification, is part of the evolution from the 1999 FDA electronic submission (eSub) guidance and electronic Common Technical Document (eCTD) documents specifying that a document describing the content and structure of included data be included in a submission. Define.xml was developed to automate the review process by generating a machine-readable data-definition document. Define.xml has standardized submissions to the Food and Drug Administration, reducing review times from over two years to several months. === Archival data === A data definition specification is the foundation of metadata for scientific data archiving. The Metadata Encoding and Transmission Standard (METS) uses one principle of a DDS: consistent use of key terms to catalog digital objects for global use. The METS schema is a flexible mechanism for encoding descriptive, administrative and structural metadata for a digital library object and expressing complex links between metadata, and can provide a useful standard for the exchange of digital-library objects between repositories. A similar effort is underway to preserve complex data associated with video-game archiving. Preserving Virtual Worlds attempted to address archival-format deficiencies, citing the lack of suitable documentation for interactive fiction and games at the bit level: specifically, the absence of "representation information" needed to map raw bits into higher-level data constructs. Preserving Virtual Worlds 2 is a research project expanding on initial efforts in this field.