AI Data Warehouse

AI Data Warehouse — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Texture filtering

    Texture filtering

    In computer graphics, texture filtering or texture smoothing is the method used to determine the texture color for a texture mapped pixel, using the colors of nearby texels (ie. pixels of the texture). Filtering describes how a texture is applied at many different shapes, size, angles and scales. Depending on the chosen filter algorithm, the result will show varying degrees of blurriness, detail, spatial aliasing, temporal aliasing and blocking. Depending on the circumstances, filtering can be performed in software (such as a software rendering package) or in hardware, eg. with either real time or GPU accelerated rendering circuits, or in a mixture of both. For most common interactive graphical applications, modern texture filtering is performed by dedicated hardware which optimizes memory access through memory cacheing and pre-fetch, and implements a selection of algorithms available to the user and developer. There are two main categories of texture filtering: magnification filtering and minification filtering. Depending on the situation, texture filtering is either a type of reconstruction filter where sparse data is interpolated to fill gaps (magnification), or a type of anti-aliasing (AA) where texture samples exist at a higher frequency than required for the sample frequency needed for texture fill (minification). There are many methods of texture filtering, which make different trade-offs between computational complexity, memory bandwidth and image quality. == The need for filtering == During the texture mapping process for any arbitrary 3D surface, a texture lookup takes place to find out where on the texture each pixel center falls. For texture-mapped polygonal surfaces composed of triangles typical of most surfaces in 3D games and movies, every pixel (or subordinate pixel sample) of that surface will be associated with some triangle(s) and a set of barycentric coordinates, which are used to provide a position within a texture. Such a position may not lie perfectly on the "pixel grid," necessitating some function to account for these cases. In other words, since the textured surface may be at an arbitrary distance and orientation relative to the viewer, one pixel does not usually correspond directly to one texel. Some form of filtering has to be applied to determine the best color for the pixel. Insufficient or incorrect filtering will show up in the image as artifacts (errors in the image), such as 'blockiness', jaggies, or shimmering. There can be different types of correspondence between a pixel and the texel/texels it represents on the screen. These depend on the position of the textured surface relative to the viewer, and different forms of filtering are needed in each case. Given a square texture mapped on to a square surface in the world, at some viewing distance the size of one screen pixel is exactly the same as one texel. Closer than that, the texels are larger than screen pixels, and need to be scaled up appropriately — a process known as texture magnification. Farther away, each texel is smaller than a pixel, and so one pixel covers multiple texels. In this case an appropriate color has to be picked based on the covered texels, via texture minification. Graphics APIs such as OpenGL allow the programmer to set different choices for minification and magnification filters. Note that even in the case where the pixels and texels are exactly the same size, one pixel will not necessarily match up exactly to one texel. It may be misaligned or rotated, and cover parts of up to four neighboring texels. Hence some form of filtering is still required. == Mipmapping == Mipmapping is a standard technique used to save some of the filtering work needed during texture minification. It is also highly beneficial for cache coherency - without it the memory access pattern during sampling from distant textures will exhibit extremely poor locality, adversely affecting performance even if no filtering is performed. During texture magnification, the number of texels that need to be looked up for any pixel is always four or fewer; during minification, however, as the textured polygon moves farther away potentially the entire texture might fall into a single pixel. This would necessitate reading all of its texels and combining their values to correctly determine the pixel color, a prohibitively expensive operation. Mipmapping avoids this by prefiltering the texture and storing it in smaller sizes down to a single pixel. As the textured surface moves farther away, the texture being applied switches to the prefiltered smaller size. Different sizes of the mipmap are referred to as 'levels', with Level 0 being the largest size (used closest to the viewer), and increasing levels used at increasing distances. == Filtering methods == This section lists the most common texture filtering methods, in increasing order of computational cost and image quality. === Nearest-neighbor interpolation === Nearest-neighbor interpolation is the simplest and crudest filtering method — it simply uses the color of the texel closest to the pixel center for the pixel color. While simple, this results in a large number of artifacts - texture 'blockiness' during magnification, and aliasing and shimmering during minification. This method is fast during magnification but during minification the stride through memory becomes arbitrarily large and it can often be less efficient than MIP-mapping due to the lack of spatially coherent texture access and cache-line reuse. === Nearest-neighbor with mipmapping === This method still uses nearest neighbor interpolation, but adds mipmapping — first the nearest mipmap level is chosen according to distance, then the nearest texel center is sampled to get the pixel color. This reduces the aliasing and shimmering significantly during minification but does not eliminate it entirely. In doing so it improves texture memory access and cache-line reuse through avoiding arbitrarily large access strides through texture memory during rasterization. This does not help with blockiness during magnification as each magnified texel will still appear as a large rectangle. === Linear mipmap filtering === Less commonly used, OpenGL and other APIs support nearest-neighbor sampling from individual mipmaps whilst linearly interpolating the two nearest mipmaps relevant to the sample. === Bilinear filtering === In Bilinear filtering, the four nearest texels to the pixel center are sampled (at the closest mipmap level), and their colors are combined by weighted average according to distance. This removes the 'blockiness' seen during magnification, as there is now a smooth gradient of color change from one texel to the next, instead of an abrupt jump as the pixel center crosses the texel boundary. Bilinear filtering for magnification filtering is common. When used for minification it is often used with mipmapping; though it can be used without, it would suffer the same aliasing and shimmering problems as nearest-neighbor filtering when minified too much. For modest minification ratios, however, it can be used as an inexpensive hardware accelerated weighted texture supersample. The Nintendo 64 used an unusual version of bilinear filtering where only three pixels are used known as 3-point texture filtering, instead of four due to hardware optimization concerns. This introduces a noticeable "triangulation bias" in some textures. === Trilinear filtering === Trilinear filtering is a remedy to a common artifact seen in mipmapped bilinearly filtered images: an abrupt and very noticeable change in quality at boundaries where the renderer switches from one mipmap level to the next. Trilinear filtering solves this by doing a texture lookup and bilinear filtering on the two closest mipmap levels (one higher and one lower quality), and then linearly interpolating the results. This results in a smooth degradation of texture quality as distance from the viewer increases, rather than a series of sudden drops. Of course, closer than Level 0 there is only one mipmap level available, and the algorithm reverts to bilinear filtering. === Anisotropic filtering === Anisotropic filtering is the highest quality filtering available in current consumer 3D graphics cards. Simpler, "isotropic" techniques use only square mipmaps which are then interpolated using bi– or trilinear filtering. (Isotropic means same in all directions, and hence is used to describe a system in which all the maps are squares rather than rectangles or other quadrilaterals.) When a surface is at a high angle relative to the camera, the fill area for a texture will not be approximately square. Consider the common case of a floor in a game: the fill area is far wider than it is tall. In this case, none of the square maps are a good fit. The result is blurriness and/or shimmering, depending on how the fit is chosen. Anisotropic filtering corrects this by sampling the texture as a non-square shape. The goal is

    Read more →
  • Myrinet

    Myrinet

    Myrinet, ANSI/VITA 26-1998, is a high-speed local area networking system designed by the company Myricom to be used as an interconnect between multiple machines to form computer clusters. == Description == Myrinet was promoted as having lower protocol overhead than standards such as Ethernet, and therefore better throughput, less interference, and lower latency while using the host CPU. Although it can be used as a traditional networking system, Myrinet is often used directly by programs that "know" about it, thereby bypassing a call into the operating system. Earlier versions of Myrinet used a variety of media and connectors: Generation 2 used copper media with DC-37 (Myrinet-LAN, M2L- controllers and switches) or microribbon (Myrinet-SAN, M2M-) connectors. Generation 3 used copper media with HSSDC (Myrinet-Serial, M3S-) or microribbon (Myrinet-SAN, M3M-) connectors, or fiber with LC-connectors (Myrinet-Fiber, M3F-). The later versions of Myrinet physically consist of two fibre optic cables, upstream and downstream, connected to the host computers with a single connector. Machines are connected via low-overhead routers and switches, as opposed to connecting one machine directly to another. Myrinet includes a number of fault-tolerance features, mostly backed by the switches. These include flow control, error control, and "heartbeat" monitoring on every link. The "fourth-generation" Myrinet, called Myri-10G, supported a 10 Gbit/s data rate and can use 10 Gigabit Ethernet on PHY, the physical layer (cables, connectors, distances, signaling). Myri-10G started shipping at the end of 2005. Myrinet was approved in 1998 by the American National Standards Institute for use on the VMEbus as ANSI/VITA 26-1998. One of the earliest publications on Myrinet is a 1995 IEEE article. === Performance === Myrinet is a lightweight protocol with little overhead that allows it to operate with throughput close to the basic signaling speed of the physical layer. For supercomputing, the low latency of Myrinet is even more important than its throughput performance, since, according to Amdahl's law, a high-performance parallel system tends to be bottlenecked by its slowest sequential process, which in all but the most embarrassingly parallel supercomputer workloads is often the latency of message transmission across the network. === Deployment === According to Myricom, 141 (28.2%) of the June 2005 TOP500 supercomputers used Myrinet technology. In the November 2005 TOP500, the number of supercomputers using Myrinet was down to 101 computers, or 20.2%, in November 2006, 79 (15.8%), and by November 2007, 18 (3.6%), a long way behind gigabit Ethernet at 54% and InfiniBand at 24.2%. In the June 2014 TOP500 list, the number of supercomputers using Myrinet interconnect was 1 (0.2%). In November 2013, the assets of Myricom (including the Myrinet technology) were acquired by CSP Inc. In 2016, it was reported that Google had also offered to buy the company.

    Read more →
  • Snake oil (cryptography)

    Snake oil (cryptography)

    In cryptography, snake oil is any cryptographic method or product considered to be bogus or fraudulent. The name derives from snake oil, one type of patent medicine widely available in the 19th century United States. Distinguishing secure cryptography from insecure cryptography can be difficult from the viewpoint of a user. Many cryptographers, such as Bruce Schneier and Phil Zimmermann, undertake to educate the public in how secure cryptography is done, as well as highlighting the misleading marketing of some cryptographic products. The Snake Oil FAQ describes itself as "a compilation of common habits of snake oil vendors. It cannot be the sole method of rating a security product, since there can be exceptions to most of these rules. [...] But if you're looking at something that exhibits several warning signs, you're probably dealing with snake oil." == Some examples of snake oil cryptography techniques == This is not an exhaustive list of snake oil signs. A more thorough list is given in the references. Secret system Some encryption systems will claim to rely on a secret algorithm, technique, or device; this is categorized as security through obscurity. Criticisms of this are twofold. First, a 19th-century rule known as Kerckhoffs's principle, later formulated as Shannon's maxim, teaches that "the enemy knows the system" and the secrecy of a cryptosystem algorithm does not provide any advantage. Second, secret methods are not open to public peer review and cryptanalysis, so potential mistakes and insecurities can go unnoticed. Technobabble Snake oil salespeople may use "technobabble" to sell their product since cryptography is a complicated subject. "Unbreakable" Claims of a system or cryptographic method being "unbreakable" are always false (or true under some limited set of conditions), and are generally considered a sure sign of snake oil. "Military grade" There is no accepted standard or criterion for "military grade" ciphers. One-time pads One-time pads are a popular cryptographic method to invoke in advertising, because it is well known that one-time pads, when implemented correctly, are genuinely unbreakable. The problem comes in implementing one-time pads, which is rarely done correctly. Cryptographic systems that claim to be based on one-time pads are considered suspect, particularly if they do not describe how the one-time pad is implemented, or they describe a flawed implementation. Unsubstantiated "bit" claims Cryptographic products are often accompanied with claims of using a high number of bits for encryption, apparently referring to the key length used. However key lengths are not directly comparable between symmetric and asymmetric systems. Furthermore, the details of implementation can render the system vulnerable. For example, in 2008 it was revealed that a number of hard drives sold with built-in "128-bit AES encryption" were actually using a simple and easily defeated "XOR" scheme. AES was only used to store the key, which was easy to recover without breaking AES.

    Read more →
  • ISO 15765-2

    ISO 15765-2

    ISO 15765-2, or ISO-TP (Transport Layer), is an international standard for sending data packets over a CAN bus. The protocol allows for the transport of messages that exceed the eight byte maximum payload of CAN frames. ISO-TP segments longer messages into multiple frames, adding metadata (CAN-TP Header) that allows the interpretation of individual frames and reassembly into a complete message packet by the recipient. It can carry up to 232-1 (4294967295) bytes of payload per message packet starting from the 2016 version. Prior versions were limited to a maximum payload size of 4095 bytes. In the OSI model, ISO-TP covers the layer 3 (network layer) and 4 (transport layer). The most common application for ISO-TP is the transfer of diagnostic messages with OBD-II equipped vehicles using KWP2000 and UDS, but is used broadly in other application-specific CAN implementations where one might need to send messages longer than what the CAN protocol physical layer allows (eight bytes for CAN, 64 bytes for CAN FD, and 2048 bytes for CAN-XL). ISO-TP can be operated with its own addressing as so-called Extended Addressing or without address using only the CAN ID (so-called Normal Addressing). Extended addressing uses the first data byte of each frame as an additional element of the address, reducing the application payload by one byte. For clarity the protocol description below is based on Normal Addressing with eight byte CAN frames. In total, six types of addressing are allowed by the ISO 15765-2 Protocol. ISO-TP prepends one or more metadata bytes to the payload data in the eight byte CAN frame, reducing the payload to seven or fewer bytes per frame. The metadata is called the Protocol Control Information, or PCI. The PCI is one, two or three bytes. The initial field is four bits indicating the frame type, and implicitly describing the PCI length. ISO 15765-2 is a part of ISO 15765 (headlined Road vehicles — Diagnostic communication over Controller Area Network (DoCAN)), which has the following parts: ISO 15765-1 Part 1: General information and use case definition ISO 15765-2 Part 2: Transport protocol and network layer services ISO 15765-3 Part 3: Implementation of unified diagnostic services (UDS on CAN) – replaced by ISO 14229-3 Road vehicles — Unified diagnostic services ISO 15765-4 Part 4: Requirements for emissions-related systems == List of protocol control information (PCI) field types == The ISO-TP defines four frame types: A message of seven bytes or less is sent in a single frame, with the initial byte containing the type (0) and payload length (1-7 bytes). With the 0 in the type field, this can also pass as a simpler protocol with a length-data format and is often misinterpreted as such. A message longer than 7 bytes requires segmenting the message packet over multiple frames. A segmented transfer starts with a First Frame. The PCI is two bytes in this case, with the first 4 bit field the type (type 1) and the following 12 bits the message length (excluding the type and length bytes). The recipient confirms the transfer with a flow control frame. The flow control frame has three PCI bytes specifying the interval between subsequent frames and how many consecutive frames may be sent (Block Size). For CAN FD, the ISO 15765-2 protocol has been extended for Single and First frame, to allow larger size values, but still backwards compatible with traditional ISO 15765. See CAN FD. The initial byte contains the type (type = 3) in the first four bits, and a flag in the next four bits indicating if the transfer is allowed (0 = Continue To Send, 1 = Wait, 2 = Overflow/abort). The next byte is the block size, the count of frames that may be sent before waiting for the next flow control frame. A value of zero allows the remaining frames to be sent without flow control or delay. The third byte is the minimum Separation Time (STmin), the minimum delay time between frames. STmin values up to 127 (0x7F) specify the minimum number of milliseconds to delay between frames, while values in the range 241 (0xF1) to 249 (0xF9) specify delays increasing from 100 to 900 microseconds. Note that the Separation Time is defined as the minimum time between the end of one frame to the beginning of the next. Robust implementations should be prepared to accept frames from a sender that misinterprets this as the frame repetition rate i.e. from start-of-frame to start-of-frame. Even careful implementations may fail to account for the minor effect of bit-stuffing in the physical layer. The sender transmits the rest of the message using Consecutive Frames. Each Consecutive Frame has a one byte PCI, with a four bit type (type = 2) followed by a 4-bit sequence number. The sequence number starts at 1 and increments with each frame sent (1, 2,..., F, 0, 1,...), with which lost or discarded frames can be detected. Each consecutive frame starts at 0, initially for the first set of data in the first frame will be considered as 0th data. So the first set of CF(Consecutive frames) start from 0x1. There afterwards when it reaches 0x2F, will be started from 0x20 (e.g. 0x21, 0x22, 0x23...0x2F, 0x20, 0x21...). The 12-bit length field (as indicated in the First Frame) allows up to 4095 bytes of user data in a segmented message, but in practice the typical application-specific limit is considerably lower because of receive buffer or hardware limitations. == Timing parameters == Timing parameters, such as P1 and P2 timers, have to be mentioned. == Standards == ISO 15765-2:2016 Road vehicles -- Diagnostic communication over Controller Area Network (DoCAN) -- Part 2: Transport protocol and network layer services

    Read more →
  • Integrated test facility

    Integrated test facility

    An integrated test facility (ITF) creates a fictitious entity in a database to process test transactions simultaneously with live input. ITF can be used to incorporate test transactions into a normal production run of a system. Its advantage is that periodic testing does not require separate test processes. However, careful planning is necessary, and test data must be isolated from production data. Moreover, ITF validates the correct operation of a transaction in an application, but it does not ensure that a system is being operated correctly. Integrated test facility is considered a useful audit tool during an IT audit because it uses the same programs to compare processing using independently calculated data. This involves setting up dummy entities on an application system and processing test or production data against the entity as a means of verifying processing accuracy.

    Read more →
  • CANaerospace

    CANaerospace

    CANaerospace is a higher layer protocol based on Controller Area Network (CAN) which has been developed by Stock Flight Systems in 1998 for aeronautical applications. == Background == CANaerospace supports airborne systems employing the Line-replaceable unit (LRU) concept to share data across CAN and ensures interoperability between CAN LRUs by defining CAN physical layer characteristics, network layers, communication mechanisms, data types and aeronautical axis systems. CANaerospace is an open source project, was initiated to standardize the interface between CAN LRUs on the system level. CANaerospace is continuously being developed further and has also been published by NASA as the Advanced General Aviation Transport Experiments Databus Standard in 2001. It found widespread use in aeronautical research worldwide. A major research aircraft that employs several CANaerospace networks for real-time computer interconnection is the Stratospheric Observatory for Infrared Astronomy (SOFIA), a Boeing 747SP with a 2.5m astronomic telescope. CANaerospace is also frequently used in flight simulation and connects entire aircraft cockpits (i.e. in Eurofighter Typhoon simulators) to the simulation host computers. In Italy CANaerospace is used as UAV data bus technology. Furthermore, CANaerospace serves as communication network in several general aviation avionics systems. The CANaerospace interface definition closes the gap between the ISO/OSI layer 1 and 2 CAN protocol (which is implemented in the CAN controller itself) and the specific requirements of distributed systems in aircraft. It may be used as a primary or ancillary avionics network and was designed to meet the following requirements: Democratic network: CANaerospace does not require any master/slave relationships between LRUs or a "bus controller", thereby avoiding a potential single source of failure. Every node in the network has the same rights for participation in the bus traffic. Self-identifying message format: Each CANaerospace message contains information about the type of the data and the transmitting node. This allows the data to be unambiguously recognized at each receiving node. Continuous Message Numbering: Each CANaerospace message contains a continuously incremented number which allows coherent processing of messages in the receiving stations. Message Status Code: Each CANaerospace message contains information about the integrity of the data is conveying. This allows receiving stations to evaluate the quality of the received data and to react accordingly. Emergency Event Signaling: CANaerospace defines a mechanism that allows each node to transmit information about exception or error situations. This information can be used by other stations to determine the network health. Node Service Interface: As an enhancement to CAN, CANaerospace provides a means for individual stations on the network to communicate with each other using connection-oriented and connectionless services. Predefined CAN Identifier Assignment: CANaerospace offers a predefined identifier assignment list for normal operation data. In addition to the predefined list, user-defined identifier assignment lists may be used. Ease of Implementation: The amount of code to implement CANaerospace is very little by design in order to minimize the effort for testing and certification of flight safety critical systems. Openness to Extensions: All CANaerospace definitions are extendable to provide flexibility for future enhancements and to allow adaptions to the requirements of specific applications. Free Availability: No cost whatsoever apply for the use of CANaerospace. The specification can be downloaded from the Internet == Physical interface == To ensure interoperability and reliable communication, CANaerospace specifies the electrical characteristics, bus transceiver requirements and data rates with the corresponding tolerances based on ISO 11898. The bit timing calculation (baud rate accuracy, sample point definition) and robustness to electromagnetic interference are given special emphasis. Also addressed are CAN connector, wiring considerations and design guidelines to maximize electromagnetic compatibility. == Communication layers == The Bosch CAN specification itself allows messages being transmitted both periodically and aperiodically but does not cover issues like data representation, node addressing or connection-oriented protocols. CAN is entirely based on Anyone-to-Many (ATM) communication which means that CAN messages are always received by all stations in the network. The advantage of the CAN concept is inherent data consistency between all stations, the drawback is that it does not allow node addressing which is the basis for Peer-to-Peer (PTP) communication. Using CAN networks in aeronautical applications, however, demands a standard targeted to the specific requirements of airborne systems which implies that communication between individual stations in the network must be possible to enable the required degree of system monitoring. Consequently, CANaerospace defines additional ISO/OSI layer 3, 4 and 6 functions to support node addressing and unified ATM/PTP communication mechanisms. PTP communication allows to set up client/server interactions between individual stations in the network either temporarily or permanently. More than one of these interactions may be in effect at any given time and each node may be client for one operation and server for another at the same time. This CANaerospace mechanism is called "Node Service Concept" and allows i.e. to distribute system functions over several stations in the network or to control dynamic system reconfiguration in case of failure. The Node Service concept supports both connection-oriented and connectionless interactions like with TCP/IP and UDP/IP for Ethernet. Enabling both ATM and PTP communication for CAN requires the introduction of independent network layers to isolate the different types of communication. This is realized for CANaerospace by forming CAN identifier groups as shown in Figure 1. The resulting structure creates Logical Communication Channels (LCCs) and assigns a specific communication type (ATM, PTP) to each of the LCCs. User-defined LCCs provide the necessary freedom for designers and allow the implementation of CANaerospace according to the needs of specific applications. Figure 1: Logical Communication Channels for CANaerospace As a side effect, the CAN identifier groups in Figure 1 affect the priority of the message transmission in case of bus arbitration. The communication channels are therefore arranged according to their relative importance: Emergency Event Data Channel (EED): This communication channel is used for messages which require immediate action (i.e. system degradation or reconfiguration) and have to be transmitted with very high priority. Emergency Event Data uses ATM communication exclusively. High/Low Priority Node Service Data Channel (NSH/NSL): These communication channels are used for client/server interactions using PTP communication. The corresponding services may be of the connection-oriented as well as the connectionless type. NSH/NSL may also be used to support test and maintenance functions. Normal Operation Data Channel (NOD): This communication channel is used for the transmission of the data which is generated during normal system operation and described in the CANaerospace identifier assignment list. These messages may be transmitted periodically or aperiodically as well as synchronously or asynchronously. All messages which cannot be assigned to other communication channels shall use this channel. High/Low Priority User-Defined Data Channel (UDH/UDL): This channel is dedicated to communication which cannot, due to their specific characteristics, be assigned other channels without violating the CANaerospace specification. As long as the defined identifier range is used, the message content and the communication type (ATM, PTP) for these channels may be specified by the system designer. To ensure interoperability it is highly recommended that the use of these channels is minimized. Debug Service Data Channel (DSD): This channel is dedicated to messages which are used temporarily for development and test purposes only and are not transmitted during normal operation. As long as the defined identifier range is used, the message content and the communication type (ATM, PTP) for these channels may be specified by the system designer. == Data representation == The majority of the real-time control systems used in aeronautics employ "big endian" processor architectures. This data representation was therefore specified for CANaerospace as well. With big endian data representation, the most significant bit of any datum is arranged leftmost and transmitted first on CANaerospace as shown in Figure 2. Figure 2: "Big Endian" Data Representation for CANaerospace CANaerospace uses a self-identifying message

    Read more →
  • Data marketplace

    Data marketplace

    Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in

    Read more →
  • Big memory

    Big memory

    Big-memory computers are machines with a large amount of random-access memory (RAM). The computers are required for databases, graph analytics, or more generally, high-performance computing, data science, and big data. Some database systems called in-memory databases are designed to run mostly in memory, rarely if ever retrieving data from disk or flash memory. See list of in-memory databases. == Details == The performance of big-memory systems depends on how the central processing units (CPUs) access the memory, via a conventional memory controller or via non-uniform memory access (NUMA). Performance also depends on the size and design of the CPU cache. Performance also depends on operating system (OS) design. The huge pages feature in Linux and other OSes can improve the efficiency of virtual memory. The transparent huge pages feature in Linux can offer better performance for some big-memory workloads. The "Large-Page Support" in Microsoft Windows enables server applications to establish large-page memory regions which are typically three orders of magnitude larger than the native page size.

    Read more →
  • LRE Map

    LRE Map

    The LRE Map (Language Resources and Evaluation) is a freely accessible large database on resources dedicated to Natural language processing. The original feature of LRE Map is that the records are collected during the submission of different major Natural language processing conferences. The records are then cleaned and gathered into a global database called "LRE Map". The LRE Map is intended to be an instrument for collecting information about language resources and to become, at the same time, a community for users, a place to share and discover resources, discuss opinions, provide feedback, discover new trends, etc. It is an instrument for discovering, searching and documenting language resources, here intended in a broad sense, as both data and tools. The large amount of information contained in the Map can be analyzed in many different ways. For instance, the LRE Map can provide information about the most frequent type of resource, the most represented language, the applications for which resources are used or are being developed, the proportion of new resources vs. already existing ones, or the way in which resources are distributed to the community. == Context == Several institutions worldwide maintain catalogues of language resources (ELRA, LDC, NICT Universal Catalogue, ACL Data and Code Repository, OLAC, LT World, etc.) However, it has been estimated that only 10% of existing resources are known, either through distribution catalogues or via direct publicity by providers (web sites and the like). The rest remains hidden, the only occasions where it briefly emerges being when a resource is presented in the context of a research paper or report at some conference. Even in this case, nevertheless, it might be that a resource remains in the background simply because the focus of the research is not on the resource per se. == History == The LRE Map originated under the name "LREC Map" during the preparation of LREC 2010 conference. More specifically, the idea was discussed within the FlaReNet project, and in collaboration with ELRA and the Institute of Computational Linguistics of CNR in Pisa, the Map was put in place at LREC 2010. The LREC organizers asked the authors to provide some basic information about all the resources (in a broad sense, i.e. including tools, standards and evaluation packages), either used or created, described in their papers. All these descriptors were then gathered in a global matrix called the LREC Map. The same methodology and requirements from the authors has been then applied and extended to other conferences, namely COLING-2010, EMNLP-2010, RANLP-2011, LREC 2012, LREC 2014 and LREC 2016. After this generalization to other conferences, the LREC Map has been renamed as the LRE Map. == Size and content == The size of the database increases over time. The data collected amount to 4776 entries. Each resource is described according to the following attributes: Resource type, e.g. lexicon, annotation tool, tagger/parser. Resource production status, e.g. newly created finished, existing-updated. Resource availability, e.g. freely available, from data center. Resource modality, e.g. speech, written, sign language. Resource use, e.g. named entity recognition, language identification, machine translation. Resource language, e.g. English, 23 European Union languages, official languages of India. == Uses == The LRE map is a very important tool to chart the NLP field. Compared to other studied based on subjective scorings, the LRE map is made of real facts. The map has a great potential for many uses, in addition to being an information gathering tool: It is a great instrument for monitoring the evolution of the field (useful for funders), if applied in different contexts and times. It can be seen as a huge joint effort, the beginning of an even larger cooperative action not just among few leaders but among all the researchers. It is also an "educational" means towards the broad acknowledgment of the need of meta-research activities with the active involvement of many. It is also instrumental in introducing the new notion of "citation of resources" that could provide an award and a means of scholarly recognition for researchers engaged in resource creation. It is used to help the organization of the conferences of the field like LREC. == Derived matrices == The data were then cleaned and sorted by Joseph Mariani (CNRS-LIMSI IMMI) and Gil Francopoulo (CNRS-LIMSI IMMI + Tagmatica) in order to compute the various matrices of the final FLaReNet reports. One of them, the matrix for written data at LREC 2010 is as follows: English is the most studied language. Secondly, come French and German languages and then Italian and Spanish. == Future == The LRE Map has been extended to Language Resources and Evaluation Journal and other conferences.

    Read more →
  • Story (social media)

    Story (social media)

    In social media, a story is a function in which the user tells a narrative or provides status messages and information in the form of short, time-limited clips in an automatically running sequence. == Definition == A story is a short sequence of images, videos, or other social media content, which can be accompanied by backgrounds, music, text, stickers, animations, filters or emojis. Social media platforms typically advance through the sequence automatically when presenting a story to a viewer. Although the sequential nature of stories can be used to tell a narrative, the pieces of a story can also be unrelated. Social media platforms that offer stories will typically have a primary story for each user which consists of everything the user posted to their story over a certain period of time, usually the most recent 24 hours. Most stories cannot be changed afterwards and are only available for a short time. Stories are almost exclusively created on a mobile device such as a smartphone or tablet computer and are usually displayed vertically. == History == In October 2013, Snapchat first introduced the story function as a series of Snaps that can together tell a narrative through a chronological order, with each Snap being viewable by all of the poster's friends and deleted after 24 hours. Stories soon surpassed private Snaps to become Snapchat's most-viewed type of post. After 2015, Snapchat introduced a feature allowing users to post private stories viewable by a chosen subset of their friends. Later other apps would copy this feature. In August 2016, Instagram introduced a stories function that deletes the content after 24 hours. Various commenters have accused the site of copying Snapchat. In February 2017, the instant messenger WhatsApp introduced the Now Status stories function in beta, which was later renamed Status. In March 2017, a story function was introduced in Facebook Messenger. In February 2018, Google launched AMP Stories, bringing a story-style format to certain Google search results on mobile devices. In August 2018, YouTube introduced a stories function that initially was limited to pictures, but was later expanded to support short video clips. The feature was shut down in June 2023. In August 2018, the GIF website Giphy introduced a story function. In March 2022, TikTok added a story feature which allowed users to create 15 second long videos that delete after 24 hours. In June 2023, Telegram CEO Pavel Durov announced stories for Telegram would be released in July 2023. In July 2023, the feature was released for premium users, and in August 2023 it was rolled out for all users. == User motivations == In 2022, a study performed by Jia-Dai (Evelyn) Lu and Jhih-Syuan (Elaine) Lin examined the various motivations for updating stories on Instagram. The researchers found a new configuration of motivations for using Instagram Stories: exploration, self-enhancement, perceived functionality, entertainment, social sharing, relationship building, novelty, and surveillance. The findings also highlighted that contribution and creation activities are likely to result in positive emotions, while creation alone predicts negative emotions while updating stories on Instagram. == Usage statistics == In 2019, around 1.5 billion people worldwide every day on average used the stories function in a social network or messenger. Younger people in particular use this function. More than 20% of people aged 18 to 24 use Instagram stories, while it is just under 2% of those over 55. In a Facebook survey of 18,000 participants from 12 countries, 68% said they used the stories function at least once a month. Stories in the areas of fashion and tourism are particularly popular. The website Fanpage Karma analyzed several Instagram accounts and determined the average reach of posts and stories per follower, concluding that posts have a higher reach than stories, which often have less than half the reach.

    Read more →
  • Social computing

    Social computing

    Social computing is an area of computer science that is concerned with the intersection of social behavior and computational systems. It is based on creating or fostering existing social conventions and social contexts through the use of software and technology. Blogs, email, instant messaging, social network services, wikis, social bookmarking and other instances of what is often called social software illustrate ideas from social computing. The rise in social computing is attributed to the prevalence of personal devices and increased overall computing power. This enables a growing number of users to participate in sharing content and interact with another. == Definitions == Humans—and human behavior—are profoundly social. Humans tend to orient to one another and develop abilities to interact with each other and other species. This ranges from expression and gesture through spoken, written, and body language. Humans are influenced by the behavior of those around them and can rely on social context and cues to make decisions. An example of a behavior relying on social contexts is applauding at the end of the play. This is based on the context that the show ended, and other audience members are applauding. Social information provides a basis for inferences, planning, and coordinating activity. == Examples == Common tools include blogs, email, instant messaging, social networking sites, wikis, and social bookmarking platforms. These technologies enable users to generate content, share knowledge, and interact in real time. == Applications == The rise of social computing has highlighted opportunities for businesses. Businesses are interacting on social computing platforms and investing in facilities to support and research social computing.Business models can leverage the massive customer bases that accumulate through social computing channels. Some organizations have started their own blogs and networks (McAfee, 2006, Joe, 2005). Organizations from diverse industry sectors such as Google, Cisco, and Fox, have sought to acquire or invest in successful social computing enterprises. A business blog can serve as a source of information and promotion for the company. This allows the company to share content about the company and their initiatives. Businesses have also interacted with social computing to market themselves and interact with customers. A notable example is Wendy's with their X (formerly Twitter) account. The account was primarily used to promote business promotions and interact with users in a playful or meaningful way. E-commerce web sites have allowed users to leave reviews and feedback on purchases which has improved online shopping experience for sellers and consumers.As another example of social computing’s business applications, many e-commerce Web sites have adopted online product/vendor feedback/reputation systems. Such systems provide an asynchronous platform for the consumer community to share experiences collectively and influence their purchasing behavior. They also provide a vehicle for eliciting feedback information valuable to the vendors and e-commerce site operators.Consumers can use the feedback systems to make a more educated choice on a purchase by comparing reviews between products or vendors. Sellers can track consumer behaviors and trends regarding a product and adjust their supply according to the demand. == Challenges and criticism == Social computing raises several concerns related to privacy, data security, and algorithmic bias. The widespread collection and analysis of user-generated data can lead to ethical dilemmas, especially when users are unaware of how their information is used. Critics also highlight issues of digital labor, surveillance, and the spread of misinformation, which can influence public opinion and social dynamics. === Term appearance === The term appeared in the mid 1990s after technology advancements and development of the web. In 1994, the concept of social computing was first proposed by Schuler. He thought, "Social computing is a computing application, with software as the medium or focus of social relationships." === Premise === The premise of social computing is that it is possible to design digital systems that support useful functionality by making socially produced information available to their users. This information may be provided directly, as when systems show the number of users who have rated a review as helpful or not. Or the information may be provided after being filtered and aggregated, as is done when systems recommend a product based on what else people with similar purchase history have purchased. Alternatively, the information may be provided indirectly, as is the case with Google's page rank algorithms which orders search results based on the number of pages that (recursively) point to them. In all of these cases, information that is produced by a group of people is used to provide or enhance the functioning of a system. Social computing is concerned with systems of this sort and the mechanisms and principles that underlie them. Social computing can be defined as follows: "Social Computing" refers to systems that support the gathering, representation, processing, use, and dissemination of information that is distributed across social collectivities such as teams, communities, organizations, and markets. Moreover, the information is not "anonymous" but is significantly precise because it is linked to people, who are in turn linked to other people. More recent definitions, however, have foregone the restrictions regarding anonymity of information, acknowledging the continued spread and increasing pervasiveness of social computing. As an example, Hemmatazad, N. (2014) defined social computing as "the use of computational devices to facilitate or augment the social interactions of their users, or to evaluate those interactions in an effort to obtain new information." Social computing has to do with supporting "computations" that are carried out by groups of people, an idea that has been popularized in James Surowiecki's book, The Wisdom of Crowds. Examples of social computing in this sense include collaborative filtering, online auctions, reputation systems, computational social choice, tagging, and verification games. The social information processing page focuses on this sense of social computing. == History == === Technology infrastructure === Users were able to interact more with websites after the development of Web 2.0. This was an advancement from Web 1.0. Comode G. and Krishnamurthy B. (2008) note that "content creators were few in Web 1.0 with the vast majority of users simply acting as consumers of content." Web 2.0 provided functionalities that allowed for low-cost web-hosting services and introduced features with browser windows that used basic information structure and expanded it to as many devices as possible using HTTP, or Hypertext Transfer Protocol. Sometimes referred to as "Enterprise 2.0", a term derived from Web 2.0, social software for enterprise generally refers to the use of social computing in corporate intranets and in other medium- and large-scale business environments. It consisted of a class of tools that allowed for networking and social changes to businesses at the time. It was a layering of the business tools on Web 2.0 and brought forth several applications and collaborative software with specific uses. FinanceElectronic negotiation, which first came up in 1969 and was adapted over time to suit financial markets networking needs, represents an important and desirable coordination mechanism for electronic markets. Negotiation between agents (software agents as well as humans) allows cooperative and competitive sharing of information to determine a proper price. Recent research and practice has also shown that electronic negotiation is beneficial for the coordination of complex interactions among organizations. Electronic negotiation has recently emerged as a very dynamic, interdisciplinary research area covering aspects from disciplines such as Economics, Information Systems, Computer Science, Communication Theory, Sociology and Psychology.Social computing has become more widely known because of its relationship to a number of recent trends. These include the growing popularity of social software and Web 3.0, increased academic interest in social network analysis, the rise of open source as a viable method of production, and a growing conviction that all of this can have a profound impact on daily life. A February 13, 2006 paper by market research company Forrester Research suggested that: === Developments === PLATO was one of the earliest examples of social computing in a live production environment with initially hundreds and soon thousands of users. The PLATO computer system was developed by the University of Illinois at Urbana Champaign in 1960s. In the 70s, the system supported social software applications for multi-us

    Read more →
  • Data refuge

    Data refuge

    Data Refuge is a public and collaborative project designed to address concerns about federal climate and environmental data that is in danger of being lost. In particular, the initiative addresses five main concerns: What are the best ways to safeguard data? How do federal agencies play a crucial role in collecting, managing, and distributing data? How do government priorities impact data's accessibility? Which projects and research fields depend on federal data? Which data sets are of value to research and local communities, and why? Data Refuge began as a grassroots organization in opposition to government data on climate change and the environment not being archived systemically. Data Refuge's main goal is to collect and allocate data in multiple safe locations to create a sustainable way of archiving old and new data. Data Refuge was initiated in 2016 to protect federal climate and environmental data that is vulnerable under an administration that denies climate change. The system aims to make public research-quality copies of federal climate and environmental data. Data Refuge is supported by the National Geographic Foundation, private donors, Libraries+ Network, Preserving Electronic Governance Initiative (PEGI), the Union of Concerned Scientists (USC), and the Penn Program in Environmental Humanities (PPEH). == Types of data == Data Refuge collects public federal data on the climate and environment in the form of satellite imagery, PDFs, and stories. The data are stored in multiple trusted locations as they are less vulnerable if in only one location, and to ensure accessibility for researchers. Through the Data Rescue events, Data Refuge has accumulated 4 terabytes of data, 30,000 URLs, and 800 participants. === Storytelling === Data Refuge collects stories on vulnerable federal climate and environmental data through: surveys, oral history, photo essays, maps, video shorts, and animations. The stories are archived in a public bank that showcase how federal environmental data support health and safety in communities. Data Stories are collected at Data Rescue events, which are partnered with universities, city and town halls, and advocacy groups. Data stories are collected and used to emphasize the importance of Data Refuge, in how the data on climate change and the environment are being used by people in the United States and across the world for meaningful practices.

    Read more →
  • List of artificial intelligence journals

    List of artificial intelligence journals

    This is a list of notable peer-reviewed academic journals that publish research in the field of artificial intelligence (AI), including areas such as machine learning, computer vision, natural language processing, robotics, and intelligent systems. == General artificial intelligence == Artificial Intelligence (journal) – Elsevier Journal of Artificial Intelligence Research (JAIR) – AI Access Foundation Knowledge-Based Systems – Elsevier == Machine learning == Data Mining and Knowledge Discovery – Springer Machine Learning (journal) – Springer Journal of Machine Learning Research – Microtome Pattern Recognition (journal) – Elsevier Neural Networks (journal) – Elsevier Neural Computation (journal) – MIT Press Neurocomputing (journal) - Elsevier == Deep learning and neural computation == IEEE Transactions on Evolutionary Computation – IEEE IEEE Transactions on Neural Networks and Learning Systems – IEEE Nature Machine Intelligence – Springer Nature == Computer vision == International Journal of Computer Vision – Springer IEEE Transactions on Pattern Analysis and Machine Intelligence – IEEE Machine Vision and Applications – Springer == Natural language processing == Computational Linguistics (journal) – MIT Press Natural Language Processing Transactions of the Association for Computational Linguistics – ACL == Robotics and intelligent systems == IEEE Transactions on Robotics – IEEE Autonomous Robots – Springer Journal of Intelligent & Robotic Systems – Springer == Interdisciplinary and ethics in AI == AI & Society – Springer Artificial Life – MIT Press Philosophy & Technology – Springer Minds and Machines – Springer

    Read more →
  • Data definition specification

    Data definition specification

    In computing, a data definition specification (DDS) is a guideline to ensure comprehensive and consistent data definition. It represents the attributes required to quantify data definition. A comprehensive data definition specification encompasses enterprise data, the hierarchy of data management, prescribed guidance enforcement and criteria to determine compliance. == Overview == A data definition specification may be developed for any organization or specialized field, improving the quality of its products through consistency and transparency. It eliminates redundancy (since all contributing areas are referencing the same specification) and provides standardization and degrees of compliance, making it easier and more efficient to create, modify, verify, analyze and share information across the enterprise. To understand how a data definition specification works in an enterprise, we must look at the elements of a DDS. Writing data definitions, defining business terms (or rules) in the context of a particular environment, provides structure for an organization's data architecture. In developing these definitions, the words used must be traceable to clearly defined data. A data definition specification may be used in the following activities: Business intelligence Business process modeling Business rules management Data analysis and modeling Information architecture Metadata modeling Data mastering Report generation == Criteria == A data definition specification requires data definitions to be: Atomic – singular, describing only one concept. Commonly used and ambiguous terms should be defined. While a term refers to one concept, several words may be used in a term: File – A concept identifiable with one word File extension – A concept identifiable with more than one word Traceable – Mapped to a specific data element. In business, a term may be traced to an entity (for example, a customer) or an attribute (such as a customer's name). A term may be a value in a data set (such as gender), or designate the data set itself. Traceability indicates relationships in the data hierarchy. Consistent - Used in a standard syntax; if used in a specific context, the context is noted Accurate - Precise, correct and unambiguous, stating what the term is and is not Clear - Readily understood by the reader Complete - With the term, its description and contextual references Concise - To avoid circular references == Applications == === Enterprise data === A data definition specification was produced by the Open Mobile Alliance to document charging data. The document, the centralized catalog of data elements defined for interfaces, specifies the mapping of these data elements to protocol fields in the interfaces. Created for the exchange of financial data, Market Data Definition Language (MDDL) is an XML specification designed to enable the interchange of information necessary to account, to analyze, and to trade financial instruments of the world's markets. It defines an XML-based interchange format and common data dictionary on the fields needed to describe: (1) financial instruments, (2) corporate events affecting value and tradability, and (3) market-related, economic and industrial indicators. The principal function of MDDL is to allow entities to exchange market data by standardizing formats and definitions. MDDL provides a common format for market data so that it can be efficiently passed from one processing system to another and provides a common understanding of market data content by standardizing terminology and by normalizing the relationships of various data elements to one another ... From the user perspective, the goal of MDDL is to enable users to integrate data from multiple sources by standardizing both the input feeds used for data warehousing (i.e., define what's being provided by vendors) and the output methods by which client applications request the data (i.e., ensure compatibility on how to get data in and out of applications)." === Clinical submissions === The Clinical Data Interchange Standards Consortium, a global, multidisciplinary, non-profit organization, has established standards to support the acquisition, exchange, submission and archiving of clinical research data and metadata. CDISC standards are vendor-neutral, platform-independent and freely available from the CDISC website. The Case Report Tabulation Data Definition Specification (define.xml) draft version 2.0, the oldest data definition specification, is part of the evolution from the 1999 FDA electronic submission (eSub) guidance and electronic Common Technical Document (eCTD) documents specifying that a document describing the content and structure of included data be included in a submission. Define.xml was developed to automate the review process by generating a machine-readable data-definition document. Define.xml has standardized submissions to the Food and Drug Administration, reducing review times from over two years to several months. === Archival data === A data definition specification is the foundation of metadata for scientific data archiving. The Metadata Encoding and Transmission Standard (METS) uses one principle of a DDS: consistent use of key terms to catalog digital objects for global use. The METS schema is a flexible mechanism for encoding descriptive, administrative and structural metadata for a digital library object and expressing complex links between metadata, and can provide a useful standard for the exchange of digital-library objects between repositories. A similar effort is underway to preserve complex data associated with video-game archiving. Preserving Virtual Worlds attempted to address archival-format deficiencies, citing the lack of suitable documentation for interactive fiction and games at the bit level: specifically, the absence of "representation information" needed to map raw bits into higher-level data constructs. Preserving Virtual Worlds 2 is a research project expanding on initial efforts in this field.

    Read more →
  • Knapsack cryptosystems

    Knapsack cryptosystems

    Knapsack cryptosystems are cryptosystems whose security is based on the hardness of solving the knapsack problem. They remain quite unpopular because simple versions of these algorithms have been broken for several decades. However, that type of cryptosystem is a good candidate for post-quantum cryptography. The most famous knapsack cryptosystem is the Merkle-Hellman Public Key Cryptosystem, one of the first public key cryptosystems, published the same year as the RSA cryptosystem. However, this system has been broken by several attacks: one from Shamir, one by Adleman, and the low density attack. However, there exist modern knapsack cryptosystems that are considered secure so far: among them is Nasako-Murakami 2006. Knapsack cryptosystems, when not subject to classical cryptoanalysis, are believed to be difficult even for quantum computers. That is not the case for systems that rely on factoring large integers, like RSA, or computing discrete logarithms, like ECDSA, problems solved in polynomial time with Shor's algorithm.

    Read more →