Information seeking

Information seeking is the process or activity of attempting to obtain information in both human and technological contexts. Information seeking is related to, but different from, information retrieval (IR). == Compared to information retrieval == Traditionally, IR tools have been designed for IR professionals to enable them to effectively and efficiently retrieve information from a source. It is assumed that the information exists in the source and that a well-formed query will retrieve it (and nothing else). It has been argued that laypersons' information seeking on the internet is very different from information retrieval as performed within the IR discourse. Yet, internet search engines are built on IR principles. Since the late 1990s a body of research on how casual users interact with internet search engines has been forming, but the topic is far from fully understood. IR can be said to be technology-oriented, focusing on algorithms and issues such as precision and recall. Information seeking may be understood as a more human-oriented and open-ended process than information retrieval. In information seeking, one does not know whether there exists an answer to one's query, so the process of seeking may provide the learning required to satisfy one's information need. == In different contexts == Much library and information science (LIS) research has focused on the information-seeking practices of practitioners within various fields of professional work. Studies have been carried out into the information-seeking behaviors of librarians, academics, medical professionals, engineers, lawyers and mini-publics(among others). Much of this research has drawn on the work done by Leckie, Pettigrew (now Fisher) and Sylvain, who in 1996 conducted an extensive review of the LIS literature (as well as the literature of other academic fields) on professionals' information seeking. The authors proposed an analytic model of professionals' information seeking behaviour, intended to be generalizable across the professions, thus providing a platform for future research in the area. The model was intended to "prompt new insights... and give rise to more refined and applicable theories of information seeking" (1996, p. 188). The model has been adapted by Wilkinson (2001) who proposes a model of the information seeking of lawyers. Recent studies in this topic address the concept of information-gathering that "provides a broader perspective that adheres better to professionals' work-related reality and desired skills." (Solomon & Bronstein, 2021). == Theories of information-seeking behavior == A variety of theories of information behavior – e.g. Zipf's Principle of Least Effort, Brenda Dervin's Sense Making, Elfreda Chatman's Life in the Round – seek to understand the processes that surround information seeking. In addition, many theories from other disciplines have been applied in investigating an aspect or whole process of information seeking behavior. A review of the literature on information seeking behavior shows that information seeking has generally been accepted as dynamic and non-linear (Foster, 2005; Kuhlthau 2006). People experience the information search process as an interplay of thoughts, feelings and actions (Kuhlthau, 2006). Donald O. Case (2007) also wrote a good book that is a review of the literature. Information seeking has been found to be linked to a variety of interpersonal communication behaviors beyond question-asking, to include strategies such as candidate answers. Robinson's (2010) research suggests that when seeking information at work, people rely on both other people and information repositories (e.g., documents and databases), and spend similar amounts of time consulting each (7.8% and 6.4% of work time, respectively; 14.2% in total). However, the distribution of time among the constituent information seeking stages differs depending on the source. When consulting other people, people spend less time locating the information source and information within that source, similar time understanding the information, and more time problem solving and decision making, than when consulting information repositories. Furthermore, the research found that people spend substantially more time receiving information passively (i.e., information that they have not requested) than actively (i.e., information that they have requested), and this pattern is also reflected when they provide others with information. == Wilson's nested model of conceptual areas == The concepts of information seeking, information retrieval, and information behaviour are objects of investigation of information science. Within this scientific discipline a variety of studies has been undertaken analyzing the interaction of an individual with information sources in case of a specific information need, task, and context. The research models developed in these studies vary in their level of scope. Wilson (1999) therefore developed a nested model of conceptual areas, which visualizes the interrelation of the here mentioned central concepts. Wilson defines models of information behavior to be "statements, often in the form of diagrams, that attempt to describe an information-seeking activity, the causes and consequences of that activity, or the relationships among stages in information-seeking behaviour" (1999: 250).

IBM ALP

IBM Assembly Language Processor (ALP) is an assembler written by IBM for 32-bit OS/2 Warp (OS/2 3.0), which was released in 1994. ALP accepts source programs compatible with Microsoft Macro Assembler (MASM) version 5.1, which was originally used to build many of the device drivers included with OS/2. For OS/2 versions 3 and 4, ALP was distributed, along with other tools and documentation, as part of the Device Driver Kit (DDK). The DDK was withdrawn in 2004 as part of IBM's discontinuance of OS/2.

Web performance

Web performance refers to the speed in which web pages are downloaded and displayed on the user's web browser. Web performance optimization (WPO), or website optimization is the field of knowledge about increasing web performance. Faster website download speeds have been shown to increase visitor retention and loyalty and user satisfaction, especially for users with slow internet connections and those on mobile devices. Web performance also leads to less data travelling across the web, which in turn lowers a website's power consumption and environmental impact. Some aspects which can affect the speed of page load include browser/server cache, image optimization, and encryption (for example SSL), which can affect the time it takes for pages to render. The performance of the web page can be improved through techniques such as multi-layered cache, light weight design of presentation layer components and asynchronous communication with server side components. == History == In the first decade or so of the web's existence, web performance improvement was focused mainly on optimizing website code and pushing hardware limitations. According to the 2002 book Web Performance Tuning by Patrick Killelea, some of the early techniques used were to use simple servlets or CGI, increase server memory, and look for packet loss and retransmission. Although these principles now comprise much of the optimized foundation of internet applications, they differ from current optimization theory in that there was much less of an attempt to improve the browser display speed. Steve Souders coined the term "web performance optimization" in 2004. At that time Souders made several predictions regarding the impact that WPO as an "emerging industry" would bring to the web, such as websites being fast by default, consolidation, web standards for performance, environmental impacts of optimization, and speed as a differentiator. One major point that Souders made in 2007 is that at least 80% of the time that it takes to download and view a website is controlled by the front-end structure. This lag time can be decreased through awareness of typical browser behavior, as well as of how HTTP works. == Optimization techniques == Web performance optimization improves user experience (UX) when visiting a website and therefore is highly desired by web designers and web developers. They employ several techniques that streamline web optimization tasks to decrease web page load times. This process is known as front end optimization (FEO) or content optimization. FEO concentrates on reducing file sizes and "minimizing the number of requests needed for a given page to load." In addition to the techniques listed below, the use of a content delivery network—a group of proxy servers spread across various locations around the globe—is an efficient delivery system that chooses a server for a specific user based on network proximity. Typically the server with the quickest response time is selected. The following techniques are commonly used web optimization tasks and are widely used by web developers: Web browsers open separate Transmission Control Protocol (TCP) connections for each Hypertext Transfer Protocol (HTTP) request submitted when downloading a web page. These requests total the number of page elements required for download. However, a browser is limited to opening only a certain number of simultaneous connections to a single host. To prevent bottlenecks, the number of individual page elements are reduced using resource consolidation whereby smaller files (such as images) are bundled together into one file. This reduces HTTP requests and the number of "round trips" required to load a web page. Web pages are constructed from code files such JavaScript and Hypertext Markup Language (HTML). As web pages grow in complexity, so do their code files and subsequently their load times. File compression can reduce code files by about 40 percent, thereby improving site responsiveness. Web Caching Optimization reduces server load, bandwidth usage and latency. CDNs use dedicated web caching software to store copies of documents passing through their system. Many website platforms, such as SiteGround, IONOS, Wix, and Hostinger, rely on global CDNs and caching technologies to deliver faster page loads across different geographical regions. Subsequent requests from the cache may be fulfilled should certain conditions apply. Web caches are located on either the client side (forward position) or web-server side (reverse position) of a CDN. Web browsers are also able to store content for re-use through the HTTP cache or web cache. Requests web browsers make are typically routed to the HTTP cache to validate if a cached response may be used to fulfill a request. If such a match is made, the response is fulfilled from the cache. This can be helpful for reducing network latency and costs associated with data-transfer. The HTTP cache is configured using request and response headers. Code minification distinguishes discrepancies between codes written by web developers and how network elements interpret code. Minification removes comments and extra spaces as well as crunch variable names in order to minimize code, decreasing files sizes by as much as 60%. In addition to caching and compression, lossy compression techniques (similar to those used with audio files) remove non-essential header information and lower original image quality on many high resolution images. These changes, such as pixel complexity or color gradations, are transparent to the end-user and do not noticeably affect perception of the image. Another technique is the replacement of raster graphics with resolution-independent vector graphics. Vector substitution is best suited for simple geometric images. Lazy loading of images and video reduces initial page load time, initial page weight, and system resource usage, all of which have positive impacts on website performance. It is used to defer initialization of an object right until the point at which it is needed. The browser loads the images in a page or post when they are needed such as when the user scrolls down the page and not all images at once, which is the default behavior, and naturally, takes more time. == HTTP/1.x and HTTP/2 == Since web browsers use multiple TCP connections for parallel user requests, congestion and browser monopolization of network resources may occur. Because HTTP/1 requests come with associated overhead, web performance is impacted by limited bandwidth and increased usage. Compared to HTTP/1, HTTP/2 is binary instead of textual is fully multiplexed instead of ordered and blocked can therefore use one connection for parallelism uses header compression to reduce overhead allows servers to "push" responses proactively into client caches Instead of a website's hosting server, CDNs are used in tandem with HTTP/2 in order to better serve the end-user with web resources such as images, JavaScript files and Cascading Style Sheet (CSS) files since a CDN's location is usually in closer proximity to the end-user. == Metrics == In recent years, several metrics have been introduced that help developers measure various aspects of the performance of their websites. In 2019, Google introduced metrics such as Time to First Byte (TTFB), First Contentful Paint (FCP), First Paint (FP), First Input Delay (FID), Cumulative Layout Shift (CLS) and Largest Contentful Paint (LCP) allow for website owner to gain insights into issues that might hurt the performance of their websites making it seem sluggish or slow to the user. Other metrics including Request Count (number of requests required to load a page), DOMContentLoaded (time when HTML document is completely loaded and parsed excluding CSS style sheets, images, etc.), Above The Fold Time (content that is visible without scrolling), Round Trip Time, number of Render Blocking Resources (such as scripts, stylesheets), Onload Time, Connection Time, Total Page Size help provide an accurate picture of latencies and slowdowns occurring at the networking level which might slow down a site. Modules to measure metrics such as TTFB, FCP, LCP, FP etc are provided with major frontend JavaScript libraries such as React, NuxtJS and Vue. Google publishes a library, the core-web-vitals library that allows for easy measurement of these metrics in frontend applications. In addition to this, Google also provides the Lighthouse, a Chrome dev-tools component and PageSpeed Insight a site that allows developers to measure and compare the performance of their website with Google's recommended minimums and maximums. In addition to this, tools such as the Network Monitor by Mozilla Firefox help provide insight into network-level slowdowns that might occur during transmission of data.

Blend4Web

Blend4Web is a free and open source framework for creating and displaying interactive 3D computer graphics in web browsers. == Overview == The Blend4Web framework leverages Blender to edit 3D scenes. Content rendering relies on WebGL, Web Audio, WebVR, and other web standards, without the use of plug-ins. It is dual-licensed. The framework is distributed under the free and open source GPLv3 and, a non-free license - with the source code being hosted on GitHub. A 3D scene can be prepared in Blender and then exported as a pair of JSON and binary files to load in a web application. It can also be exported as a single, self-contained HTML file, in which exported data, the web player GUI, and the engine itself are packed. The HTML option is considered to be the simplest way. The resulting file, which has a minimum size of 1 MB, can be embedded in a web page using a standard iframe HTML element. Blend4Web-powered web applications can be deployed on social networking websites such as Facebook. The Blend4Web toolchain consists of JavaScript libraries, the Blender add-on, and a set of tools for tweaking 3D scene parameters, debugging, and optimization. Developed by Moscow-based company Triumph in 2010, Blend4Web was publicly released on March 28, 2014. At the end of 2017, the project founders Yuri and Alex Kovelenov quit Triumph to start the development of a new WebGL framework Verge3D. In October 2019, an "Absolutely new Blend4Web" was announced, planned to make developing 3D apps easier and to add a new marketplace where people can offer their 3D models. == Features == The framework has a number of components typically found in game engines, including a positional audio system, physics engine (a fork of Bullet ported to JavaScript), animation system, and an abstraction layer for game logic programming. Up to 8 different types of animations can be assigned to a single object, including skeletal and per-vertex animation. The speed and the direction of animation (forward/backward play), as well as particle system parameters (size, initial velocity, and count), can be changed through the API. Among other supported features are: scene data dynamic loading and unloading, subsurface scattering simulation, and image-based lighting. Some out-of-box options exist for rendering extended outdoor environments, including foliage-wind interaction, water, atmosphere, and sunlight simulation. One example demonstrating these effects is "The Farm" tech demo, which also features multiple animated NPCs and the ability to walk, interact with objects and drive a vehicle in first-person mode. Being based on the cross-browser WebGL API, Blend4Web runs in the majority of web browsers, including mobile ones. There are some caveats for browsers with experimental WebGL support, such as Internet Explorer. There are also applications developed to run on Tizen-powered devices such as the Samsung Gear S2 smartwatch. Other features include: draw call batching, hidden surface determination, threaded physics simulation and ocean simulation. In version 14.09, Blend4Web introduced the possibility of adding interactivity to 3D scenes using a visual programming tool. The tool is reminiscent of the BGE's logic editor as it uses logic blocks that are placed inside Blender. It plays back animation tracks authored by an artist when the user interacts with predefined 3D objects. Since version 15.03, Blend4Web has supported attaching HTML elements (such as information windows) to 3D objects ("annotations") and copying objects in run time ("instancing"). The following post-processing effects are supported: glow, bloom, depth of field, crepuscular rays, motion blur, and screen space ambient occlusion. == Virtual reality and augmented reality == Virtual reality devices have been supported since the end of 2015. Specifically, Oculus Rift head-mounted display works over experimental WebVR API. The software also now includes preliminary support for gamepads, based on the Gamepad API. In 2017, the option to author augmented reality content was added. The system is based on the open-source tracking library ARToolKit and uses the WebRTC protocols. Starting from version 17.08, finger tracking is supported through the Leap Motion device. == Blender integration == The Blender add-on is written in Python and C and can be compiled for the Linux x86/x64, OS X x64, and MS Windows x86/x64 platforms. A Blend4Web-specific profile can be activated in the add-on settings. When switching to this profile, the Blender interface changes so that it only reveals settings relevant to Blend4Web. Blend4Web supports a set of Blender-specific features such as the node material editor (a tool for visual shader programming) and the particle system. There is basic support for Blender's non-linear animation (NLA) editor for creating simple scenarios. Blend4Web is based on Blender's real-time GLSL rendering engine, which users are recommended to use in order to enable WYSIWYG editing. == Notable uses == NASA developed an interactive web application called Experience Curiosity to celebrate the 3rd anniversary of the Curiosity rover landing on Mars. This Blend4Web-based app makes it possible to operate the rover, control its cameras and the robotic arm, and reproduce some of the prominent events of the Mars Science Laboratory mission. The application got presented at the beginning of the WebGL section at SIGGRAPH 2015. Experience Curiosity was ported to Verge3D for Blender in 2018 with several performance improvements and bug fixes. A General Motors authorized dealer in the United Arab Emirates has placed a functional Chevrolet Camaro 3D configurator on its website. Greenpeace created interactive 3D infographics to back Greenpeace's Detox campaign in Russia. Tallink featured an interactive 3D presentation of its MS Megastar vessel to allow visitors to browse details of the ship.

International Teletraffic Congress

The International Teletraffic Congress (ITC) is the first international conference in networking science and practice. It was created in 1955 by Arne Jensen to initially cater to the emerging need to understand and model traffic in telephone networks using stochastic methodologies, and to bring together researchers with these considerations as a common theme. Up through World War II, teletraffic research was done mainly by engineers and mathematicians working in telephone companies. Most of their work was published in local or company journals. In 1955, however, the field acquired a formal, international, institutional structure, with the organization of the first International Teletraffic Congress (ITC). Over the years, it has broaden its scope to address a wide spectrum ranging from the mathematical theory of traffic processes, stochastic system modelling and analysis, traffic and performance measurements, network management, traffic engineering to network capacity planning and cost optimization, including network economics and reliability for various types of networks. ITC served as a forum for all theoretical fundamentals and engineering practices for large-scale deployment and operation of telecommunications networks. Since its inception, ITC witnessed the evolution of communications and networking: the influence of computer science on telecommunication, the advent of the Internet and the massive deployment of mobile communications and optics, the appearance of peer-to-peer networking and social networks, the ever increasing speed and flexibility of new communication technologies, networks, user devices, and applications, and the ever changing operation challenges arising from this development. ITC documented this evolution with contemporary measurement studies, performance analyses of new technologies, recommendations for provisioning and configuration, and greatly contributed to the methodological toolbox of network scientists. Today, with its conferences, specialist seminars, regional seminars, training courses and publications, the ITC aims at a worldwide forum for all questions related to network and service performance, management, and assessment, both present and futuristic. The notion of traffic is broadly used to encompass data traffic from the MAC layer all the way to application traffic in the application layer. The scope of ITC is thus ranging all issues embedding operations, design, planning, economics and performance analysis of current and emerging communication networks and services, to be addressed by applying a variety of tools from different fields, such as Stochastic Processes, Information theory, Control theory, Signal and Processing, Game theory and optimization techniques, Statistical methodologies and Artificial Intelligence techniques. The target audience of such issues is experts from research organizations, universities, equipment vendors and suppliers, network operators, service providers, system integrators and international technical organizations, guaranteeing a well-balanced contribution from theory, application, and practice. The general goal remains to bring researchers and practitioners together toward operational understanding of all types of current and future networks. The ITC is ruled by the International Advisory Council (IAC) which gathers a number of technical experts, from universities and the research arms of key corporations in the industry, from countries having a strong tradition in teletraffic development. The IAC responsibilities are to disseminate information on teletraffic which is of interest for the whole community and: to select the locations of Plenary Congresses and to ensure their high-level technical programme to support Specialist Seminars on specific topics of current interest to promote Regional Seminars for the dissemination of teletraffic concepts in developing countries to facilitate the liaison activity with the ITU through participation in the standardization process and in the Development Programme The technical program and the organization of each ITC event remains within the responsibilities of the hosting country, but with significant IAC support to guarantee that the event is consistent with the quality standards established during the previous congresses. The ITC Plenary Congresses were scheduled tri-annually from 1955 until 1995 when the interval became bi-annual to account for the ever-accelerating development of network technologies, products and services and the associated dramatic increases in network demands. Similarly, to better cover the impact of dramatic changes undergoing in the field of computer and communication systems, networks and usage, it has been decided to hold the Plenary Congress on an annual basis from 2009. == Content == Teletraffic science is the traditional term for all theoretical fundamentals and engineering practices to describe data flows in telecommunication networks, the performance of the usage of network resources, procedures for sizing of resources and engineering the networks for given traffic load and quality of service requirements. For more than 50 years of the 20th century, traffic or teletraffic has been identified primarily with telephone networks. With the huge development of computers, stored program control of network nodes and computer communication, the traditional teletraffic science field naturally extended to computer networks, mobile and wireless/optical networks, and for a wide spectrum of new applications. The convergence between the voice network, the Internet, the television and mobility raised new questions that request new models and tools to be developed. In addition, the development of community networks, home networking, multiple access networking technologies, and the advent of pervasive and ambient communications dictates new challenges to be addressed. Today, ITC addresses the emerging paradigms such as an increasing diversity of distributed applications and services over various media like mobile/optical networks, enabling new markets and economy. ITC has steered the evolutions in communications since its creation in 1955 and remains at the forefront of innovation regarding modeling and performance. The scientific roots of communications traffic are based on the theory of probability and stochastic processes, modelling and performance evaluation. Modelling is the key for the mathematical description and quantitative performance analysis. Traffic flows are described by stochastic processes with complex dependencies which have to be validated by traffic measurements. Modelling also includes operational properties of resource control reflected by service strategies such as queueing disciplines, admission control, and routing. The results of such performance analyses are used for resource dimensioning (sizing), resource management, and network optimization while providing targeted Quality of Service. Teletraffic science is closely related to methods of operation research (queueing theory, optimization, forecasting) and computational sciences (simulation technology distributed systems). In this context, ITC represents a wide community of researchers and practitioners and is regularly organizing events like Congresses, Specialist Seminars and Workshops in order to discuss the latest changes in the modelling, design and performance of communication systems, networks and services. === The evolution of technologies of the 20th century === ITC has been witnessing the change of communication and networking technologies which are reflected in the proceedings and programs of the congresses. The specialist seminars and the motto of the congresses thereby reflect the hot topics of that time and the evolution. Selected topics of the 70's, 80's and 90's were 1998: Traffic Issues related to Multimedia and Nomadic Communications 1995: Traffic Modeling and Measurement in Broadband and Mobile Communications 1990: Broadband Technologies: Architectures, Applications, Control and Performance 1986: ISDN Traffic Issues 1984: Fundamentals of Teletraffic Theory 1977: Modeling of SPC Exchanges and Data Networks === Recent topics in the 21st century === With the rise of the Internet, new networking paradigms and technologies but also new challenges emerged: 2020: Teletraffic in the era of beyond-5G and AI 2019: Networked Systems and Services 2018: Teletraffic in the Smart World 2017: Ubiquitous, software-based, and sustainable networks and services 2016: Digital Connected World 2015: Traffic, Performance and Big Data 2014: Towards a Sustainable World 2013: Energy Efficient and Green Networking 2010: Multimedia Applications - Traffic, Performance and QoE 2009: Network Virtualization - Concepts and Performance 2008: Future Internet Design and Experimental Facilities 2008: Quality of Experience 2002: Internet Traffic Engineering and Traffic Management == Arne Jensen Lifetime Achievement Awards == The Arne Jensen Lifetime A

Leakage (machine learning)

In statistics and machine learning, leakage (also known as data leakage or target leakage) refers to the use of information during model training that would not be available at prediction time. This results in overly optimistic performance estimates, as the model appears to perform better during evaluation than it actually would in a production environment. Leakage is often subtle and indirect, making it difficult to detect and eliminate. It can lead a statistician or modeler to select a suboptimal model, which may be outperformed by a leakage-free alternative. == Leakage modes == Leakage can occur at multiple stages of the machine learning workflow. Broadly, its sources can be divided into two categories: those arising from features and those arising from training examples. === Feature leakage === Feature or column-wise leakage is caused by the inclusion of columns which are one of the following: a duplicate label, a proxy for the label, or the label itself. These features, known as anachronisms, will not be available when the model is used for predictions, and result in leakage if included when the model is trained. For example, including a "MonthlySalary" column when predicting "YearlySalary"; or "MinutesLate" when predicting "IsLate". === Training example leakage === Row-wise leakage is caused by improper sharing of information between rows of data. Types of row-wise leakage include: Premature featurization; leaking from premature featurization before Cross-validation/Train/Test split (must fit MinMax/ngrams/etc on only the train split, then transform the test set) Duplicate rows between train/validation/test (for example, oversampling a dataset to pad its size before splitting; or, different rotations/augmentations of a single image; bootstrap sampling before splitting; or duplicating rows to up sample the minority class) Non-independent and identically distributed random (non-IID) data Time leakage (for example, splitting a time-series dataset randomly instead of newer data in test set using a train/test split or rolling-origin cross-validation) Group leakage—not including a grouping split column (for example, Andrew Ng's group had 100k x-rays of 30k patients, meaning ~3 images per patient. The paper used random splitting instead of ensuring that all images of a patient were in the same split. Hence the model partially memorized the patients instead of learning to recognize pneumonia in chest x-rays.) A 2023 review found data leakage to be "a widespread failure mode in machine-learning (ML)-based science", having affected at least 294 academic publications across 17 disciplines, and causing a potential reproducibility crisis. == Detection == Data leakage in machine learning can be detected through various methods, focusing on performance analysis, feature examination, data auditing, and model behavior analysis. Performance-wise, unusually high accuracy or significant discrepancies between training and test results often indicate leakage. Inconsistent cross-validation outcomes may also signal issues. Feature examination involves scrutinizing feature importance rankings and ensuring temporal integrity in time series data. A thorough audit of the data pipeline is crucial, reviewing pre-processing steps, feature engineering, and data splitting processes. Detecting duplicate entries across dataset splits is also important. For language models, the Min-K% method can detect the presence of data in a pretraining dataset. It presents a sentence suspected to be present in the pretraining dataset, and computes the log-likelihood of each token, then compute the average of the lowest K of these. If this exceeds a threshold, then the sentence is likely present. This method is improved by comparing against a baseline of the mean and variance. Analyzing model behavior can reveal leakage. Models relying heavily on counter-intuitive features or showing unexpected prediction patterns warrant investigation. Performance degradation over time when tested on new data may suggest earlier inflated metrics due to leakage. Advanced techniques include backward feature elimination, where suspicious features are temporarily removed to observe performance changes. Using a separate hold-out dataset for final validation before deployment is advisable.

Digital Cinema Package

A Digital Cinema Package (DCP) is a collection of digital files used to store and convey digital cinema (DC) audio, image, and data streams. The term was popularized by Digital Cinema Initiatives, LLC in its original recommendation for packaging DC contents. However, the industry tends to apply the term to the structure more formally known as the composition. A DCP is a container format for compositions, a hierarchical file structure that represents a title version. The DCP may carry a partial composition (e.g. not a complete set of files), a single complete composition, or multiple and complete compositions. The composition consists of a Composition Playlist (in XML format) that defines the playback sequence of a set of Track Files. Track Files carry the essence (audio, image, subtitles), which is wrapped using Material eXchange Format (MXF). Track Files must contain only one essence type. Two track files at a minimum must be present in every composition (see SMPTE ST429-2 D-Cinema Packaging – DCP Constraints, or Cinepedia): a track file carrying picture essence, and a track file carrying audio essence. The composition, consisting of a Composition Playlist (CPL) and associated track files, are distributed as a Digital Cinema Package (DCP). A composition is a complete representation of a title version, while the DCP need not carry a full composition. However, as already noted, it is commonplace in the industry to discuss the title in terms of a DCP, as that is the deliverable to the cinema. The Picture Track File essence is compressed using JPEG 2000 and the Audio Track File carries a 24-bit linear PCM uncompressed multichannel WAV file. Encryption may optionally be applied to the essence of a track file to protect it from unauthorized use. The encryption used is AES 128-bit in CBC mode. In practice, there are two versions of composition in use. The original version is called Interop DCP. In 2009, a specification was published by SMPTE (SMPTE ST 429-2 Digital Cinema Packaging – DCP Constraints) for what is commonly referred to as SMPTE DCP. SMPTE DCP is similar but not backwards compatible with Interop DCP, resulting in an uphill effort to transition the industry from Interop DCP to SMPTE DCP. SMPTE DCP requires significant constraints to ensure success in the field, as shown by ISDCF. While legacy support for Interop DCP is necessary for commercial products, new productions are encouraged to be distributed in SMPTE DCP. == Technical specifications == The DCP root folder (in the storage medium) contains a number of files, some used to store the image and audio contents, and some other used to organize and manage the whole playlist. === Picture MXF files === Picture contents may be stored in one or more reels corresponding to one or more MXF files. Each reel contains pictures as MPEG-2 or JPEG 2000 essence, depending on the adopted codec. MPEG-2 is no longer compliant with the DCI specification. JPEG 2000 is the only accepted compression format. Supported frame rates are: SMPTE (JPEG 2000) 24, 25, 30, 48, 50, and 60 fps @ 2K 24, 25, and 30 fps @ 4K 24 and 48 fps @ 2K stereoscopic MXF Interop (JPEG 2000) – Deprecated 24 and 48 fps @ 2K (MXF Interop can be encoded at 25 frame/s but support is not guaranteed) 24 fps @ 4K 24 fps @ 2K stereoscopic MXF Interop (MPEG-2) – Deprecated 23.976 and 24 fps @ 1920 × 1080 Maximum frame sizes are 2048 × 1080 for 2K DC, and 4096 × 2160 for 4K DC. Common formats are: SMPTE (JPEG 2000) Flat (1998 × 1080 or 3996 × 2160), = 1.85:1 aspect ratio Scope (2048 × 858 or 4096 × 1716), ~2.39:1 aspect ratio HDTV (1920 × 1080 or 3840 × 2160), 16:9 aspect ratio (~1.78:1) (although not specifically defined in the DCI specification, this resolution is DCI compliant per section 8.4.3.2). Full (2048 × 1080 or 4096 × 2160) (~1.9:1 aspect ratio, official name by DCI is Full Container. Not widely accepted in cinemas.) MXF Interop (MPEG-2) – Deprecated Full Frame (1920 × 1080) 12 bits per component precision (36 bits total per pixel) XYZ' colorspace; the prime mark indicates gamma encoding (gamma=2.6) Maximum bit rate is 250 Mbit/s (1.3 MBytes per frame at 24 frame per second) === Sound MXF files === Sound contents are also stored in reels corresponding to picture reels in number and duration. In case of multilingual features, separate reels are required to convey different languages. Each file contains linear PCM essence. Sampling rate is 48,000 or 96,000 samples per second Sample precision of 24 bits Linear mapping (no companding) Up to 16 independent channels === Asset map file === List of all files included in the DCP, in XML format. === Composition playlist file === Defines the playback order during presentation. The order is saved in XML format in this file; each picture and sound reel is identified by its UUID. In the following example, a reel is composed by picture and sound: === Packing list file or package key list (PKL) === All files in the composition are hashed and their hash is stored here, in XML format. This file is generally used during ingestion in a digital cinema server to verify if data have been corrupted or tampered with in some way. For example, an MXF picture reel is identified by the following element: The hash value is the Base64 encoding of the SHA-1 checksum. It can be calculated with the command: openssl sha1 -binary "FILE_NAME" | openssl base64 === Volume index file === A single DCP may be stored in more than one medium (e.g., multiple hard disks). The XML file VOLINDEX is used to identify the volume order in the series. == 3D DCP == The DCP format is also used to store stereoscopic (3D) contents for 3D films. In this case, 48 frames exist for every second – 24 frames for the left eye, 24 frames for the right. Depending on the projection system used, the left eye and right eye pictures are either shown alternately (double or triple flash systems) at 48 fps or, on 4k systems, both left and right eye pictures are shown simultaneously, one above the other, at 24 fps. In triple flash systems, active shutter glasses are required whereas optical filtering such as circular polarisation is used in conjunction with passive glasses on polarized systems. Since the maximum bit rate is always 250 Mbit/s, this results in a net 125 Mbit/s for single frame, but the visual quality decrease is generally unnoticeable. == D-Box == D-Box codes for motion controlled seating (labelled as "Motion Data" in the DCP specification), if present, are stored as a monoaural WAV file on Sound Track channel 13. Motion Data tracks are unencrypted and not watermarked. == Creation == Most film producers and distributors rely on digital cinema encoding facilities to produce and quality control check a digital cinema package before release. Facilities follow strict guidelines set out in the DCI recommendations to ensure compatibility with all digital cinema equipment. For bigger studio release films, the facility will usually create a Digital Cinema Distribution Master (DCDM). A DCDM is the post-production step prior to a DCP. The frames are in XYZ TIFF format and both sound and picture are not yet wrapped into MXF files. A DCP can be encoded directly from a DCDM. A DCDM is useful for archiving purposes and also facilities can share them for international re-versioning purposes. They can easily be turned into alternative version DCPs for foreign territories. For smaller release films, the facility will usually skip the creation of a DCDM and instead encode directly from the Digital Source Master (DSM) the original film supplied to the encoding facility. A DSM can be supplied in a multitude of formats and color spaces. For this reason, the encoding facility needs to have extensive knowledge in color space handling including, on occasion, the use of 3D LUTs to carefully match the look of the finished DCP to a celluloid film print. This can be a highly involved process in which the DCP and the film print are "butterflied" (shown side by side) in a highly calibrated cinema. Less demanding DCPs are encoded from tape formats such as HDCAM SR. Quality control checks are always performed in calibrated cinemas and carefully checked for errors. QC checks are often attended by colorists, directors, sound mixers and other personnel to check for correct picture and sound reproduction in the finished DCP. == Accessibility == === Hearing impaired audio === A Hearing Impaired (HI) audio track is designed for people who are hearing-impaired to better hear dialog. Moviegoers can wear headphones which play this audio track synchronized with the film. Hearing Impaired audio is stored in the DCP on Sound Track channel 7. === Audio description === Audio description is narration for people who are blind or visually impaired. Audio description is stored in the DCP as "Visually Impaired-Native" (VI-N) audio on Sound Track channel 8. === Sign Language Video === A Sign Language Video track can be included in a DCP to allow for display of sign la