Out-of-bag error

Out-of-bag (OOB) error, also called out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models utilizing bootstrap aggregating (bagging). Bagging uses subsampling with replacement to create training samples for the model to learn from. OOB error is the mean prediction error on each training sample xi, using only the trees that did not have xi in their bootstrap sample. Bootstrap aggregating allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations that were not used in the building of the next base learner. == Out-of-bag dataset == When bootstrap aggregating is performed, two independent sets are created. One set, the bootstrap sample, is the data chosen to be "in-the-bag" by sampling with replacement. The out-of-bag set is all data not chosen in the sampling process. When this process is repeated, such as when building a random forest, many bootstrap samples and OOB sets are created. The OOB sets can be aggregated into one dataset, but each sample is only considered out-of-bag for the trees that do not include it in their bootstrap sample. The picture below shows that for each bag sampled, the data is separated into two groups. This example shows how bagging could be used in the context of diagnosing disease. A set of patients are the original dataset, but each model is trained only by the patients in its bag. The patients in each out-of-bag set can be used to test their respective models. The test would consider whether the model can accurately determine if the patient has the disease. == Calculating out-of-bag error == Since each out-of-bag set is not used to train the model, it is a good test for the performance of the model. The specific calculation of OOB error depends on the implementation of the model, but a general calculation is as follows. Find all models (or trees, in the case of a random forest) that are not trained by the OOB instance. Take the majority vote of these models' result for the OOB instance, compared to the true value of the OOB instance. Compile the OOB error for all instances in the OOB dataset. The bagging process can be customized to fit the needs of a model. To ensure an accurate model, the bootstrap training sample size should be close to that of the original set. Also, the number of iterations (trees) of the model (forest) should be considered to find the true OOB error. The OOB error will stabilize over many iterations so starting with a high number of iterations is a good idea. Shown in the example to the right, the OOB error can be found using the method above once the forest is set up. == Comparison to cross-validation == Out-of-bag error and cross-validation (CV) are different methods of measuring the error estimate of a machine learning model. Over many iterations, the two methods should produce a very similar error estimate. That is, once the OOB error stabilizes, it will converge to the cross-validation (specifically leave-one-out cross-validation) error. The advantage of the OOB method is that it requires less computation and allows one to test the model as it is being trained. == Accuracy and Consistency == Out-of-bag error is used frequently for error estimation within random forests but with the conclusion of a study done by Silke Janitza and Roman Hornung, out-of-bag error has shown to overestimate in settings that include an equal number of observations from all response classes (balanced samples), small sample sizes, a large number of predictor variables, small correlation between predictors, and weak effects.

Example-based machine translation

Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning. == Translation by analogy == At the foundation of example-based machine translation is the idea of translation by analogy. When applied to the process of human translation, the idea that translation takes place by analogy is a rejection of the idea that people translate sentences by doing deep linguistic analysis. Instead, it is founded on the belief that people translate by first decomposing a sentence into certain phrases, then by translating these phrases, and finally by properly composing these fragments into one long sentence. Phrasal translations are translated by analogy to previous translations. The principle of translation by analogy is encoded to example-based machine translation through the example translations that are used to train such a system. Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation. == History == Example-based machine translation was first suggested by Makoto Nagao in 1984. He pointed out that it is especially adapted to translation between two totally different languages, such as English and Japanese. In this case, one sentence can be translated into several well-structured sentences in another language, therefore, it is no use to do the deep linguistic analysis characteristic of rule-based machine translation. == Example == Example-based machine translation systems are trained from bilingual parallel corpora containing sentence pairs like the example shown in the table above. Sentence pairs contain sentences in one language with their translations into another. The particular example shows an example of a minimal pair, meaning that the sentences vary by just one element. These sentences make it simple to learn translations of portions of a sentence. For example, an example-based machine translation system would learn three units of translation from the above example: How much is that X ? corresponds to Ano X wa ikura desu ka. red umbrella corresponds to akai kasa small camera corresponds to chiisai kamera Composing these units can be used to produce novel translations in the future. For example, if we have been trained using some text containing the sentences: President Kennedy was shot dead during the parade. and The convict escaped on July 15th., then we could translate the sentence The convict was shot dead during the parade. by substituting the appropriate parts of the sentences. == Phrasal verbs == Example-based machine translation is best suited for sub-language phenomena like phrasal verbs. Phrasal verbs have highly context-dependent meanings. They are common in English, where they comprise a verb followed by an adverb and/or a preposition, which are called the particle to the verb. Phrasal verbs produce specialized context-specific meanings that may not be derived from the meaning of the constituents. There is almost always an ambiguity during word-to-word translation from source to the target language. As an example, consider the phrasal verb "put on" and its Hindustani translation. It may be used in any of the following ways: Ram put on the lights. (Switched on) (Hindustani translation: Jalana) Ram put on a cap. (Wear) (Hindustani translation: Pahenna)

Digital backlot

A digital backlot or virtual backlot is a motion-picture set that is neither a genuine location nor a constructed studio; the shooting takes place entirely on a stage with a blank background (often a greenscreen) that will later on project an artificial environment put in during post-production. Digital backlots are mainly used for genres such as science fiction, where building a real set would be too expensive or outright impossible. == Notable films == Among the first films to introduce the technique was Mini Moni the Movie by Shinji Higuchi in 2002, predated by Rest In Peace by Stolpskott Film (2000). Others include: === Released === Rest in Peace (Sweden, 2000) – Shot entirely with green-screen. Some sections fully CGI. Casshern (Japan, 2004) – Shot on celluloid. A few practical set pieces used. Able Edwards (United States, 2004) – Shot digitally on Canon XL1 cameras. Immortal (France, 2004) – Shot on celluloid. Also showed CGI characters interacting with live actors. Sky Captain and the World of Tomorrow (United States, 2004) – Shot digitally on Sony CineAlta cameras. Sin City (United States, 2005) – Shot digitally on CineAlta cameras. Three practical sets used. MirrorMask (United States/United Kingdom, 2005) – Shot on celluloid. 80% of film uses digital backlot. Some practical set pieces used. The Cabinet of Dr. Caligari (United States, 2005) – Shot digitally. 300 (United States, 2007) – Shot on celluloid. Two practical sets used. Speed Racer (United States, 2008) – Directed by the Wachowskis. Three practical sets used. The Spirit (United States, 2008) – Director Frank Miller shot the film with the same techniques he and Robert Rodriguez used on Sin City. Avatar (United States, 2009) – Directed by James Cameron. Two practical sets used. Goemon (Japan, 2009) – The second film from Casshern helmer Kazuaki Kiriya. Alice in Wonderland (United States, 2010) – Directed by Tim Burton. Practical sets used. Sin City: A Dame to Kill For (United States 2014) – Co-directed by Robert Rodriguez and Frank Miller. Sequel to Sin City. === Upcoming === Tribes of October

MX1 Ltd

MX1 was a global media services provider founded in July 2016 from a merger between digital media services companies, RR Media and SES Platform Services, and a wholly owned subsidiary of global satellite owner and operator, SES. In September 2019, MX1 was merged into the SES Video division and the MX1 brand dropped. Broadcast and streamed content management, playout, distribution, and monetisation services from both MX1 and SES Video are now provided under the SES name. Before merger with SES, MX1 claimed to manage more than 5 million media assets and every day to distribute more than 3,600 TV channels, manage the playout of over 525 channels, distribute content to more than 120 subscription VOD platforms, and deliver over 8,400 hours of online video streaming and more than 620 hours of premium sports and live events. == Services == MX1 video and media services are provided through a single hybrid, cloud and on-premises solution, called MX1 360, which enables video and media solutions including content and metadata management, archiving, localisation solutions, channel playout, VOD, online video (OTT) and content distribution. Services provided by MX1 include: === Content aggregation === Acquisition of content via satellite, fibre or IP with satellite downlinking services (for encryption, re-encryption and re-muxing into different platforms), fibre reception from any location, and IP reception via the public Internet. Live sports, news and entertainment production (including in-studio, outside broadcasting, and SNG) with mobile live streaming and video contribution. === Content management === Digital mastering including scanning, conversion, restoration, quality control and localisation/versioning. Content archiving including secure, cloud and on-premises digital storage, and disaster recovery services. Metadata packaging and platform validation to enhance content discovery, searchability and cataloguing. Playout preparation and delivery to any format. === Channel origination and playout === Managed TV channel origination in SD, HD and UHD including 3D graphics, and video and audio effects, using cloud-based solution accessible from any location, with live content insertion and operation, and 24/7 monitoring. === Online video/VOD services === Content preparation and management for online video, VOD, live streaming services and Online video platforms using an ultra-high capacity content delivery network, including subscriber management, apps, DRM, social media, advertising tools, monetisation tools, metadata management, and analytics. === Content delivery === Delivery in all video formats over hybrid distribution network of satellite (using over 150 platforms), fibre (60 digital media hubs worldwide) and the Internet with complete downlink/uplink turnaround services and OTT content delivery. == Locations == MX1 has 16 offices worldwide, the most recent opened in March 2017 in Seoul, South Korea, as well as media centres in UK (London), US (Hawley, PA), Israel (Emeq Ha'Ela), Romania (Bucharest) and at the headquarters in Unterföhring near Munich, Germany. In the early part of 2017, significant upgrades were made to MX1's US media centre in Hawley, Pennsylvania, including expanding its capabilities for US based and global content aggregation, management and delivery to support US broadcasters and content providers. == History == RRsat was founded in Israel by David Rivel, an electronics, computers and communications engineer in 1981 as a communications provider, and in 2014 changed its name to RR Media to reflect its expanding global service offering. In 2015, RR Media acquired Eastern Space Systems (ESS), a Romanian provider of content management and content distribution services and satellite transmission services provider, SatLink Communications. Digital Playout Centre GmbH (DPC) was founded in 1996 by German media company, Kirch to provide playout, multiplexing, satellite uplinks and other broadcast services to Kirch's Premiere pay-TV platform (now Sky Deutschland) and other private and public German broadcasters. In 2005, SES Astra (a subsidiary of SES Global, now SES) bought 100% of DPC from Premiere and the company renamed ASTRA Platform Services GmbH (APS). In 2012, to reflect the company's expanding worldwide reach, the name was changed to SES Platform Services. In February 2016, it was announced that SES Platform Services had agreed, subject to regulatory approvals, to purchase RR Media. The acquisition was completed in July 2016, with the merged company renamed MX1 and headed by Avi Cohen, the former CEO of RR Media. In October 2017, Cohen was replaced as CEO by Wilfred Urner, the former CEO of SES Platform Services, CEO of SES subsidiary, HD+ and Head of Media Platforms and Product Development, SES Video.

Digital heritage

The Charter on the Preservation of Digital Heritage of UNESCO defines digital heritage as embracing "cultural, educational, scientific and administrative resources, as well as technical, legal, medical and other kinds of information created digitally, or converted into digital form from existing analogue resources". Digital heritage also includes the use of digital media in the service of understanding and preserving cultural or natural heritage. The digitization of both cultural heritage and Natural heritage serves to enable the permanent access of current and future generations to culturally important objects ranging from literature and paintings to flora, fauna, or habitats. It is also used in the preservation and access of objects with enduring or significant historical, scientific, or cultural value including buildings, archeological sites, and natural phenomena. The main idea is the transformation of a material object into a virtual copy. It should not be confused with digital humanities, which uses digitizing technology to specifically help with research. There have been several debates concerning the efficiency of the process of digitizing heritage. Some of the drawbacks refer to the deterioration and technological obsolescence due to the lack of funding for archival materials and underdeveloped policies that would regulate such a process. Another main social debate has taken place around the restricted accessibility due to the digital divide that exists around the world. Nevertheless, new technologies enable easy, instant and cross boarder access to the digitized work. Many of these technologies include spatial and surveying technology to gain aerial or 3D images. Digital heritage is also used to monitor cultural heritage sites over years to help with preservation, maintenance, and sustainable tourism. It aims to observe any changes, diseases, or deterioration that may occur on objects. == Cultural and natural heritage == Digital Heritage that is not born-digital can be divided into two separate groups—digital cultural heritage and digital natural heritage. Digital cultural heritage is the maintenance or preservation of cultural objects through digitization. These are objects, in some cases entire cities, that are considered of cultural importance. These objects are sometimes able to be digitized or physically represented in minute detail. Digital cultural heritage also includes intangible heritage. These are things such as "oral traditions, customs, value systems, skills, traditional dances, diets, performances" and other unique features of a culture. Intangible heritage is particularly vulnerable to destruction due to urbanization. There are several projects and programs which concentrate on digital cultural heritage. One such project is Mapping Gothic France, which aims to document and preserve cathedrals across France using images, VR tours, laser scans, and panoramas. This allows for scientific and historical study and preservation of the cathedrals and also provides detailed access to the sites for anyone in the world. The aim of projects like these is to help with the preservation and restoration of cultural objects. After the fire at Notre-Dame de Paris in 2019, digital scans are a major component in the ongoing restoration. Digital natural heritage pertains to objects of natural heritage that are considered of cultural, scientific, or aesthetic importance. Digital heritage in this instance is used not only to grant access to these objects, but to monitor any changes over time, such as with plant or animal habitats. Geographic information systems are a form of technology that is used primarily in the study of natural heritage. Western Australia has one such digital heritage project where they have created a digital repository of native plants important to both the region and the Aboriginal people. This is in order to protect and preserve the important biological heritage of Western Australia. == Educational impact == The digitization of these heritage objects has impacts around the world and across many disciplines. The increase of digital items means that people, especially the youth, are able to learn about new objects and cultures online through various media. They provide viewers with a more in-depth experience with an item or place, instead of just an image. The media is also able to be curated to age- or educational-level appropriateness, making learning easier. Some of the technology used in education, especially in museums, includes mobile apps, virtual reality, social media, and video games. Cultural heritage institutions are using this technology to try to expand access, increase appreciation for these items, and to gain new viewpoints on their collections. Digital heritage also helps scientists, archeologists, or other historians and specialists collect data on these objects, providing more information on the objects and the past. Digital Heritage is still currently being studied and improved by several sectors invested in cultural and intellectual preservation. It is particularly of interest to museums, governments, and academic institutions. Research by these groups are creating new concepts, methodologies, and techniques for the implementation of digital heritage to protect this type of cultural and natural heritage. As new technologies are created, museums and other heritage institutions are provided with more ways of disseminating their information and engaging with the public. A lack of resources within certain groups may still hinder everyone from accessing digital heritage. == Technologies used == The digitization of cultural heritage is attained through several means. Some of the main technology used is spatial and surveying technology. Space archaeological technology - Observations from space satellites are non-intrusive and can be integrated with other technologies on the ground. It is used to photograph vast areas of earth and help with research. Remnants of ancient civilizations or other human objects are also able to be spotted via satellite imaging. Unmanned aerial vehicles - UAV, such as drones, are commonly used in digitization of cultural heritage objects. The Great Wall of China is one such site that has been digitized and analyzed through unmanned aerial vehicle investigation. The resulting images, 3-D scans, maps, and other data are used to evaluate and maintain the Great Wall. Laser Scanning - Laser scanning is used to scan an area and recreate spatially accurate depictions, such as a 3D model. Virtual and Augmented Reality - VR is used primarily for education but does have uses for reconstruction and research. It is used to provide users with an immersive experience, as though they are actually at the site. Geographic Information systems - GIS are used primarily to study objects and sites over time. It is also important in studying the socioeconomic status of the past. 3D Modeling - 3D modeling has become more widely used due to an increase in technology that works specifically with heritage sites. It is often used in tandem with GIS to reconstruct objects for restoration, documentation, preservation, and educational purposes. Data is collected using satellite or other aerial imaging and ground-based imaging. There is some concern about the accuracy and authenticity of these types of digital reconstructions and their effects on the sites themselves. A major barrier to digital heritage is the amount of resources it takes to undertake such projects, such as money, time, and technology. Money and the lack of qualified personnel are two that are considered the most obstructive. This is especially an issue in less developed areas or within underfunded groups such as minorities. == Virtual heritage == A particular branch of digital heritage, known as "virtual heritage", is formed by the use of information technology with the aim of recreating the experience of existing cultural heritage, as in (approximations of) virtual reality. It is hard to differentiate this branch from the core contribution of digital heritage which is storing the heritage data digitally. Parsinejad et al. developed two techniques for Digital Twinning of the architectural assets and representation of the physical assets virtually in the museum context. Two techniques are hand recording and digital recording and both have challenges in adoption and implementation of Digital Twin as a revolutionary concept. == Digital heritage stewardship == Digital heritage stewardship is a form of digital curation which is modeled after collaborative curation. Digital heritage stewardship means stepping away from typical curatorial practices (e.g. discovering, arranging, and sharing information, material, and/or content) in favor of practices which allow its stakeholders the opportunity to contribute historical, political, and social context and culture. The collaborative practice encourages the creation, engagement, and maintena

Multimedia database

A Multimedia database (MMDB) is a collection of related for multimedia data. The multimedia data include one or more primary media data types such as text, images, graphic objects (including drawings, sketches and illustrations) animation sequences, audio and video. A Multimedia Database Management System (MMDBMS) is a framework that manages different types of data potentially represented in a wide diversity of formats on a wide array of media sources. It provides support for multimedia data types, and facilitate for creation, storage, access, query and control of a multimedia database. == Contents of MMDB == A Multimedia Database (MMDB) hosts one or more multimedia data types (i.e. text, images, graphic objects, audio, video, animation sequences). These data types are broadly categorized into three classes: Static media (time-independent: image and graphic object). Dynamic media (time-dependent: audio, video and animation). Dimensional media(3D game and computer aided drafting programs). === Comparison of multimedia data types === Additionally, a Multimedia Database (MMDB) needs to manage additional information pertaining to the actual multimedia data. The information is about the following: Media data: the actual data representing an object. Media format data: information about the format of the media data after it goes through the acquisition, processing, and encoding phases. Media keyword data: the keyword descriptions, usually relating to the generation of the media data. Media feature data: content dependent data such as contain information about the distribution of colours, the kinds of textures and the different shapes present in an image. The last three types are called metadata as they describe several different aspects of the media data. The media keyword data and media feature data are used as indices for searching purpose. The media format data is used to present the retrieved information. == Requirements of Multimedia databases == Like the traditional databases, Multimedia databases should address the following requirements: Integration Data items do not need to be duplicated for different programs invocations Data independence Separate the database and the management from the application programs Concurrency control Allows concurrent transactions Persistence Data objects can be saved and re-used by different transactions and program invocations Privacy Access and authorization control Integrity control Ensures database consistency between transactions Recovery Failures of transactions should not affect the persistent data storage Query support Allows easy querying of multimedia data Multimedia databases should have the ability to uniformly query data (media data, textual data) represented in different formats and have the ability to simultaneously query different media sources and conduct classical database operations across them. (Query support) They should have the ability to retrieve media objects from a local storage device in a good manner. (Storage support) They should have the ability to take the response generated by a query and develop a presentation of that response in terms of audio-visual media and have the ability to deliver this presentation. (Presentation and delivery support) == Issues and challenges == Multimedia data consists of a variety of media formats or file representations including TIFF, BMP, PPT, IVUE, FPX, JPEG, MPEG, AVI, MID, WAV, DOC, GIF, EPS, PNG, etc. Because of restrictions on the conversion from one format to the other, the use of the data in a specific format has been limited as well. Usually, the data size of multimedia is large such as video; therefore, multimedia data often require a large storage. Multimedia database consume a lot of processing time, as well as bandwidth. Some multimedia data types such as video, audio, and animation sequences have temporal requirements that have implications on their storage, manipulation and presentation, but images, video and graphics data have special constraints in terms of their content. == Application areas == Examples of multimedia database application areas: Digital Libraries News-on-Demand Video-on-Demand Music database Geographic Information Systems (GIS) Telemedicine

Enterprise bookmarking

Enterprise bookmarking is a method for Web 2.0 users to tag, organize, store, and search bookmarks of both web pages on the Internet and data resources stored in a distributed database or fileserver. This is done collectively and collaboratively in a process by which users add tag (metadata) and knowledge tags. In early versions of the software, these tags are applied as non-hierarchical keywords, or terms assigned by a user to a web page, and are collected in tag clouds. Examples of this software are Connectbeam and Dogear. New versions of the software such as Jumper 2.0 and Knowledge Plaza expand tag metadata in the form of knowledge tags that provide additional information about the data and are applied to structured and semi-structured data and are collected in tag profiles. == History == Enterprise bookmarking is derived from Social bookmarking that got its modern start with the launch of the website del.icio.us in 2003. The first major announcement of an enterprise bookmarking platform was the IBM Dogear project, developed in Summer 2006. Version 1.0 of the Dogear software was announced at Lotusphere 2007, and shipped later that year on June 27 as part of IBM Lotus Connections. The second significant commercial release was Cogenz in September 2007. Since these early releases, Enterprise bookmarking platforms have diverged considerably. The most significant new release was the Jumper 2.0 platform, with expanded and customizable knowledge tagging fields. == Differences == === Versus social bookmarking === In a social bookmarking system, individuals create personal collections of bookmarks and share their bookmarks with others. These centrally stored collections of Internet resources can be accessed by other users to find useful resources. Often these lists are publicly accessible, so that other people with similar interests can view the links by category or by the tags themselves. Most social bookmarking sites allow users to search for bookmarks which are associated with given "tags", and rank the resources by the number of users which have bookmarked them. Enterprise bookmarking is a method of tagging and linking any information using an expanded set of tags to capture knowledge about data. It collects and indexes these tags in a web-infrastructure knowledge base server residing behind the firewall. Users can share knowledge tags with specified people or groups, shared only inside specific networks, typically within an organization. Enterprise bookmarking is a knowledge management discipline that embraces Enterprise 2.0 methodologies to capture specific knowledge and information that organizations consider proprietary and are not shared on the public Internet. === Tag management === Enterprise bookmarking tools also differ from social bookmarking tools in the way that they often face an existing taxonomy. Some of these tools have evolved to provide Tag management which is the combination of uphill abilities (e.g. faceted classification, predefined tags, etc.) and downhill gardening abilities (e.g. tag renaming, moving, merging) to better manage the bottom-up folksonomy generated from user tagging.