AI Data Warehouse

AI Data Warehouse — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cloud-based integration

    Cloud-based integration is a form of systems integration delivered as a cloud computing service that addresses data, process, service-oriented architecture (SOA) and application integration. == Description == Integration platform as a service (iPaaS) is a suite of cloud services enabling customers to develop, execute and govern integration flows between disparate applications. Under the cloud-based iPaaS integration model, customers drive the development and deployment of integrations without installing or managing any hardware or middleware. The iPaaS model allows businesses to achieve integration without major investment in skills or licensed middleware software. iPaaS used to be regarded primarily as an integration tool for cloud-based software applications, used mainly by small to mid-sized businesses. Over time, a hybrid type of iPaaS, hybrid-IT iPaaS, which connects cloud to on-premises systems, has become increasingly popular. Additionally, large enterprises are exploring new ways of integrating iPaaS into their existing IT infrastructures. Cloud integration was created to break down data silos, improve connectivity and optimize business processes. Cloud integration has increased in popularity as the usage of Software as a Service (SaaS) solutions has grown. Prior to the emergence of cloud computing in the early 2000s, integration could be categorized as either internal or business to business (B2B). Internal integration requirements were serviced through an on-premises middleware platform and typically utilized a service bus to manage the exchange of data between systems. B2B integration was serviced through EDI gateways or a value-added network (VAN). The advent of SaaS applications created a new kind of demand, which was met through cloud-based integration. Since their emergence, many such services have also developed the capability to integrate legacy or on-premises applications, as well as to function as EDI gateways.
The following essential features were proposed by one marketing company: deployment on a multi-tenant, elastic cloud infrastructure; subscription-model pricing (operating expense, not capital expenditure); no software development (required connectors should already be available); users do not perform deployment or manage the platform itself; and the presence of integration management and monitoring features. The emergence of this sector led to new cloud-based business process management tools that do not need to build integration layers, since those are now a separate service. Drivers of growth include the need to integrate mobile app capabilities with proliferating API publishing resources and the growth in demand for Internet of things functionality as more 'things' connect to the Internet.

  • Semantic query

    Semantic queries allow for queries and analytics of associative and contextual nature. Semantic queries enable the retrieval of both explicitly and implicitly derived information based on syntactic, semantic and structural information contained in data. They are designed to deliver precise results (possibly the distinctive selection of one single piece of information) or to answer more fuzzy and wide open questions through pattern matching and digital reasoning. Semantic queries work on named graphs, linked data or triples. This enables the query to process the actual relationships between information and infer the answers from the network of data. This is in contrast to semantic search, which uses semantics (meaning of language constructs) in unstructured text to produce a better search result. (See natural language processing.) From a technical point of view, semantic queries are precise relational-type operations much like a database query. They work on structured data and therefore have the possibility to utilize comprehensive features like operators (e.g. >, < and =), namespaces, pattern matching, subclassing, transitive relations, semantic rules and contextual full text search. The semantic web technology stack of the W3C is offering SPARQL to formulate semantic queries in a syntax similar to SQL. Semantic queries are used in triplestores, graph databases, semantic wikis, natural language and artificial intelligence systems. == Background == Relational databases represent all relationships between data in an implicit manner only. For example, the relationships between customers and products (stored in two content-tables and connected with an additional link-table) only come into existence in a query statement (SQL in the case of relational databases) written by a developer. Writing the query demands exact knowledge of the database schema. Linked-Data represent all relationships between data in an explicit manner. 
In the above example, no query code needs to be written. The correct product for each customer can be fetched automatically. While this simple example is trivial, the real power of linked data comes into play when a network of information is created (customers with their geo-spatial information like city, state and country; products with their categories within sub- and super-categories). Now the system can automatically answer more complex queries and analytics that look for the connection of a particular location with a product category. No development effort is needed for this query. A semantic query is executed by walking the network of information and finding matches (also called data graph traversal). Another important aspect of semantic queries is that the type of the relationship can be used to incorporate intelligence into the system. The relationship between a customer and a product has a fundamentally different nature than the relationship between a neighbourhood and its city. The latter enables the semantic query engine to infer that a customer living in Manhattan is also living in New York City, whereas other relationships might have more complicated patterns and "contextual analytics". This process is called inference or reasoning and is the ability of the software to derive new information based on given facts.

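The inference step described in the semantic query excerpt (a customer living in Manhattan is also living in New York City) can be illustrated with a small sketch. This is a toy model only, not how any particular triplestore or SPARQL engine works: the triples, the predicate names `livesIn`, `locatedIn` and `bought`, and the helper functions are all hypothetical and chosen for illustration.

```python
# Hypothetical toy data: every name and predicate here is illustrative only.
TRIPLES = {
    ("alice", "livesIn", "Manhattan"),
    ("bob", "livesIn", "Boston"),
    ("Manhattan", "locatedIn", "New York City"),
    ("New York City", "locatedIn", "New York State"),
    ("alice", "bought", "espresso machine"),
}


def match(triples, pattern):
    """Yield variable bindings for one (s, p, o) pattern; '?x' marks a variable."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must bind to the same value everywhere in the pattern.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


def infer_lives_in(triples):
    """Add livesIn facts implied by following locatedIn edges transitively."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != "livesIn":
                continue
            for (s2, p2, o2) in list(inferred):
                if p2 == "locatedIn" and s2 == o:
                    new_fact = (s, "livesIn", o2)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred


def residents(triples, place):
    """Return everyone who lives in `place`, explicitly or by inference."""
    facts = infer_lives_in(triples)
    return sorted(b["?who"] for b in match(facts, ("?who", "livesIn", place)))
```

Without the inference pass, matching `("?who", "livesIn", "New York City")` against the raw triples finds nothing, because only the Manhattan fact is stored explicitly; with it, `residents(TRIPLES, "New York City")` includes the Manhattan resident.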
  • Conceptions of Library and Information Science

    Conceptions of Library and Information Science (CoLIS) is a series of conferences about historical, empirical and theoretical perspectives in Library and Information Science. == CoLIS conferences == CoLIS 1: 1991 in Tampere, Finland; CoLIS 2: 1996 in Copenhagen, Denmark; CoLIS 3: 1999 in Dubrovnik, Croatia; CoLIS 4: 2002 in Seattle, US; CoLIS 5: 2005 in Glasgow, Scotland; CoLIS 6: 2007 in Borås, Sweden; CoLIS 7: June 2010 in London, at City University London; CoLIS 8: August 19–22, 2013, in Copenhagen, Denmark, at The Royal School of Library and Information Science; CoLIS 9: June 27–29, 2016, in Uppsala, Sweden, at Uppsala University; CoLIS 10: June 16–19, 2019, in Ljubljana, Slovenia, at the Faculty of Arts; CoLIS 11: May 29–June 1, 2022, in Oslo, Norway, at Oslo Metropolitan University.

  • DONE

    The Data-based Online Nonlinear Extremumseeker (DONE) algorithm is a black-box optimization algorithm. DONE models the unknown cost function and attempts to find an optimum of the underlying function. The DONE algorithm is suitable for optimizing costly and noisy functions and does not require derivatives. An advantage of DONE over similar algorithms, such as Bayesian optimization, is that the computational cost per iteration is independent of the number of function evaluations. == Methods == The DONE algorithm was first proposed by Hans Verstraete and Sander Wahls in 2015. The algorithm fits a surrogate model based on random Fourier features and then uses the well-known L-BFGS algorithm to find an optimum of the surrogate model. == Applications == DONE was first demonstrated for maximizing the signal in optical coherence tomography measurements, but has since been applied in various other settings. For example, it was used to help extend the field of view in light sheet fluorescence microscopy.
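The surrogate idea behind DONE can be sketched in a few lines. The following is not the published algorithm: it is a simplified illustration for a one-dimensional input that fits a random-Fourier-feature surrogate with batch ridge regression (DONE itself updates the fit online, which is what keeps the per-iteration cost constant) and substitutes plain backtracking gradient descent for L-BFGS. The cost function, feature count, bounds and all function names are assumptions made for the example.

```python
import math
import random


def make_features(num_features, scale, rng):
    """Draw random frequencies and phases for cosine features (1-D input)."""
    return [(rng.gauss(0.0, scale), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(num_features)]


def features(x, params):
    return [math.cos(w * x + b) for w, b in params]


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def fit_surrogate(xs, ys, params, ridge=1e-3):
    """Regularized least-squares fit of the feature weights to the measurements."""
    d = len(params)
    rows = [features(x, params) for x in xs]
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    target = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(gram, target)


def surrogate(x, weights, params):
    return sum(c * f for c, f in zip(weights, features(x, params)))


def surrogate_grad(x, weights, params):
    return sum(-c * w * math.sin(w * x + b) for c, (w, b) in zip(weights, params))


def minimize_surrogate(x0, weights, params, lo, hi, steps=200):
    """Gradient descent stand-in for L-BFGS; a step is kept only if it improves."""
    x, step = x0, 0.5
    for _ in range(steps):
        g = surrogate_grad(x, weights, params)
        cand = min(hi, max(lo, x - step * g))
        if surrogate(cand, weights, params) < surrogate(x, weights, params):
            x = cand
        else:
            step *= 0.5
    return x


def noisy_cost(x, rng):
    """Hypothetical noisy black-box cost with its true minimum at x = 1."""
    return (x - 1.0) ** 2 + rng.gauss(0.0, 0.05)


rng = random.Random(0)
params = make_features(20, 1.0, rng)
xs = [-3.0 + 6.0 * i / 29 for i in range(30)]
ys = [noisy_cost(x, rng) for x in xs]
weights = fit_surrogate(xs, ys, params)
x_start = xs[ys.index(min(ys))]          # begin at the best measured point
x_best = minimize_surrogate(x_start, weights, params, -3.0, 3.0)
```

The optimizer only ever evaluates the cheap surrogate, never the costly function, which is the point of the approach. How close `x_best` lands to the true minimizer depends on the number of features, the noise level and the regularization, so no particular accuracy is claimed here.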

  • Mistral Vibe

    Mistral Vibe, or Vibe (Le Chat until May 2026), is a chatbot that uses generative artificial intelligence, developed in France by Mistral AI. Mistral Vibe is available on iOS and Android. Its services are operated on a freemium model. == History == In February 2024, Mistral AI released Le Chat. In January 2025, Mistral AI made a content deal with Agence France-Presse (AFP) that lets Le Chat query AFP's entire archive dating back to 1983. On 6 February 2025, a mobile app for Le Chat was released for iOS and Android, and a subscription tier, Pro, was introduced at a cost of $14.99 per month. In July 2025, Mistral AI released Voxtral, an open-source language model that understands and generates audio. Mistral introduced a voice mode for chatting that uses Voxtral, and projects, which allow grouping chats and files. In September 2025, Le Chat introduced the capability to remember previous conversations. In May 2026, Mistral AI announced the rebrand from Le Chat to Mistral Vibe, and new features were introduced at the same time.

  • Penril

    Penril DataComm Networks, Inc. was a computer telecommunications hardware company that made some acquisitions and was eventually split into two parts: one was acquired by Bay Networks and the other was a newly formed company named Access Beyond. The focus of both companies' products was end-to-end data transfer. By the mid-1990s, with the popularization of the internet, this was no longer of wide interest. == History == Penril, whose earnings reports and other financials were followed by The New York Times in the 1990s, made several acquisitions but also grew internally. Following its Datability acquisition it renamed itself Penril Datability Networks. By the time Penril, founded in 1968, was acquired by Bay, its name was Penril DataComm Networks. The company, which as of 1985 "had made 14 acquisitions in 12 years," had also done extensive work on quality control, and leveraged its product line through what The Washington Post called clever packaging: "software, cables, instructions and telephone support" sold to those less technically skilled as "Network in a Box." == Datability == Datability Software Systems Inc. was the initial name of what by 1991 became 'Datability, Inc.', "a manufacturer of hardware that links computer networks." The firm, founded in 1977, began as a software consulting company, especially in the area of databases. To speed up project development they built a program generator, which they marketed as Control 10/20 (targeted at users of Digital Equipment Corporation's DECsystem-10 and DECSYSTEM-20). After trying their hand at time-sharing they built hardware to enhance bridging these computers to DEC's VAX product line. In particular they focused on Digital's LAT protocol, selling "boxes" that reimplemented the protocol at a lower price than DEC's. They later expanded into other areas of telecommunications hardware. The firm relocated to a larger manufacturing plant in 1991 and was acquired by Penril in 1993.
== Access Beyond == Access Beyond was initially housed by Penril, from which it was spun off. A securities analyst noted that Access began operations with no debt. It subsequently merged with Hayes Corporation. Some of the funds brought to the merger came from a sale by Penril of two of its divisions, each bringing about $4 million. == Ron Howard == Ron Howard, founder of Datability, became part of Penril when the latter acquired the former, and was CEO of Access Beyond when it was spun off by Penril. Access merged with Hayes Microcomputer Products and was renamed Hayes Corp, at which time Howard became executive VP of business development and corporate vice chairman of Hayes. == People == In an industry where subcontracting was then common and recent arrivals often came from a culture of six-day work weeks, immigrants made up about 25% of the assembly line workers at Penril, compared with about double that share at other firms. Placement was overseen by government agencies. == Controversy == Penril had a joint development agreement, beginning in 1990, with a Standard Microsystems Corporation (SMSC) subsidiary. A dispute arose, and the matter was brought to court. Penril was awarded $3.5 million in 1996.

  • Subject (documents)

    In library and information science documents (such as books, articles and pictures) are classified and searched by subject – as well as by other attributes such as author, genre and document type. This makes "subject" a fundamental term in this field. Library and information specialists assign subject labels to documents to make them findable. There are many ways to do this and in general there is not always consensus about which subject should be assigned to a given document. To optimize subject indexing and searching, we need to have a deeper understanding of what a subject is. The question: "what is to be understood by the statement 'document A belongs to subject category X'?" has been debated in the field for more than 100 years (see below) == Theoretical view == === Charles Ammi Cutter (1837–1903) === For Cutter the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred [...] to those intellections [...] that had received a name that itself represented a distinct consensus in usage" (Miksa, 1983a, p. 60) and: the "systematic structure of established subjects" is "resident in the public realm" (Miksa, 1983a, p. 69); "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge (Miksa, 1983a, p. 61). Bernd Frohmann adds: "The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts. Since for Cutter, mind, society, and SKO [Systems of Knowledge Organization] stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The Dewey Decimal Classification (DDC), by contrast, severs those connections. 
Melvil Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a poorly semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic... .... The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" (an approach to subjects having much in common with Cutter's) and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation". (Frohmann, 1994, 112–113). Cutter's early view on what a subject is, is probably wiser than most understandings that dominated the 20th century – and also the understanding reflected in the ISO-standard quoted below. The early statements quoted by Frohmann indicate that subjects are somehow shaped in social processes. When that is said, it should be added that they are not particularly detailed or clear. We only get a vague idea of the social nature of subjects. === S. R. Ranganathan (1892–1972) === A classification system with an explicit theoretical foundation is Ranganathan's Colon Classification. 
Ranganathan provided an explicit definition of the concept of "subject": Subject – an organized body of ideas, whose extension and intension are likely to fall coherently within the field of interests and comfortably within the intellectual competence and the field of inevitable specialization of a normal person. A related definition is given by one of Ranganathan's students: A subject is an organized and systematized body of ideas. It may consist of one idea or a combination of several... Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. The colon system is based on combining single elements from facets into subject designations. This is why the combined nature of subjects is emphasized so strongly. It leads, however, to absurdities such as the claim that gold cannot be a subject (but is instead termed "an isolate"). This aspect of the theory has been criticized by Metcalfe (1973, p. 318). Metcalfe's skepticism regarding Ranganathan's theory is formulated in harsh words (op. cit., p. 317): "This pseudo-science imposed itself on British disciples from about 1950 on...". It seems unacceptable that Ranganathan defines the word subject in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether or not subjects are combined should be examined once their definition has been given; it should not be determined a priori in the definition. Besides the emphasis on the combined, organizing and systematizing nature of subjects, Ranganathan's definition contains the pragmatic demand that a subject should be determined in a way that suits a normal person's competency or specialization. Again we see a strange kind of wishful thinking that mixes a general understanding of a concept with demands put by his own specific system.
What the word subject means is one thing; quite another issue is how to provide subject descriptions that fulfill demands put on a given information retrieval language, such as specificity, and demands put on the system, such as precision and recall. If researchers too often define terms in ways that favor specific kinds of systems, such definitions are not useful for providing more general theories about subjects, subject analysis and IR. Among other things, comparative studies of different kinds of systems are made difficult. Based on these arguments, as well as additional arguments which have been used in the literature, we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO standard for topic maps, Ranganathan's definition may be useful within his own closed system. The purpose of a scientific and scholarly field is, however, to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For such a purpose, another understanding of "subject" is necessary. === Patrick Wilson (1927–2003) === In his book, Wilson (1968) examined, in particular by thought experiments, the suitability of different methods of examining the subject of a document. The methods were: identifying the author's purpose for writing the document; weighing the relative dominance and subordination of different elements in the picture which the reading imposes on the reader; grouping or counting the document's use of concepts and references; and construing a set of rules for selecting elements deemed necessary (as opposed to unnecessary) for the work as a whole. Patrick Wilson shows convincingly that each of these methods is insufficient to determine the subject of a document and is led to conclude (p. 89): "The notion of the subject of a writing is indeterminate..." or, on p.
92 (about what users may expect to find using a particular position in a library classification system): "For nothing definite can be expected of the things found at any given position". In connection with the last quote, Wilson has an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways ("hostility" is used as an example). Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. Based on this argument, Wilson is led to conclude: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness". Wilson's concept of subject was discussed by Hjørland (1992), who found it problematic to give up the precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position which Hjørland found unacceptable and unnecessary. Concerning the authors' use of ambiguous terms, the role of the subject analysis is to determine which documents would be fruitful for users to identify whether or not the documents use one or another term or whether a given term i

  • Systematic review

    A systematic review is a scholarly synthesis of the evidence on a clearly presented topic using critical methods to identify, define and assess research on the topic. A systematic review extracts and interprets data from published studies on the topic (in the scientific literature), then analyzes, describes, critically appraises and summarizes interpretations into a refined evidence-based conclusion. For example, a systematic review of randomized controlled trials is a way of summarizing and implementing evidence-based medicine. Systematic reviews, sometimes along with meta-analyses, are generally considered the highest level of evidence in medical research. While a systematic review may be applied in the biomedical or health care context, it may also be used where an assessment of a precisely defined subject can advance understanding in a field of research. A systematic review may examine clinical tests, public health interventions, environmental interventions, social interventions, adverse effects, qualitative evidence syntheses, methodological reviews, policy reviews, and economic evaluations. Systematic reviews are closely related to meta-analyses, and often the same instance will combine both (being published with a subtitle of "a systematic review and meta-analysis"). The distinction between the two is that a meta-analysis uses statistical methods to induce a single number from the pooled data set (such as an effect size), whereas the strict definition of a systematic review excludes that step. However, in practice, when one is mentioned, the other may often be involved, as it takes a systematic review to assemble the information that a meta-analysis analyzes, and people sometimes refer to an instance as a systematic review, even if it includes the meta-analytical component. An understanding of systematic reviews and how to implement them in practice is common for professionals in health care, public health, and public policy. 
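The pooling step that distinguishes a meta-analysis, reducing several study results to a single effect size, can be sketched as follows. The excerpt does not say which statistical method is used; this example assumes the common fixed-effect inverse-variance approach, and the function name and sample numbers are illustrative only.

```python
import math


def pooled_effect(effects, std_errors):
    """Fixed-effect inverse-variance pooling (one common method, assumed here).

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical effect sizes and standard errors from two studies.
estimate, std_error = pooled_effect([1.0, 3.0], [1.0, 2.0])
```

With these made-up inputs the weights are 1 and 0.25, so the pooled estimate sits closer to the more precise first study. A random-effects model, which allows for heterogeneity between studies, would weight the studies differently.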
Systematic reviews contrast with a type of review often called a narrative review. Systematic reviews and narrative reviews both review the literature (the scientific literature), but the term literature review without further specification refers to a narrative review. == Characteristics == A systematic review can be designed to provide a thorough summary of current literature relevant to a research question. A systematic review uses a rigorous and transparent approach for research synthesis, with the aim of assessing and, where possible, minimizing bias in the findings. While many systematic reviews are based on an explicit quantitative meta-analysis of available data, there are also qualitative reviews and other types of mixed-methods reviews that adhere to standards for gathering, analyzing, and reporting evidence. Systematic reviews of quantitative data or mixed-method reviews sometimes use statistical techniques (meta-analysis) to combine results of eligible studies. Scoring levels are sometimes used to rate the quality of the evidence depending on the methodology used, although this is discouraged by the Cochrane Library. As evidence rating can be subjective, multiple people may be consulted to resolve any scoring differences between how evidence is rated. The EPPI-Centre, Cochrane, and the Joanna Briggs Institute have been influential in developing methods for combining both qualitative and quantitative research in systematic reviews. Several reporting guidelines exist to standardise reporting about how systematic reviews are conducted. Such reporting guidelines are not quality assessment or appraisal tools. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement suggests a standardized way to ensure a transparent and complete reporting of systematic reviews, and is now required for this kind of research by more than 170 medical journals worldwide. 
The latest version of this commonly used statement corresponds to PRISMA 2020 (the respective article was published in 2021). Several specialized PRISMA guideline extensions have been developed to support particular types of studies or aspects of the review process, including PRISMA-P for review protocols and PRISMA-ScR for scoping reviews. A list of PRISMA guideline extensions is hosted by the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network. However, the PRISMA guidelines have been found to be limited to intervention research, and the guidelines have to be changed in order to fit non-intervention research. As a result, the Non-Interventional, Reproducible, and Open (NIRO) Systematic Reviews guidelines were created to address this limitation. For qualitative reviews, reporting guidelines include ENTREQ (Enhancing transparency in reporting the synthesis of qualitative research) for qualitative evidence syntheses; RAMESES (Realist And MEta-narrative Evidence Syntheses: Evolving Standards) for meta-narrative and realist reviews; and eMERGe (Improving reporting of Meta-Ethnography) for meta-ethnography. Developments in systematic reviews during the 21st century included realist reviews and the meta-narrative approach, both of which addressed problems of variation in methods and heterogeneity existing on some subjects. == Types == There are over 30 types of systematic review and Table 1 below non-exhaustively summarises some of these. There is not always consensus on the boundaries and distinctions between the approaches described below. === Scoping reviews === Scoping reviews are distinct from systematic reviews in several ways. A scoping review is an attempt to search for concepts by mapping the language and data which surround those concepts and adjusting the search method iteratively to synthesize evidence and assess the scope of an area of inquiry.
This can mean that the concept search and method (including data extraction, organisation and analysis) are refined throughout the process, sometimes requiring deviations from any protocol or original research plan. A scoping review may often be a preliminary stage before a systematic review, which 'scopes' out an area of inquiry and maps the language and key concepts to determine if a systematic review is possible or appropriate, or to lay the groundwork for a full systematic review. The goal can be to assess how much data or evidence is available regarding a certain area of interest. This process is further complicated if it is mapping concepts across multiple languages or cultures. As a scoping review should be systematically conducted and reported (with a transparent and repeatable method), some academic publishers categorize them as a kind of 'systematic review', which may cause confusion. Scoping reviews are helpful when it is not possible to carry out a systematic synthesis of research findings, for example, when there are no published clinical trials in the area of inquiry. Scoping reviews are helpful when determining if it is possible or appropriate to carry out a systematic review, and are a useful method when an area of inquiry is very broad, for example, exploring how the public are involved in all stages of systematic reviews. There is still a lack of clarity when defining the exact method of a scoping review, as it is both an iterative process and still relatively new. There have been several attempts to improve the standardisation of the method, for example via a Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline extension for scoping reviews (PRISMA-ScR). PROSPERO (the International Prospective Register of Systematic Reviews) does not permit the submission of protocols of scoping reviews, although some journals will publish protocols for scoping reviews.
== Stages == While there are multiple kinds of systematic review methods, the main stages of a review can be summarised as follows: === Defining the research question === Some reported that the 'best practices' involve 'defining an answerable question' and publishing the protocol of the review before initiating it to reduce the risk of unplanned research duplication and to enable transparency and consistency between methodology and protocol. Clinical reviews of quantitative data are often structured using the mnemonic PICO, which stands for 'Population or Problem', 'Intervention or Exposure', 'Comparison', and 'Outcome', with other variations existing for other kinds of research. For qualitative reviews, PICo is 'Population or Problem', 'Interest', and 'Context'. === Searching for sources === Relevant criteria can include selecting research that is of good quality and answers the defined question. The search strategy should be designed to retrieve literature that matches the protocol's specified inclusion and exclusion criteria. The methodology section of a systematic review should list all of the databases and citation indices that were searched. The titles and abstracts of identified articles can be checked against predetermined criteria for eligibility and r

  • Charge-coupled device

    Charge-coupled device

    A charge-coupled device (CCD) is an integrated circuit containing an array of linked, or coupled, capacitors. Under the control of an external circuit, each capacitor can transfer its electric charge to a neighboring capacitor. CCD sensors are a major technology used in digital imaging. In a CCD image sensor, pixels are represented by p-doped metal–oxide–semiconductor (MOS) capacitors. These MOS capacitors, the basic building blocks of a CCD, are biased above the threshold for inversion when image acquisition begins, allowing the conversion of incoming photons into electron charges at the semiconductor-oxide interface; the CCD is then used to read out these charges. Although CCDs are not the only technology to allow for light detection, CCD image sensors are widely used in professional, medical, and scientific applications where high-quality image data are required. In applications with less exacting quality demands, such as consumer and professional digital cameras, active pixel sensors, also known as CMOS sensors (complementary MOS sensors), are generally used. However, the large quality advantage CCDs enjoyed early on has narrowed over time and since the late 2010s CMOS sensors are the dominant technology, having largely if not completely replaced CCD image sensors. == History == The basis for the CCD is the metal–oxide–semiconductor (MOS) structure, with MOS capacitors being the basic building blocks of a CCD, and a depleted MOS structure used as the photodetector in early CCD devices. In the late 1960s, Willard Boyle and George E. Smith at Bell Labs were researching MOS technology while working on semiconductor bubble memory. They realized that an electric charge was the analog of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. 
This led to the invention of the charge-coupled device by Boyle and Smith in 1969. They conceived of the design of what they termed, in their notebook, "Charge 'Bubble' Devices". The initial paper describing the concept in April 1970 listed possible uses as memory, a delay line, and an imaging device. The device could also be used as a shift register. The essence of the design was the ability to transfer charge along the surface of a semiconductor from one storage capacitor to the next. The first experimental device demonstrating the principle was a row of closely spaced metal squares on an oxidized silicon surface electrically accessed by wire bonds. It was demonstrated by Gil Amelio, Michael Francis Tompsett and George Smith in April 1970. This was the first experimental application of the CCD in image sensor technology, and used a depleted MOS structure as the photodetector. The first patent (U.S. patent 4,085,456) on the application of CCDs to imaging was assigned to Tompsett, who filed the application in 1971. The first working CCD made with integrated circuit technology was a simple 8-bit shift register, reported by Tompsett, Amelio and Smith in August 1970. This device had input and output circuits and was used to demonstrate its use as a shift register and as a crude eight pixel linear imaging device. Development of the device progressed at a rapid rate. By 1971, Bell researchers led by Michael Tompsett were able to capture images with simple linear devices. Several companies, including Fairchild Semiconductor, RCA and Texas Instruments, picked up on the invention and began development programs. Fairchild's effort, led by ex-Bell researcher Gil Amelio, was the first with commercial devices, and by 1974 had a linear 500-element device and a 2D 100 × 100 pixel device. Peter L. P. 
Dillon, a scientist at Kodak Research Labs, invented the first color CCD image sensor by overlaying a color filter array on this Fairchild 100 x 100 pixel Interline CCD starting in 1974. Steven Sasson, an electrical engineer working for the Kodak Apparatus Division, invented a digital still camera using this same Fairchild 100 × 100 CCD in 1975. The interline transfer (ILT) CCD device was proposed by L. Walsh and R. Dyck at Fairchild in 1973 to reduce smear and eliminate a mechanical shutter. To further reduce smear from bright light sources, the frame-interline-transfer (FIT) CCD architecture was developed by K. Horii, T. Kuroda and T. Kunii at Matsushita (now Panasonic) in 1981. The first KH-11 KENNEN reconnaissance satellite equipped with charge-coupled device array (800 × 800 pixels) technology for imaging was launched in December 1976. Under the leadership of Kazuo Iwama, Sony started a large development effort on CCDs involving a significant investment. Eventually, Sony managed to mass-produce CCDs for their camcorders. Before this happened, Iwama died in August 1982. Subsequently, a CCD chip was placed on his tombstone to acknowledge his contribution. The first mass-produced consumer CCD video camera, the CCD-G5, was released by Sony in 1983, based on a prototype developed by Yoshiaki Hagiwara in 1981. Early CCD sensors suffered from shutter lag. This was largely resolved with the invention of the pinned photodiode (PPD). It was invented by Nobukazu Teranishi, Hiromitsu Shiraki and Yasuo Ishihara at NEC in 1980. They recognized that lag can be eliminated if the signal carriers could be transferred from the photodiode to the CCD. This led to their invention of the pinned photodiode, a photodetector structure with low lag, low noise, high quantum efficiency and low dark current. It was first publicly reported by Teranishi and Ishihara with A. Kohono, E. Oda and K. Arai in 1982, with the addition of an anti-blooming structure. 
The new photodetector structure invented at NEC was given the name "pinned photodiode" (PPD) by B.C. Burkey at Kodak in 1984. In 1987, the PPD began to be incorporated into most CCD devices, becoming a fixture in consumer electronic video cameras and then digital still cameras. Since then, the PPD has been used in nearly all CCD sensors and then CMOS sensors. In January 2006, Boyle and Smith were awarded the National Academy of Engineering Charles Stark Draper Prize, and in 2009 they were awarded the Nobel Prize for Physics for their invention of the CCD concept. Michael Tompsett was awarded the 2010 National Medal of Technology and Innovation, for pioneering work and electronic technologies including the design and development of the first CCD imagers. He was also awarded the 2012 IEEE Edison Medal for "pioneering contributions to imaging devices including CCD Imagers, cameras and thermal imagers". == Basics of operation == In a CCD for capturing images, there is a photoactive region (an epitaxial layer of silicon), and a transmission region made out of a shift register (the CCD, properly speaking). An image is projected through a lens onto the capacitor array (the photoactive region), causing each capacitor to accumulate an electric charge proportional to the light intensity at that location. A one-dimensional array, used in line-scan cameras, captures a single slice of the image, whereas a two-dimensional array, used in video and still cameras, captures a two-dimensional picture corresponding to the scene projected onto the focal plane of the sensor. Once the array has been exposed to the image, a control circuit causes each capacitor to transfer its contents to its neighbor (operating as a shift register). The last capacitor in the array dumps its charge into a charge amplifier, which converts the charge into a voltage. By repeating this process, the controlling circuit converts the entire contents of the array in the semiconductor to a sequence of voltages. 
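The shift-register readout described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the function name `read_out`, the one-dimensional list of charge packets and the linear `gain` factor standing in for the charge amplifier are all hypothetical, and real devices add noise, transfer losses and multi-phase clocking that are ignored here.

```python
from typing import List


def read_out(charges: List[float], gain: float = 1.0) -> List[float]:
    """Simulate reading a 1-D CCD line as a shift register.

    charges[i] is the charge held by capacitor i after exposure;
    the last element sits next to the (idealised) charge amplifier.
    """
    wells = list(charges)  # copy so the caller's data is left untouched
    voltages: List[float] = []
    for _ in range(len(wells)):
        # The last capacitor dumps its charge into the amplifier,
        # which converts it to a voltage (modelled as a linear gain).
        voltages.append(gain * wells[-1])
        # Every other packet moves one step toward the output;
        # an empty well enters at the far end.
        wells = [0.0] + wells[:-1]
    return voltages


if __name__ == "__main__":
    print(read_out([1.0, 2.0, 3.0], gain=2.0))
```

Note that the voltages come out in order of distance from the amplifier, so the packet nearest the output is read first.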
In a digital device, these voltages are then sampled, digitized, and usually stored in memory; in an analog device (such as an analog video camera), they are processed into a continuous analog signal (e.g. by feeding the output of the charge amplifier into a low-pass filter), which is then processed and fed out to other circuits for transmission, recording, or other processing. == Detailed physics of operation == === Charge generation === Before the MOS capacitors are exposed to light, they are biased into the depletion region; in n-channel CCDs, the silicon under the bias gate is slightly p-doped or intrinsic. The gate is then biased at a positive potential, above the threshold for strong inversion, which will eventually result in the creation of an n channel below the gate as in a MOSFET. However, it takes time to reach this thermal equilibrium: up to hours in high-end scientific cameras cooled at low temperature. Initially after biasing, the holes are pushed far into the substrate, and no mobile electrons are at or near the surface; the CCD thus operates in a non-equilibrium state called deep depletion. Then, when electron–hole pairs are generated in the depletion region, they are separated by the electric field, the elec

    Read more →
  • Point-in-time recovery

    Point-in-time recovery

    Point-in-time recovery (PITR) in the context of computers involves systems, often databases, whereby an administrator can restore or recover a set of data or a particular setting from a time in the past. Note for example Windows's capability to restore operating-system settings from a past date (for instance, before data corruption occurred). Time Machine for macOS provides another example of point-in-time recovery. Once PITR logging starts for a PITR-capable database, a database administrator can restore that database from backups to the state that it had at any time since.
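
One common way to picture this is as a base backup plus a timestamped change log that is replayed up to the requested moment. The sketch below is a toy model under that assumption, not the mechanism of any particular database: `LogRecord`, `restore_to` and the key-value state are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogRecord:
    """One logged change; value=None marks a deletion (illustrative format)."""
    timestamp: datetime
    key: str
    value: Optional[str]


def restore_to(base_backup: Dict[str, str],
               log: List[LogRecord],
               target: datetime) -> Dict[str, str]:
    """Rebuild the state as of `target` by replaying the log over the backup."""
    state = dict(base_backup)  # never mutate the backup itself
    for record in sorted(log, key=lambda r: r.timestamp):
        if record.timestamp > target:
            break  # changes after the target time are ignored
        if record.value is None:
            state.pop(record.key, None)
        else:
            state[record.key] = record.value
    return state


if __name__ == "__main__":
    backup = {"a": "1"}
    changes = [
        LogRecord(datetime(2024, 1, 1, 10), "a", "2"),
        LogRecord(datetime(2024, 1, 1, 11), "b", "x"),
        LogRecord(datetime(2024, 1, 1, 12), "a", None),
    ]
    print(restore_to(backup, changes, datetime(2024, 1, 1, 11, 30)))
```

Choosing a target time between two log records simply stops the replay at the earlier one, which is why any moment since logging began can be restored in this model.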

    Read more →
  • Information seeking

    Information seeking

    Information seeking is the process or activity of attempting to obtain information in both human and technological contexts. Information seeking is related to, but different from, information retrieval (IR). == Compared to information retrieval == Traditionally, IR tools have been designed for IR professionals to enable them to effectively and efficiently retrieve information from a source. It is assumed that the information exists in the source and that a well-formed query will retrieve it (and nothing else). It has been argued that laypersons' information seeking on the internet is very different from information retrieval as performed within the IR discourse. Yet, internet search engines are built on IR principles. Since the late 1990s a body of research on how casual users interact with internet search engines has been forming, but the topic is far from fully understood. IR can be said to be technology-oriented, focusing on algorithms and issues such as precision and recall. Information seeking may be understood as a more human-oriented and open-ended process than information retrieval. In information seeking, one does not know whether there exists an answer to one's query, so the process of seeking may provide the learning required to satisfy one's information need. == In different contexts == Much library and information science (LIS) research has focused on the information-seeking practices of practitioners within various fields of professional work. Studies have been carried out into the information-seeking behaviors of librarians, academics, medical professionals, engineers, lawyers and mini-publics (among others). Much of this research has drawn on the work done by Leckie, Pettigrew (now Fisher) and Sylvain, who in 1996 conducted an extensive review of the LIS literature (as well as the literature of other academic fields) on professionals' information seeking. 
The authors proposed an analytic model of professionals' information seeking behaviour, intended to be generalizable across the professions, thus providing a platform for future research in the area. The model was intended to "prompt new insights... and give rise to more refined and applicable theories of information seeking" (1996, p. 188). The model has been adapted by Wilkinson (2001), who proposes a model of the information seeking of lawyers. Recent studies in this topic address the concept of information-gathering that "provides a broader perspective that adheres better to professionals' work-related reality and desired skills." (Solomon & Bronstein, 2021). == Theories of information-seeking behavior == A variety of theories of information behavior – e.g. Zipf's Principle of Least Effort, Brenda Dervin's Sense Making, Elfreda Chatman's Life in the Round – seek to understand the processes that surround information seeking. In addition, many theories from other disciplines have been applied in investigating an aspect or whole process of information seeking behavior. A review of the literature on information seeking behavior shows that information seeking has generally been accepted as dynamic and non-linear (Foster, 2005; Kuhlthau, 2006). People experience the information search process as an interplay of thoughts, feelings and actions (Kuhlthau, 2006). Donald O. Case (2007) also published a book-length review of the literature. Information seeking has been found to be linked to a variety of interpersonal communication behaviors beyond question-asking, to include strategies such as candidate answers. Robinson's (2010) research suggests that when seeking information at work, people rely on both other people and information repositories (e.g., documents and databases), and spend similar amounts of time consulting each (7.8% and 6.4% of work time, respectively; 14.2% in total). 
However, the distribution of time among the constituent information seeking stages differs depending on the source. When consulting other people, people spend less time locating the information source and information within that source, similar time understanding the information, and more time problem solving and decision making, than when consulting information repositories. Furthermore, the research found that people spend substantially more time receiving information passively (i.e., information that they have not requested) than actively (i.e., information that they have requested), and this pattern is also reflected when they provide others with information. == Wilson's nested model of conceptual areas == The concepts of information seeking, information retrieval, and information behaviour are objects of investigation of information science. Within this scientific discipline a variety of studies has been undertaken analyzing the interaction of an individual with information sources in case of a specific information need, task, and context. The research models developed in these studies vary in their level of scope. Wilson (1999) therefore developed a nested model of conceptual areas, which visualizes the interrelation of the here mentioned central concepts. Wilson defines models of information behavior to be "statements, often in the form of diagrams, that attempt to describe an information-seeking activity, the causes and consequences of that activity, or the relationships among stages in information-seeking behaviour" (1999: 250).

    Read more →
  • Upper ontology

    Upper ontology

    In information science, an upper ontology (also known as a top-level ontology, upper model, or foundation ontology) is an ontology (in the sense used in information science) that consists of very general terms (such as "object", "property", "relation") that are common across all domains. An important function of an upper ontology is to support broad semantic interoperability among a large number of domain-specific ontologies by providing a common starting point for the formulation of definitions. Terms in the domain ontology are ranked under the terms in the upper ontology, e.g., the upper ontology classes are superclasses or supersets of all the classes in the domain ontologies. A number of upper ontologies have been proposed, each with its own proponents. Library classification systems predate upper ontology systems. Though library classifications organize and categorize knowledge using general concepts that are the same across all knowledge domains, neither system is a replacement for the other. == Development == Any standard foundational ontology is likely to be contested among different groups, each with its own idea of "what exists". One factor exacerbating the failure to arrive at a common approach has been the lack of open-source applications that would permit the testing of different ontologies in the same computational environment. The differences have thus been debated largely on theoretical grounds, or are merely the result of personal preferences. Foundational ontologies can however be compared on the basis of adoption for the purposes of supporting interoperability across domain ontologies. No particular upper ontology has yet gained widespread acceptance as a de facto standard. Different organizations have attempted to define standards for specific domains. The 'Process Specification Language' (PSL) created by the National Institute of Standards and Technology (NIST) is one example. 
Another important factor leading to the absence of wide adoption of any existing upper ontology is the complexity. Some upper ontologies—Cyc is often cited as an example in this regard—are very large, ranging up to thousands of elements (classes, relations), with complex interactions among them and with a complexity similar to that of a human natural language, and the learning process can be even longer than for a natural language because of the unfamiliar format and logical rules. The motivation to overcome this learning barrier is largely absent because of the paucity of publicly accessible examples of use. As a result, those building domain ontologies for local applications tend to create the simplest possible domain-specific ontology, not related to any upper ontology. Such domain ontologies may function adequately for the local purpose, but they are very time-consuming to relate accurately to other domain ontologies. To solve this problem, some genuinely top level ontologies have been developed, which are deliberately designed to have minimal overlap with any domain ontologies. Examples are Basic Formal Ontology and the DOLCE (see below). === Arguments for the infeasibility of an upper ontology === Historically, many attempts in many societies have been made to impose or define a single set of concepts as more primal, basic, foundational, authoritative, true or rational than all others. A common objection to such attempts points out that humans lack the sort of transcendent perspective — or God's eye view — that would be required to achieve this goal. Humans are bound by language or culture, and so lack the sort of objective perspective from which to observe the whole terrain of concepts and derive any one standard. 
Thomasson, under the headline "1.5 Skepticism about Category Systems", wrote: "category systems, at least as traditionally presented, seem to presuppose that there is a unique true answer to the question of what categories of entity there are – indeed the discovery of this answer is the goal of most such inquiries into ontological categories. [...] But actual category systems offered vary so much that even a short survey of past category systems like that above can undermine the belief that such a unique, true and complete system of categories may be found. Given such a diversity of answers to the question of what the ontological categories are, by what criteria could we possibly choose among them to determine which is uniquely correct?" Another objection is the problem of formulating definitions. Top level ontologies are designed to maximize support for interoperability across a large number of terms. Such ontologies must therefore consist of terms expressing very general concepts, but such concepts are so basic to our understanding that there is no way in which they can be defined, since the very process of definition implies that a less basic (and less well understood) concept is defined in terms of concepts that are more basic and so (ideally) more well understood. Very general concepts can often only be elucidated, for example by means of examples, or paraphrase. There is no self-evident way of dividing the world up into concepts, and certainly no non-controversial one. There is no neutral ground that can serve as a means of translating between specialized (or "lower" or "application-specific") ontologies. Human language itself is already an arbitrary approximation of just one among many possible conceptual maps. To draw any necessary correlation between English words and any number of intellectual concepts that we might like to represent in our ontologies is just asking for trouble. 
(WordNet, for instance, is successful and useful, precisely because it does not pretend to be a general-purpose upper ontology; rather, it is a tool for semantic / syntactic / linguistic disambiguation, which is richly embedded in the particulars and peculiarities of the English language.) Any hierarchical or topological representation of concepts must begin from some ontological, epistemological, linguistic, cultural, and ultimately pragmatic perspective. Such pragmatism does not allow for the exclusion of politics between persons or groups, indeed it requires they be considered as perhaps more basic primitives than any that are represented. Those who doubt the feasibility of general purpose ontologies are more inclined to ask "what specific purpose do we have in mind for this conceptual map of entities and what practical difference will this ontology make?" This pragmatic philosophical position surrenders all hope of devising the encoded ontology version of "The world is everything that is the case." (Wittgenstein, Tractatus Logico-Philosophicus). Finally, there are objections similar to those against artificial intelligence. Technically, the complex concept acquisition and the social / linguistic interactions of human beings suggest any axiomatic foundation of "most basic" concepts must be cognitive biological or otherwise difficult to characterize since we don't have axioms for such systems. Ethically, any general-purpose ontology could quickly become an actual tyranny by recruiting adherents into a political program designed to propagate it and its funding means, and possibly defend it by violence. Historically, inconsistent and irrational belief systems have proven capable of commanding obedience to the detriment or harm of persons both inside and outside a society that accepts them. How much more harmful would a consistent rational one be, were it to contain even one or two basic assumptions incompatible with human life? 
=== Arguments for the feasibility of an upper ontology === Many of those who doubt the possibility of developing wide agreement on a common upper ontology fall into one of two traps: (1) they assert that there is no possibility of universal agreement on any conceptual scheme; but proponents argue that a practical common ontology does not need to have universal agreement, it only needs a large enough user community (as is the case for human languages) to make it profitable for developers to use it as a means to general interoperability, and for third-party developers to develop utilities to make it easier to use; and (2) they point out that developers of data schemes find different representations congenial for their local purposes; but they do not demonstrate that these different representations are in fact logically inconsistent. In fact, different representations of assertions about the real world (though not philosophical models), if they accurately reflect the world, must be logically consistent, even if they focus on different aspects of the same physical object or phenomenon. If any two assertions about the real world are logically inconsistent, one or both must be wrong, and that is a topic for experimental investigation, not for ontological representation. In practice, representations of the real world are created as and known to be approximations to the basic reality, and their use is circumscribed by the limits of e

    Read more →
  • Ayoba

    Ayoba

    Ayoba is an African communication platform developed in South Africa. It is owned by Progressive Tech Holdings in Mauritius and managed by SIMFY Africa. Launched on May 4, 2019, it had over 35 million active users as of April 2024. == History == Ayoba was first published on Google Play in February 2019. Its first marketing campaign and brand launch took place in Cameroon on May 4, 2019. In June 2019, the platform introduced its first eight channels. In November 2019, the platform reached one million active users, which increased to two million by June 2020. Subsequently, ayoba expanded its services, including the launch of games for Android in February 2020, Momo (Mobile Money) in Cameroon in May 2020, and MicroApps in May 2020. It also launched music and voice and video calling features in 12 territories in August 2020. The first version of ayoba for iOS was released in September 2020. In December of the same year, games and Messaging 2.0 were launched on the platform. In November 2020, it won Best Mobile Application at the African Digital Awards. In 2021, it won OTT Brand of the Year at the Marketing World Awards in Ghana. In December 2022, it received Top Innovative Technology and Telecom Product of the Year at the National Communications Awards. In June 2023, ayoba partnered with BoomPlay, and as of April 2024 it had 35 million monthly active users. Ayoba has partnered with Jumia Ghana to offer exclusive deals to users. Ayoba users can get a 10% discount on selected Jumia purchases through the app, with no data charges for MTN users. This partnership aims to make online shopping more affordable and accessible by integrating Jumia's offers into the ayoba app. Ayoba provides services in 22 languages. To access the deals, users can download the ayoba app from the Google Play Store, the Apple App Store, or the official website. 
== Platform features == Chat, Call and Share: ayoba enables instant messaging, voice notes, picture sharing, and file sharing with contacts, even if they do not have the app installed. The app supports voice and video calls on both Android and iOS, as well as group chats, help channel and SMS continuity (non ayoba users receive messages as SMS, their responses appear in the ayoba app). Music: ayoba offers a free music player with daily updates on international and African music. Users can find playlists for different genres. Games: ayoba provides a selection of interactive games, including action, adventure, and children's games available on both Android and iOS. Mobile Money Transfers: In certain territories, ayoba supports mobile money transfers using MTN Mobile Money (MoMo) for transactions within the app. MicroApps: ayoba features individual MicroApps within the platform that offer content and services, including streaming channels, podcasts, and specialized apps. The availability of these apps may vary by country. == Operations == ayoba primarily focuses on the following territories: Nigeria, Cameroon, South Africa, Ghana, Côte d'Ivoire, Uganda, Republic of Congo, Benin, Zambia, Tanzania, Kenya, Senegal, Togo, Guinea Bissau, Guinea Conakry, Sudan, South Sudan, and Liberia. The company operates from its offices in Cape Town and Johannesburg, South Africa. David Gillaranz served as the CEO from 2019 to 2021, and Burak Akinci has been the CEO since 2021.

    Read more →
  • Artificial imagination

    Artificial imagination

    Artificial imagination is a narrow subcomponent of artificial general intelligence which generates, simulates, and facilitates real or possible fiction models to create predictions, inventions, or conscious experiences. The term artificial imagination is also used to describe a property of machines or programs. Some of the traits that researchers hope to simulate include creativity, vision, digital art, humor, and satire. Practitioners in the field are researching various aspects of artificial imagination, such as artificial (visual) imagination, artificial (aural) imagination, modeling/filtering content based on human emotions, and interactive search. Some articles on the topic speculate on how artificial imagination may evolve to create an artificial world in which "people may be comfortable enough to escape from the real world". Some researchers such as G. Schleis and M. Rizki have focused on using artificial neural networks to simulate artificial imagination. Another important project is being led by Hiroharu Kato and Tatsuya Harada at the University of Tokyo in Japan. They have developed a computer capable of translating a description of an object into an image, which could be the easiest way to define what imagination is. Their idea is based on the concept of an image as a series of pixels divided into short sequences that correspond to a specific part of an image. The scientists call these sequences "visual words", which the machine can interpret using statistical distributions to read and create an image of an object it has not encountered. The topic of artificial imagination has garnered interest from scholars outside the computer science domain, such as noted communications scholar Ernest Bormann, who came up with the Symbolic Convergence Theory and worked on a project to develop artificial imagination in computer systems. 
An interdisciplinary research seminar organized by the artist Grégory Chatonsky on artificial imagination and postdigital art has taken place since 2017 at the Ecole Normale Supérieure in Paris. == Use in interactive search == The typical application of artificial imagination is interactive search. Interactive searching has been developed since the mid-1990s, accompanied by the World Wide Web's development and the optimization of search engines. Based on the first query and feedback from a user, the databases to be searched are reorganized to improve the search results. Artificial imagination allows us to synthesize images and to develop a new image, whether or not it is in the database and regardless of its existence in the real world. For example, the computer shows results that are based on the answer from the initial query. The user selects several relevant images, and then the technology analyzes these selections and reorganizes the images' ranks to fit the query. In this process, artificial imagination is used to synthesize the selected images and to improve the search result with additional relevant synthesized images. This technique is based on several algorithms, including the Rocchio algorithm and the evolutionary algorithm. The Rocchio algorithm, which moves the query point toward relevant examples and away from irrelevant ones, is simple and works well in a small system where the databases are arranged in certain ranks. The evolutionary synthesis is composed of two steps: a standard algorithm and an enhancement of the standard algorithm. Through feedback from the user, additional images are synthesized to suit what the user is looking for. == General artificial imagination == Artificial imagination has a more general definition and wide applications. The traditional fields of artificial imagination include visual imagination and aural imagination. 
More generally, all the actions to form ideas, images and concepts can be linked to imagination. Thus, artificial imagination means more than only generating graphs. For example, moral imagination is an important research subfield of artificial imagination, although classification of artificial imagination is difficult. Morals are an important part of human beings' logic, while artificial morals are important in artificial imagination and artificial intelligence. A common question raised about artificial intelligence is whether human beings should take responsibility for machines' mistakes or decisions, and how to develop well-behaved machines. As nobody can give a clear description of the best moral rules, it is impossible to create machines with commonly accepted moral rules. However, recent research about artificial morals circumvents the definition of morality. Instead, machine learning methods are applied to train machines to imitate human morals. As the data about moral decisions from thousands of different people are considered, the trained moral model can reflect widely accepted rules. Memory is another major field of artificial imagination. Researchers such as Aude Oliva have performed extensive work on artificial memory, especially visual memory. Compared to visual imagination, visual memory focuses more on how machines understand, analyse and store pictures in a human way. In addition, characteristics such as spatial features are also considered. As this field is based on the brain's biological structures, extensive research on neuroscience has also been performed, which makes it a large intersection between biology and computer science.
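
The Rocchio relevance-feedback step mentioned under interactive search can be sketched in a few lines. The version below uses the textbook form of the update (original query plus a weighted centroid of relevant examples, minus a weighted centroid of irrelevant ones); the function names, the plain-list feature vectors and the default weights are illustrative assumptions, and nothing here is taken from a specific image-search system.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean of the vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query: Vector,
            relevant: Sequence[Vector],
            irrelevant: Sequence[Vector],
            alpha: float = 1.0,
            beta: float = 0.75,
            gamma: float = 0.15) -> Vector:
    """Move the query toward relevant examples and away from irrelevant ones.

    alpha, beta and gamma are tunable weights; the defaults here are
    only an assumed starting point.
    """
    dim = len(query)
    rel = centroid(relevant, dim)
    irr = centroid(irrelevant, dim)
    return [alpha * query[i] + beta * rel[i] - gamma * irr[i]
            for i in range(dim)]


if __name__ == "__main__":
    q = [1.0, 0.0]
    liked = [[0.0, 2.0], [0.0, 4.0]]      # feature vectors the user marked relevant
    disliked = [[2.0, 0.0]]               # feature vectors the user rejected
    print(rocchio(q, liked, disliked, alpha=1.0, beta=0.5, gamma=0.5))
```

The updated vector would then be used to re-rank the database, which matches the description of reorganizing image ranks after user feedback.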

  • Data management plan

    A data management plan or DMP is a formal document that outlines how data are to be handled both during a research project, and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this may lead to data being well-managed in the present, and prepared for preservation in the future. DMPs were originally used in 1966 to manage aeronautical and engineering projects' data collection and analysis, and expanded across engineering and scientific disciplines in the 1970s and 1980s. Up until the early 2000s, DMPs were used "for projects of great technical complexity, and for limited mid-study data collection and processing purposes". In the 2000s and later, E-research and economic policies drove the development and uptake of DMPs. == Importance == Preparing a data management plan before data are collected is claimed to ensure that data are in the correct format, organized well, and better annotated. This could arguably save time in the long term because there is no need to re-organize, re-format, or try to remember details about data. It is also claimed to increase research efficiency since both the data collector and other researchers might be able to understand and use well-annotated data in the future. One component of a data management plan is data archiving and preservation. By deciding on an archive ahead of time, the data collector can format data during collection to make its future submission to a database easier. If data are preserved, they are more relevant since they can be re-used by other researchers. It also allows the data collector to direct requests for data to the database, rather than address requests individually. 
A frequent argument in favor of preservation is that data that are preserved have the potential to lead to new, unanticipated discoveries, and they prevent duplication of scientific studies that have already been conducted. Data archiving also provides insurance against loss by the data collector. In the 2010s, funding agencies increasingly required data management plans as part of the proposal and evaluation process, despite little or no evidence of their efficacy. == Major components == "There is no general and definitive list of topics that should be covered in a DMP for a research project", and researchers are often left to their own devices as to how to fill out a DMP. === Information about data and data format === A description of data to be produced by the project. This might include (but is not limited to) data that are: experimental; observational; raw or derived; physical collections; models; simulations; curriculum materials; software; images. How will the data be acquired? When and where will they be acquired? After collection, how will the data be processed? Include information about: software used; algorithms; scientific workflows. Describe the file formats that will be used, justify those formats, and describe the naming conventions used. Describe the quality assurance and quality control measures that will be taken during sample collection, analysis, and processing. If existing data are used, what are their origins? How will the data collected be combined with existing data? What is the relationship between the data collected and existing data? How will the data be managed in the short term? Consider the following: version control for files; backing up data and data products; security and protection of data and data products; who will be responsible for management. === Metadata content and format === Metadata are the contextual details, including any information important for using data. This may include descriptions of temporal and spatial details, instruments, parameters, units, files, etc.
Metadata are commonly referred to as "data about data". Issues to be considered include: How detailed must the metadata be in order to make the data meaningful? How will the metadata be created and/or captured? Examples include lab notebooks, hand-held GPS units, auto-saved files on instruments, etc. What format will be used for the metadata? What are the metadata standards commonly used in the respective scientific discipline? There should be justification for the format chosen. === Policies for access, sharing, and re-use === Describe any obligations that exist for sharing data collected. These may include obligations from funding agencies, institutions, other professional organizations, and legal requirements. Include information about how data will be shared, including when the data will be accessible, how long the data will be available, how access can be gained, and any rights that the data collector reserves for using data. Address any ethical or privacy issues with data sharing. Address intellectual property and copyright issues: Who owns the copyright? What are the institutional, publisher, and/or funding agency policies associated with intellectual property? Are there embargoes for political, commercial, or patent reasons? Describe the intended future uses and users of the data. Indicate how the data should be cited by others. How will the issue of persistent citation be addressed? For example, if the data will be deposited in a public archive, will the dataset have a persistent identifier (e.g., ARK, DOI, Handle, PURL, URN) assigned to it? === Long-term storage and data management === Researchers should identify an appropriate archive for the long-term preservation of their data. By identifying the archive early in the project, the data can be formatted, transformed, and documented appropriately to meet the requirements of the archive.
Researchers should consult colleagues and professional societies in their discipline to determine the most appropriate database, and include a backup archive in their data management plan in case their first choice goes out of existence. Early in the project, the primary researcher should identify what data will be preserved in an archive. Usually, preserving the data in its rawest form is desirable, although data derivatives and products can also be preserved. An individual should be identified as the primary contact person for archived data, and should ensure that contact information is always kept up to date in case there are requests for data or information about data. === Budget === Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers can help ensure that the data will be properly managed and archived. Potential expenses that should be considered are: human resources and staff as they handle data preparation, management, documentation, and preservation; hardware and/or software needed for data management, backing up, security, documentation, and preservation; and costs associated with submitting the data to an archive. The data management plan should state how these costs will be paid. == NSF Data Management Plan == All grant proposals submitted to the National Science Foundation (NSF) must include a Data Management Plan that is no more than two pages. This is a supplement (not part of the 15-page proposal) and should describe how the proposal will conform to the Award and Administration Guide policy (see below).
It may include the following: the types of data; the standards to be used for data and metadata format and content; policies for access and sharing; policies and provisions for re-use; plans for archiving data. Policy summarized from the NSF Award and Administration Guide, Section 4 (Dissemination and Sharing of Research Results): promptly publish with appropriate authorship; share data, samples, physical collections, and supporting materials with others within a reasonable time frame; share software and inventions. Investigators can keep their legal rights over their intellectual property, but they still have to make their results, data, and collections available to others. Policies will be implemented via: proposal review; award negotiations and conditions; support and incentives. == ESRC Data Management Plan == Since 1995, the UK's Economic and Social Research Council (ESRC) has had a research data policy in place. The current ESRC Research Data Policy states that research data created as a result of ESRC-funded research should be openly available to the scientific community to the maximum extent possible, through long-term preservation and high-quality data management. ESRC requires a data management plan for all research award applications where new data are being created. Such plans are designed to promote a structured approach to data management throughout the data lifecycle, resulting in better-quality data that are ready to archive for sharing and re-use. The UK Data Service, the ESRC's flagship data service, provides practical guidance on research data management planning suitable for social science researchers in the UK and around the world. ESRC has a longstanding arrangement with the UK Data A
