AI Content Youtube Monetization

AI Content Youtube Monetization — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Multi-scale approaches

    Multi-scale approaches

    The scale space representation of a signal obtained by Gaussian smoothing satisfies a number of special properties, scale-space axioms, which make it into a special form of multi-scale representation. There are, however, also other types of "multi-scale approaches" in the areas of computer vision, image processing and signal processing, in particular the notion of wavelets. The purpose of this article is to describe a few of these approaches: == Scale-space theory for one-dimensional signals == For one-dimensional signals, there exists quite a well-developed theory for continuous and discrete kernels that guarantee that new local extrema or zero-crossings cannot be created by a convolution operation. For continuous signals, it holds that all scale-space kernels can be decomposed into the following sets of primitive smoothing kernels: the Gaussian kernel : g ( x , t ) = 1 2 π t exp ⁡ ( − x 2 / 2 t ) {\displaystyle g(x,t)={\frac {1}{\sqrt {2\pi t}}}\exp({-x^{2}/2t})} where t > 0 {\displaystyle t>0} , truncated exponential kernels (filters with one real pole in the s-plane): h ( x ) = exp ⁡ ( − a x ) {\displaystyle h(x)=\exp({-ax})} if x ≥ 0 {\displaystyle x\geq 0} and 0 otherwise where a > 0 {\displaystyle a>0} h ( x ) = exp ⁡ ( b x ) {\displaystyle h(x)=\exp({bx})} if x ≤ 0 {\displaystyle x\leq 0} and 0 otherwise where b > 0 {\displaystyle b>0} , translations, rescalings. For discrete signals, we can, up to trivial translations and rescalings, decompose any discrete scale-space kernel into the following primitive operations: the discrete Gaussian kernel T ( n , t ) = I n ( α t ) {\displaystyle T(n,t)=I_{n}(\alpha t)} where α , t > 0 {\displaystyle \alpha ,t>0} where I n {\displaystyle I_{n}} are the modified Bessel functions of integer order, generalized binomial kernels corresponding to linear smoothing of the form f o u t ( x ) = p f i n ( x ) + q f i n ( x − 1 ) {\displaystyle f_{out}(x)=pf_{in}(x)+qf_{in}(x-1)} where p , q > 0 {\displaystyle p,q>0} f o u t ( x ) = p f i n ( x ) + q f i n ( x + 1 ) {\displaystyle f_{out}(x)=pf_{in}(x)+qf_{in}(x+1)} where p , q > 0 {\displaystyle p,q>0} , first-order recursive filters corresponding to linear smoothing of the form f o u t ( x ) = f i n ( x ) + α f o u t ( x − 1 ) {\displaystyle f_{out}(x)=f_{in}(x)+\alpha f_{out}(x-1)} where α > 0 {\displaystyle \alpha >0} f o u t ( x ) = f i n ( x ) + β f o u t ( x + 1 ) {\displaystyle f_{out}(x)=f_{in}(x)+\beta f_{out}(x+1)} where β > 0 {\displaystyle \beta >0} , the one-sided Poisson kernel p ( n , t ) = e − t t n n ! {\displaystyle p(n,t)=e^{-t}{\frac {t^{n}}{n!}}} for n ≥ 0 {\displaystyle n\geq 0} where t ≥ 0 {\displaystyle t\geq 0} p ( n , t ) = e − t t − n ( − n ) ! {\displaystyle p(n,t)=e^{-t}{\frac {t^{-n}}{(-n)!}}} for n ≤ 0 {\displaystyle n\leq 0} where t ≥ 0 {\displaystyle t\geq 0} . From this classification, it is apparent that we require a continuous semi-group structure, there are only three classes of scale-space kernels with a continuous scale parameter; the Gaussian kernel which forms the scale-space of continuous signals, the discrete Gaussian kernel which forms the scale-space of discrete signals and the time-causal Poisson kernel that forms a temporal scale-space over discrete time. If we on the other hand sacrifice the continuous semi-group structure, there are more options: For discrete signals, the use of generalized binomial kernels provides a formal basis for defining the smoothing operation in a pyramid. For temporal data, the one-sided truncated exponential kernels and the first-order recursive filters provide a way to define time-causal scale-spaces that allow for efficient numerical implementation and respect causality over time without access to the future. The first-order recursive filters also provide a framework for defining recursive approximations to the Gaussian kernel that in a weaker sense preserve some of the scale-space properties.

    Read more →
  • Single customer view

    Single customer view

    A single customer view is an aggregated, consistent and holistic representation of the data held by an organisation about its customers that can be viewed in one place, such as a single page. The advantage to an organisation of attaining this unified view comes from the ability it gives to analyse past behaviour in order to better target and personalise future customer interactions. A single customer view is also considered especially relevant where organisations engage with customers through multichannel marketing, since customers expect those interactions to reflect a consistent understanding of their history and preferences. However, some commentators have challenged the idea that a single view of customers across an entire organisation is either natural or meaningful, proposing that the priority should instead be consistency between the multiple views that arise in different contexts. Where representations of a customer are held in more than one data set, achieving a single customer view can be difficult: firstly because customer identity must be traceable between the records held in those systems, and secondly because anomalies or discrepancies in the customer data must be data cleansed for data quality. As such, the acquisition by an organisation of a single customer view is one potential outcome of successful master data management. Since 31 December, 2010, maintaining a single customer view, and submitting it within 72 hours, has become mandatory for financial institutions in the United Kingdom due to new rules introduced by the Financial Services Compensation Scheme.

    Read more →
  • Enumeration algorithm

    Enumeration algorithm

    In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms. == Formal definitions == An enumeration problem P {\displaystyle P} is defined as a relation R {\displaystyle R} over strings of an arbitrary alphabet Σ {\displaystyle \Sigma } : R ⊆ Σ ∗ × Σ ∗ {\displaystyle R\subseteq \Sigma ^{}\times \Sigma ^{}} An algorithm solves P {\displaystyle P} if for every input x {\displaystyle x} the algorithm produces the (possibly infinite) sequence y {\displaystyle y} such that y {\displaystyle y} has no duplicate and z ∈ y {\displaystyle z\in y} if and only if ( x , z ) ∈ R {\displaystyle (x,z)\in R} . The algorithm should halt if the sequence y {\displaystyle y} is finite. == Common complexity classes == Enumeration problems have been studied in the context of computational complexity theory, and several complexity classes have been introduced for such problems. A very general such class is EnumP, the class of problems for which the correctness of a possible output can be checked in polynomial time in the input and output. Formally, for such a problem, there must exist an algorithm A which takes as input the problem input x, the candidate output y, and solves the decision problem of whether y is a correct output for the input x, in polynomial time in x and y. For instance, this class contains all problems that amount to enumerating the witnesses of a problem in the class NP. Other classes that have been defined include the following. In the case of problems that are also in EnumP, these problems are ordered from least to most specific: Output polynomial, the class of problems whose complete output can be computed in polynomial time. Incremental polynomial time, the class of problems where, for all i, the i-th output can be produced in polynomial time in the input size and in the number i. Polynomial delay, the class of problems where the delay between two consecutive outputs is polynomial in the input (and independent from the output). Strongly polynomial delay, the class of problems where the delay before each output is polynomial in the size of this specific output (and independent from the input or from the other outputs). The preprocessing is generally assumed to be polynomial. Constant delay, the class of problems where the delay before each output is constant, i.e., independent from the input and output. The preprocessing phase is generally assumed to be polynomial in the input. == Common techniques == Backtracking: The simplest way to enumerate all solutions is by systematically exploring the space of possible results (partitioning it at each successive step). However, performing this may not give good guarantees on the delay, i.e., a backtracking algorithm may spend a long time exploring parts of the space of possible results that do not give rise to a full solution. Flashlight search: This technique improves on backtracking by exploring the space of all possible solutions but solving at each step the problem of whether the current partial solution can be extended to a partial solution. If the answer is no, then the algorithm can immediately backtrack and avoid wasting time, which makes it easier to show guarantees on the delay between any two complete solutions. In particular, this technique applies well to self-reducible problems. Closure under set operations: If we wish to enumerate the disjoint union of two sets, then we can solve the problem by enumerating the first set and then the second set. If the union is non disjoint but the sets can be enumerated in sorted order, then the enumeration can be performed in parallel on both sets while eliminating duplicates on the fly. If the union is not disjoint and both sets are not sorted then duplicates can be eliminated at the expense of a higher memory usage, e.g., using a hash table. Likewise, the cartesian product of two sets can be enumerated efficiently by enumerating one set and joining each result with all results obtained when enumerating the second step. == Examples of enumeration problems == The vertex enumeration problem, where we are given a polytope described as a system of linear inequalities and we must enumerate the vertices of the polytope. Enumerating the minimal transversals of a hypergraph. This problem is related to monotone dualization and is connected to many applications in database theory and graph theory. Enumerating the answers to a database query, for instance a conjunctive query or a query expressed in monadic second-order. There have been characterizations in database theory of which conjunctive queries could be enumerated with linear preprocessing and constant delay. The problem of enumerating maximal cliques in an input graph, e.g., with the Bron–Kerbosch algorithm Listing all elements of structures such as matroids and greedoids Several problems on graphs, e.g., enumerating independent sets, paths, cuts, etc. Enumerating the satisfying assignments of representations of Boolean functions, e.g., a Boolean formula written in conjunctive normal form or disjunctive normal form, a binary decision diagram such as an OBDD, or a Boolean circuit in restricted classes studied in knowledge compilation, e.g., NNF. == Connection to computability theory == The notion of enumeration algorithms is also used in the field of computability theory to define some high complexity classes such as RE, the class of all recursively enumerable problems. This is the class of sets for which there exist an enumeration algorithm that will produce all elements of the set: the algorithm may run forever if the set is infinite, but each solution must be produced by the algorithm after a finite time.

    Read more →
  • Jump-and-Walk algorithm

    Jump-and-Walk algorithm

    Jump-and-Walk is an algorithm for point location in triangulations (though most of the theoretical analysis were performed in 2D and 3D random Delaunay triangulations). Surprisingly, the algorithm does not need any preprocessing or complex data structures except some simple representation of the triangulation itself. The predecessor of Jump-and-Walk was due to Lawson (1977) and Green and Sibson (1978), which picks a random starting point S and then walks from S toward the query point Q one triangle at a time. But no theoretical analysis was known for these predecessors until after mid-1990s. Jump-and-Walk picks a small group of sample points and starts the walk from the sample point which is the closest to Q until the simplex containing Q is found. The algorithm was a folklore in practice for some time, and the formal presentation of the algorithm and the analysis of its performance on 2D random Delaunay triangulation was done by Devroye, Mucke and Zhu in mid-1990s (the paper appeared in Algorithmica, 1998). The analysis on 3D random Delaunay triangulation was done by Mucke, Saias and Zhu (ACM Symposium of Computational Geometry, 1996). In both cases, a boundary condition was assumed, namely, Q must be slightly away from the boundary of the convex domain where the vertices of the random Delaunay triangulation are drawn. In 2004, Devroye, Lemaire and Moreau showed that in 2D the boundary condition can be withdrawn (the paper appeared in Computational Geometry: Theory and Applications, 2004). Jump-and-Walk has been used in many famous software packages, e.g., QHULL, Triangle and CGAL.

    Read more →
  • ROCm

    ROCm

    ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains, including general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP (directive-based programming), and OpenCL. ROCm is free, libre and open-source software (except the GPU firmware blobs), and it is distributed under various licenses. The name initially stood for Radeon Open Compute platform; however, due to Open Compute being a registered trademark, the name no longer functions as an acronym. == Background == The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream. ROCm was launched around 2016 with the Boltzmann Initiative. ROCm stack builds upon previous AMD GPU stacks; some tools trace back to GPUOpen and others to the Heterogeneous System Architecture (HSA). === Heterogeneous System Architecture Intermediate Language === HSAIL was aimed at producing a middle-level, hardware-agnostic intermediate representation that could be JIT-compiled to the eventual hardware (GPU, FPGA...) using the appropriate finalizer. This approach was dropped for ROCm: now it builds only GPU code, using LLVM, and its AMDGPU backend that was upstreamed, although there is still research on such enhanced modularity with LLVM MLIR. == Programming abilities == ROCm as a stack ranges from the kernel driver to the end-user applications. AMD has introductory videos about AMD GCN hardware, and ROCm programming via its learning portal. One of the best technical introductions about the stack and ROCm/HIP programming, remains, to date, to be found on Reddit. == Hardware support == ROCm is primarily targeted at discrete professional GPUs, but consumer GPUs and APUs of the same architecture as a supported professional GPU are known to work with ROCm. For example, all professional GPUs of the RDNA 2 architecture are officially supported by ROCm 5.x; users report that Consumer RDNA2 units such as the Radeon 6800M APU and the Radeon 6700XT GPU also work. === Professional-grade GPUs === === Consumer-grade GPUs === == Software ecosystem == === Machine learning === Various deep learning frameworks have a ROCm backend: PyTorch TensorFlow ONNX MXNet CuPy MIOpen Caffe Iree (which uses LLVM Multi-Level Intermediate Representation (MLIR)) llama.cpp === Supercomputing === ROCm is gaining significant traction in the top 500. ROCm is used with the Exascale supercomputers El Capitan and Frontier. Some related software is to be found at AMD Infinity hub. === Other acceleration & graphics interoperation === As of version 3.0, Blender can now use HIP compute kernels for its renderer cycles. === Other languages === ==== Julia ==== Julia has the AMDGPU.jl package, which integrates with LLVM and selects components of the ROCm stack. Instead of compiling code through HIP, AMDGPU.jl uses Julia's compiler to generate LLVM IR directly, which is later consumed by LLVM to generate native device code. AMDGPU.jl uses ROCr's HSA implementation to upload native code onto the device and execute it, similar to how HIP loads its own generated device code. AMDGPU.jl also supports integration with ROCm's rocBLAS (for BLAS), rocRAND (for random number generation), and rocFFT (for FFTs). Future integration with rocALUTION, rocSOLVER, MIOpen, and certain other ROCm libraries is planned. === Software distribution === ==== Official ==== Installation instructions are provided for Linux and Windows in the official AMD ROCm documentation. ROCm software is currently spread across several public GitHub repositories. Within the main public meta-repository, there is an XML manifest for each official release: using git-repo, a version control tool built on top of Git, is the recommended way to synchronize with the stack locally. AMD starts distributing containerized applications for ROCm, notably scientific research applications gathered under AMD Infinity Hub. AMD distributes itself packages tailored to various Linux distributions. ==== Third-party ==== There is a growing third-party ecosystem packaging ROCm. Linux distributions are officially packaging (natively) ROCm, with various degrees of advancement: Arch Linux, Gentoo, Debian, Fedora , GNU Guix, and NixOS. There are Spack packages. == Components == There is one kernel-space component, ROCk, and the rest - there is roughly a hundred components in the stack - is made of user-space modules. The unofficial typographic policy is to use: uppercase ROC lowercase following for low-level libraries, i.e. ROCt, and the contrary for user-facing libraries, i.e. rocBLAS. AMD is active developing with the LLVM community, but upstreaming is not instantaneous, and as of January 2022, is still lagging. AMD still officially packages various LLVM forks for parts that are not yet upstreamed – compiler optimizations destined to remain proprietary, debug support, OpenMP offloading, etc. === Low-level === ==== ROCk – Kernel driver ==== ==== ROCm – Device libraries ==== Support libraries implemented as LLVM bitcode. These provide various utilities and functions for math operations, atomics, queries for launch parameters, on-device kernel launch, etc. ==== ROCt – Thunk ==== The thunk is responsible for all the thinking and queuing that goes into the stack. ==== ROCr – Runtime ==== The ROC runtime is a set of APIs/libraries that allows the launch of compute kernels by host applications. It is AMD's implementation of the HSA runtime API. It is different from the ROC Common Language Runtime. ==== ROCm – CompilerSupport ==== ROCm code object manager is in charge of interacting with LLVM intermediate representation. === Mid-level === ==== ROCclr Common Language Runtime ==== The common language runtime is an indirection layer adapting calls to ROCr on Linux and PAL on windows. It used to be able to route between different compilers, like the HSAIL-compiler. It is now being absorbed by the upper indirection layers (HIP and OpenCL). ==== OpenCL ==== ROCm ships its installable client driver (ICD) loader and an OpenCL implementation bundled together. As of January 2022, ROCm 4.5.2 ships OpenCL 2.2, and is lagging behind competition. ==== HIP – Heterogeneous Interface for Portability ==== The AMD implementation for its GPUs is called HIPAMD. There is also a CPU implementation mostly for demonstration purposes. ==== HIPCC ==== HIP builds a `HIPCC` compiler that either wraps Clang and compiles with LLVM open AMDGPU backend, or redirects to the NVIDIA compiler. ==== HIPIFY ==== HIPIFY is a source-to-source compiling tool. It translates CUDA to HIP and reverse, either using a Clang-based tool, or a sed-like Perl script. ==== GPUFORT ==== Like HIPIFY, GPUFORT is a tool compiling source code into other third-generation-language sources, allowing users to migrate from CUDA Fortran to HIP Fortran. It is also in the repertoire of research projects, even more so. === High-level === ROCm high-level libraries are usually consumed directly by application software, such as machine learning frameworks. Most of the following libraries are in the General Matrix Multiply (GEMM) category, which GPU architecture excels at. The majority of these user-facing libraries comes in dual-form: hip for the indirection layer that can route to Nvidia hardware, and roc for the AMD implementation. ==== rocBLAS / hipBLAS ==== rocBLAS and hipBLAS are central in high-level libraries, it is the AMD implementation for Basic Linear Algebra Subprograms. It uses the library Tensile privately. ==== rocSOLVER / hipSOLVER ==== This pair of libraries constitutes the LAPACK implementation for ROCm and is strongly coupled to rocBLAS. === Utilities === ROCm developer tools: Debug, tracer, profiler, System Management Interface, Validation suite, Cluster management. GPUOpen tools: GPU analyzer, memory visualizer... External tools: radeontop (TUI overview) == Comparison with competitors == ROCm competes with other GPU computing stacks: Nvidia CUDA and Intel OneAPI. === Nvidia CUDA === Nvidia's CUDA is closed-source, whereas AMD ROCm is open source. There is open-source software built on top of the closed-source CUDA, for instance RAPIDS. CUDA is able to run on consumer GPUs, whereas ROCm support is mostly offered for professional hardware such as AMD Instinct and AMD Radeon Pro. Nvidia provides a C/C++-centered frontend and its Parallel Thread Execution (PTX) LLVM GPU backend as the Nvidia CUDA Compiler (NVCC). === Intel OneAPI === All the oneAPI corresponding libraries are published on its GitHub Page. ==== Unified Acceleration Foundation (UXL) ==== Unified Acceleration Foundation (UXL) is a new technology consortium that are working on the continuation of the OneAPI initiative, with the goal to create a new open standard accelerator software ecosystem, related open standards and specification projects through Working Groups and Specia

    Read more →
  • Semantic translation

    Semantic translation

    Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system. An example of semantic translation is the conversion of XML data from one data model to a second data model using formal ontologies for each system such as the Web Ontology Language (OWL). This is frequently required by intelligent agents that wish to perform searches on remote computer systems that use different data models to store their data elements. The process of allowing a single user to search multiple systems with a single search request is also known as federated search. Semantic translation should be differentiated from data mapping tools that do simple one-to-one translation of data from one system to another without actually associating meaning with each data element. Semantic translation requires that data elements in the source and destination systems have "semantic mappings" to a central registry or registries of data elements. The simplest mapping is of course where there is equivalence. There are three types of Semantic equivalence: Class Equivalence - indicating that class or "concepts" are equivalent. For example: "Person" is the same as "Individual" Property Equivalence - indicating that two properties are equivalent. For example: "PersonGivenName" is the same as "FirstName" Instance Equivalence - indicating that two individual instances of objects are equivalent. For example: "Dan Smith" is the same person as "Daniel Smith" Semantic translation is very difficult if the terms in a particular data model do not have direct one-to-one mappings to data elements in a foreign data model. In that situation, an alternative approach must be used to find mappings from the original data to the foreign data elements. This problem can be alleviated by centralized metadata registries that use the ISO-11179 standards such as the National Information Exchange Model (NIEM).

    Read more →
  • Environmental informatics

    Environmental informatics

    Environmental informatics is the science of information applied to environmental science. As such, it provides the information processing and communication infrastructure to the interdisciplinary field of environmental sciences aiming at data, information and knowledge integration, the application of computational intelligence to environmental data as well as the identification of environmental impacts of information technology. Environmental informatics thus acts as a bridge, providing an interdisciplinary means of analysing, describing and understanding the complex interactions between humans, nature and technology. Since each field of applied computer science has its own subject matter, terminology and methods, specialised disciplines, such as environmental, bio- and geoinformatics have emerged, each of which combines computer science with a specific field of application such as environmental, bio- or geosciences. Environmental informatics, bioinformatics and geoinformatics all deal with computer-based processing of environmental phenomena. However, environmental informatics is the only field that pursues normative goals (e.g., political goals of environmental protection, environmental planning, and sustainability). This also influences the choice of methods. This also distinguishes it from application areas such as numerical weather prediction, which is considered an early and important example of computer simulation of environmental phenomena. The UK Natural Environment Research Council defines environmental informatics as the "research and system development focusing on the environmental sciences relating to the creation, collection, storage, processing, modelling, interpretation, display and dissemination of data and information." Kostas Karatzas defined environmental informatics as the "creation of a new 'knowledge-paradigm' towards serving environmental management needs." Karatzas argued further that environmental informatics "is an integrator of science, methods and techniques and not just the result of using information and software technology methods and tools for serving environmental engineering needs." Environmental informatics emerged in early 1990 in Central Europe. Current initiatives to effectively manage, share, and reuse environmental and ecological data are indicative of the increasing importance of fields like environmental informatics and ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE and Data Conservancy. == Subject matter and objectives == The subject of environmental informatics are environmental information systems (EIS). An EIS 'is a computer-based system that integrates and stores data collected about the natural environment and provides powerful methods for accessing and evaluating it.' This allows environmental data to be processed by computers for environmental protection, planning, research and technology. According to Jaeschke and Bossel, environmental informatics has three interrelated objectives: Environmental informatics serves to procure data and information for describing the state and development of the environment. Of particular importance is information that is needed to prevent or limit undesirable changes and to support desirable changes. Based on the evaluation and analysis of data, environmental informatics improves our understanding of the environment and the interactions between nature, technology and society. It thus supports environmentally relevant decisions. This enables the influence of development (system correction), the assessment of the effects and side effects of potential measures, and the creation of tools for the routine planning, implementation and monitoring of measures. == History == The simulation model World3, which formed the basis of the highly acclaimed study The Limits to Growth, is considered the starting point of environmental informatics. It incorporated environmental information, among other things, to calculate scenarios for global development. In the mid-1980s, interest grew in structuring environmental protection as an area of application for computer science. One of the first publications in German was the book Informatik im Umweltschutz. Anwendungen und Perspektiven (Computer science in environmental protection. Applications and perspectives) from 1986. The term 'environmental informatics' did not appear until around 1993, which is why the development of environmental informatics is usually referred to as having taken place in the 1990s. In 1993, the first university chair for environmental informatics was established in Cottbus. In 1994, the anthology Umweltinformatik. Informatikmethoden für Umweltschutz und Umweltforschung (Environmental Informatics: Informatics Methods for Environmental Protection and Environmental Research) was published. The development of environmental informatics was 'primarily initiated by German computer science.' In the English-speaking world, the volume Environmental Informatics was published in 1995, mainly based on the German anthology of 1994. An article in the conference proceedings of the World Computer Congress of the International Federation for Information Processing (IFIP) in Hamburg in 1994 describes the initial situation of environmental informatics as follows: 'On the one hand, we suffer from the huge amount of available data – people sometimes speak of data graveyards – on the other hand, the really relevant data may still be missing.' This statement indicates the need that led to the emergence of environmental informatics as a specialised discipline of applied computer science. Furthermore, the specific characteristics and processing requirements of environmental data necessitated the emergence of environmental informatics. The special features of environmental data include: The data structures required are highly heterogeneous due to specific processes and differing perspectives on environmental aspects (e.g., water protection, emission control, hazardous substances). In addition to the heterogeneity of the data, heterogeneous databases also play a role, as environmental data is often obtained and presented in an interdisciplinary manner. Obligations change frequently as a result of new legislation, whether regional (e.g. state regulations on water protection), national (e.g. federal emission control regulations) or international (e.g. Registration, Evaluation, Authorisation and Restriction of Chemicals|REACH). The objects represented are often multidimensional and, therefore, require complex geometric representation using curves or polygons. It is often necessary to process uncertain, imprecise or incomplete data, which is, for example, the result of extrapolations or forecasts. A new "knowledge paradigm" has emerged to meet the requirements of environmental management. Environmental informatics produces its own concepts, methods and techniques and is not merely the result of using information and communication technology methods and tools to meet environmental requirements. The development of environmental informatics since the 1990s has been significantly influenced by the newly established conferences EnviroInfo, ISESS and ITEE and is documented in the respective proceedings. Aspects of sustainability and sustainable development were increasingly integrated into environmental informatics after 2000, thereby expanding the field. In 2004, the Working Group on Sustainable Information Society of the Gesellschaft für Informatik e. V. (German Informatics Society, GI) published the Memorandum on a Sustainable Information Society, which formulates recommendations for an information society that is compatible with human, social and natural needs. Since 2007, environmental informatics has often been described in more detail as informatics for environmental protection, sustainable development and risk management. The increased focus on sustainability has also contributed to the formation of the research focus Information and Communications Technology for Sustainability (ICT4S) and to the emergence of the international conference ICT4S in 2013. ICT-ENSURE, the European Commission's funding measure for the establishment of a European research area on "ICT for Environmental Sustainability Research" (2008–2010), has also contributed to the structuring of environmental informatics. == Environmental informatics and sustainable development == Efforts to place environmental informatics within the context of sustainable development have been growing since 2000 and were significantly influenced by the Memorandum on a Sustainable Information Society. According to this Memorandum, the information society offers great but unevenly distributed opportunities for education, participation and intercultural understanding. In addition, the Memorandum highlighted the material and energy consumption of inf

    Read more →
  • Point-in-time recovery

    Point-in-time recovery

    Point-in-time recovery (PITR) in the context of computers involves systems, often databases, whereby an administrator can restore or recover a set of data or a particular setting from a time in the past. Note for example Windows's capability to restore operating-system settings from a past date (for instance, before data corruption occurred). Time Machine for macOS provides another example of point-in-time recovery. Once PITR logging starts for a PITR-capable database, a database administrator can restore that database from backups to the state that it had at any time since.

    Read more →
  • TIMIT

    TIMIT

    TIMIT is a corpus of phonemically and lexically transcribed speech of American English speakers of different sexes and dialects. Each transcribed element has been delineated in time. TIMIT was designed to further acoustic-phonetic knowledge and automatic speech recognition systems. It was commissioned by DARPA and corpus design was a joint effort between the Massachusetts Institute of Technology, SRI International, and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for publishing by the National Institute of Standards and Technology (NIST). There is also a telephone bandwidth version called NTIMIT (Network TIMIT). TIMIT and NTIMIT are not freely available — either membership of the Linguistic Data Consortium, or a monetary payment, is required for access to the dataset. == Data == TIMIT contains ~5 hours of speech, of 10 sentences spoken by each of 630 speakers. The sentences were randomly sampled from a corpus of 2342 sentences. The speakers were native speakers of American English, classified under 8 major dialect regions: New England, Northern, North Midland, South Midland, Southern, New York City, Western, Army Brat (moved around). The speakers were 70% male and 30% female. Recordings were made in a noise-isolated recording booth at Texas Instrument, using a semi-automatic computer system (STEROIDS) to control the presentation of prompts to the speaker and the recording. Two-channel recordings were made using a Sennheiser HMD 414 headset-mounted microphone and a Brüel & Kjær 1/2" far-field pressure microphone (#4165). The speech was digitized at a sample rate of 20 kHz then and downsampled to 16 kHz. == History == The TIMIT telephone corpus was an early attempt to create a database with speech samples. It was published in the year 1988 on CD-ROM and consists of only 10 sentences per speaker. Two 'dialect' sentences were read by each speaker, as well as another 8 sentences selected from a larger set Each sentence averages 3 seconds long and is spoken by 630 different speakers. It was the first notable attempt in creating and distributing a speech corpus and the overall project has produced costs of 1.5 million US$. An update was released in October 1990. It included full 630-speaker corpus; checked and corrected transcriptions; word-alignment transcriptions; NIST SPHERE-headered waveform files and header manipulation software; phonemic dictionary; new test and training subsets balanced for dialectal and phonetic coverage; more extensive documentation. The full name of the project is DARPA-TIMIT Acoustic-Phonetic Continuous Speech Corpus and the acronym TIMIT stands for Texas Instruments/Massachusetts Institute of Technology. The main reason why a corpus of telephone speech was created was to train speech recognition software. In the Blizzard challenge, different software has the obligation to convert audio recordings into textual data and the TIMIT corpus was used as a standardized baseline.

    Read more →
  • CENDI

    CENDI

    CENDI (Commerce, Energy, NASA, Defense Information Managers Group) is an interagency group of senior Scientific and Technical Information (STI) managers from 14 United States federal agencies. CENDI managers cooperate by exchanging information and ideas, collaborating to address common issues, and undertaking joint initiatives. CENDI's accomplishments range from impacting federal information policy to educating a broad spectrum of stakeholders on all aspects of federal STI systems, including its value to research and the taxpayer, and to operational improvements in agency and interagency STI operations. == History == CENDI traces its roots to the Committee on Scientific and Technical Information (COSATI) of the Federal Council on Science and Technology. COSATI was established in the early 1960s to coordinate the management of the results from the U.S. government's increasing commitment to scientific research and technology development. The scientific and technical information (STI) managers of the government's major research and development (R&D) agencies worked within COSATI to standardize guidelines for cataloging and indexing technical reports. COSATI ceased formal operations in the early 1970s. To continue the cooperation begun under COSATI, managers of agency STI programs from Commerce (National Technical Information Service), Energy (Office of Scientific and Technical Information), NASA (HQ/STI Division), and Defense (Defense Technical Information Center) began meeting periodically to discuss common topics and stimulate more effective cooperation. In 1985, a Memorandum of Understanding was signed by the four charter agencies and CENDI was established. From this small core of STI managers, CENDI has grown to its current membership, which represents the major science agencies, the national libraries, and agencies involved in the dissemination and long-term management of scientific and technical information. The vision of CENDI is to facilitate cooperative enterprise where capabilities are shared and challenges are faced together so that the sum of the accomplishments is greater than each individual agency can achieve on its own amongst federal STI agencies. The abbreviation CENDI refers to the "Commerce, Energy, NASA, Defense Information Managers Group". == Membership == New members from other federal R&D information organizations may be admitted by unanimous agreement of the members. However, it is the intent of the group that membership in CENDI should remain small and focus on organizations with STI or supporting responsibilities. Each agency provides funding to CENDI. == Members == The members of CENDI are: Defense Technical Information Center (United States Department of Defense) Office of Research and Development and Office of Environmental Information (United States Environmental Protection Agency) Government Printing Office Library of Congress NASA Scientific and Technical Information Program National Agricultural Library (United States Department of Agriculture) National Archives and Records Administration National Library of Education (United States Department of Education) National Library of Medicine (United States Department of Health and Human Services) National Science Foundation National Technical Information Service (United States Department of Commerce) National Transportation Library (United States Department of Transportation) Office of Scientific and Technical Information (United States Department of Energy) USGS/Biological Resources Discipline (United States Department of the Interior) == Mission and operation == CENDI's mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information support systems. In fulfilling its mission, CENDI agencies play an important role in addressing science- and technology-based national priorities and strengthening U.S. competitiveness. === Goals === STI Coordination and Leadership: Provide coordination and leadership for information exchange on important STI policy issues. Improvement of STI Systems: Promote the development of improved STI systems through the productive interrelationship of content and technology. STI Understanding: Promote better understanding of STI and STI management. === Principals and Alternates === CENDI is made up of senior federal STI managers and each organization appoints a Principal representative. This person is the point of contact for that organization within CENDI. Each Principal has an Alternate. The Principals and Alternates comprise the main group that meets on a regular basis, usually every other month. === Secretariat === A Tennessee-based information management company, -- Information International Associates, Inc., currently serves as the CENDI Secretariat. The Secretariat provides day-to-day operations to CENDI. The Secretariat prepares the necessary materials for the Principals' meetings, provides support for the working group and task group meetings, assists in developing papers, and maintains the CENDI files and outreach tools. === Task Groups and Working Groups === The chair(s) of a working group is appointed by the Principals and has the overall responsibility for the group's activities. The Secretariat provides support at the request of the Working Group chair(s). The Working Groups and Task Groups that are currently operating are: Copyright and Intellectual Property Working Group Distribution Markings Task Group Digital Preservation Task Group Digitization Specifications Task Group Image Metadata Task Group Science.gov (see below) STI Policy Working Group Terminology Resources Task Group === Science.gov and Worldwidescience.org === In 2001, in response to the April 2001 workshop on "Strengthening the Public Information Infrastructure for Science", and taking into consideration a request from Firstgov (now USA.gov) to develop specialized topical portals, CENDI formed an alliance to develop an interagency website for access to STI. This website, called Science.gov, is a one-stop source of STI, including both selected, authoritative government websites and deep Web databases of technical reports, journal articles, conference proceedings, and other published materials. Through the volunteer efforts of members and involving over 100 staff, content and architecture is developed for the site. The Science.gov website is hosted by the Department of Energy (DOE) Office of Scientific and Technical Information (OSTI). The site was formally launched in December 2002. As a result of the success of Science.gov, under DOE leadership and in cooperation with the International Council of Scientific and Technical Information, a worldwide coordination across national portals called WorldWideScience was launched in 2008. === Work with non-member organizations === CENDI works with several cooperating non-member organizations on a regular basis. These agencies are in academia, federal government, legal and policy analysis, international, non-governmental, and private organizations.

    Read more →
  • E-Science librarianship

    E-Science librarianship

    E-Science librarianship refers to a role for librarians in e-Science. == Early scholars == Early references to e-Science and librarianship involve information studies scholars researching cyberinfrastructure and emerging networked information and knowledge communities. Notably Christine Borgman, Professor and Presidential Chair in Information Studies at the University of California, Los Angeles (UCLA) was a key player in bringing e-Science, and the idea of networked knowledge communities, to the attention of the library profession. In 2004, as a visiting fellow at the Oxford Internet Institute, she conducted research and lectured publicly on e-Science, Digital Libraries, and Knowledge Communities. In 2007 Anna K. Gold, formerly of MIT and Cal Poly, San Luis Obispo, authored a series of articles in D-Lib Magazine that opened the door for academic libraries to begin exploring roles, skills, and strategies for engaging in e-Science: Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians and Cyberinfrastructure, Data, and Libraries, Part 2: Libraries and the Data Challenge: Roles and Actions for Libraries. == Academic research and health sciences libraries == In 2007, the Association of Research Libraries (ARL) e-Science task force issued its report on e-Science and librarianship. The ARL's report encouraged its member libraries to position themselves to engage with researchers involved in e-Science (eScience) by cultivating new research support strategies and developing their digital scholarship infrastructure. E-Science has multiple attributes; Tony and Jessie Hey framed e-Science for the library community by characterizing it as a research methodology: "e-Science is not a new scientific discipline in its own right: e-Science is shorthand for the set of tools and technologies required to support collaborative, networked science". In addition to academic libraries' interests in providing support for their researchers engaging in e-Science, the health sciences library community also emerged as a major proponent for creating librarian positions for supporting the information needs of large-scale, networked, research collaborations on their campuses. Neil Rambo, current director of NYU's Health Sciences Library and former director of University of Washington Health Sciences Library, was the first to use the term in the Journal of the Medical Library Association, in his 2009 editorial e-Science and the Biomedical Library. Rambo's definition of e-Science highlighted the potential e-Science held for creating data as a research product: "E-science is a new research methodology, fueled by networked capabilities and the practical possibility of gathering and storing vast amounts of data." In response to this article the University of Massachusetts Medical School Lamar Soutter Library and National Network of Libraries of Medicine, New England Region encouraged health sciences libraries to cooperate to identify skills and develop a program for training e-Science Librarians. Then, in 2013, Shannon Bohle, an archivist who was employed in the library at Cold Spring Harbor Laboratory, an NCI-designated basic cancer research facility, used experience gained there and previous papers and presentations about preserving scientific archival materials to expand the traditional definition of e-Science by including the terms, principles, and practices used in archival science. These included in the definition the "long-term storage and accessibility of all materials generated through the scientific process," as well as examples of material types traditionally preserved in archives, like "electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints," as well as library materials ("print and/or electronic publications"). == Roles == Many areas of science are about to be transformed by the availability of vast amounts of new scientific data that can potentially provide insights at a level of detail never before envisaged. However, this new data dominant era brings new challenges for the scientists and they will need the skills and technologies both of computer scientists and of the library community to manage, search and curate these new data resources. Libraries will not be immune from change in this new world of research. Karen Williams identifies roles in the following areas for librarians in the developing world of e-Science. Campus Engagement Content/Collection Development and Management Teaching and Learning Scholarly Communication E-Scholarship and Digital Tools Reference/Help Services Outreach Fund Raising Exhibit and Event Planning Leadership == Challenges for research libraries == E-science tends toward inter- and multidisciplinary approaches that depend on computation and computer science. Research libraries have traditionally been discipline focused and, although increasingly technologically sophisticated, do not have systems of the scale or complexity of the e-science environment. E-science is data intensive, but research libraries have not typically been responsible for scientific data. E-science is frequently conducted in a team context, often distributed across multiple institutions and on a global scale. The primary constituency of libraries generally comprises those affiliated with the local institution. Licenses for electronic content are typically restricted to a particular institutional community, and the infrastructure to move institutional licenses into a multi-institutional environment is not well developed. E-science challenges all these traditional paradigms of research library organization and services. == Skills == Garritano & Carlson were among the first to outline a skill set for librarians seeking to support the data needs of e-Science; they identified five skill categories librarians new to this area should expect to adapt or develop when participating on such projects: Library and information science expertise Subject expertise Partnerships and outreach (both internal and external) Participating in sponsored research Balancing workload An example of librarians reconfiguring traditional librarian skills to meet the needs of researchers engaging in e-Science is Witt & Carlson's adaptation of the traditional reference interview into a "data interview" in order to provide effective data management and e-Science services. This interview consists of ten practical queries necessary for understanding the provenance and expectations for the preservation of datasets typical of e-Science that also help illustrate some of the educational tools and skills needed by a librarian new to e-Science. "What is the story of the data? What form and format are the data in? What is the expected lifespan of the dataset? How could the data be used, reused, and repurposed? How large is the dataset, and what is its rate of growth? Who are the potential audiences for the data? Who owns the data? Does the dataset include any sensitive information? What publications or discoveries have resulted from the data? How should the data be made accessible?" == Resources == In 2009 the Lamar Soutter Library at the University of Massachusetts Medical School (UMMS) and the National Network of Libraries of Medicine, New England Region (NN/LM NER) funded an e-Science program for building the skills highlighted above for librarians. Elaine Russo Martin, Director of Library Services at the Lamar Soutter Library and Director of the NN/LM NER developed this comprehensive e-Science program to build librarians' subject expertise in the sciences, developing their data management skills, and their familiarity with cyberinfrastructure and e-Science. Three major products of this program are the e-Science web portal for librarians, the E-Science Symposium, and the New England Collaborative Data Management Curriculum (NECDMC). This portal includes educational resources for specific tools and subject/discipline tutorials and modules to assist librarians new to e-Science. UMMS and NN/LM NER also publish an open access journal called the Journal of eScience Librarianship.

    Read more →
  • Microsoft Query

    Microsoft Query

    Microsoft Query is a visual method of creating database queries using examples based on a text string, the name of a document or a list of documents. The QBE system converts the user input into a formal database query using Structured Query Language (SQL) on the backend, allowing the user to perform powerful searches without having to explicitly compose them in SQL, and without even needing to know SQL. It is derived from Moshé M. Zloof's original Query by Example (QBE) implemented in the mid-1970s at IBM's Research Centre in Yorktown, New York. In the context of Microsoft Access, QBE is used for introducing students to database querying, and as a user-friendly database management system for small businesses. Microsoft Excel allows results of QBE queries to be embedded in spreadsheets.

    Read more →
  • Mastodon (social network)

    Mastodon (social network)

    Mastodon is a free and open-source software platform for decentralized social networking with microblogging features similar to Twitter. It operates as a federated network of independently managed servers that communicate using the ActivityPub protocol, allowing users to connect across different instances within the Fediverse. Each Mastodon instance establishes its own moderation policies and content guidelines, distinguishing it from centrally controlled social media platforms. First released in 2016 by Eugen Rochko, Mastodon has positioned itself as an alternative to mainstream social media, particularly for users seeking decentralized, community-driven spaces. The platform has experienced multiple surges in adoption, most notably following the Twitter acquisition by Elon Musk in 2022, as users sought alternatives to Twitter. It is part of a broader shift toward decentralized social networks, including Bluesky and Lemmy. Mastodon emphasizes user privacy and moderation flexibility, offering features such as granular post visibility controls, content warning options, and local community-driven moderation. The software is written in Ruby on Rails and Node.js, with a web interface built using React and Redux. It is interoperable with other ActivityPub-based platforms, such as Threads, and supports various third-party applications on desktop and mobile devices. == Functionality == Users post short-form status messages, historically known as "toots", for others to see and interact with. On a standard Mastodon instance, these messages can include up to 500 text-based characters, greater than Twitter's 280-character limit. Some instances support even longer messages. Images, audio files, videos or polls can also be added to a message. Users join a specific Mastodon server, rather than a single centralized website or application. The servers are connected as nodes in a network, and each server can administer its own rules, account privileges, and whether to share messages to and from other servers. Users can communicate and follow each other across connected Mastodon servers with usernames similar in format to full email addresses. Since version 2.9.0, Mastodon's web user interface has offered a single-column mode for new users by default. In advanced mode, the interface approximates the microblogging interface of TweetDeck. === Privacy === Mastodon includes a number of specific privacy features. Each message has a variety of privacy options available, and users can choose whether the message is public or private. Messages can display public on a global feed, known as a timeline, or can be shared only to the user's followers. Messages can also be marked as unlisted from timelines or direct between users. Users can also mark their accounts as completely private. In the timeline, messages can display with an optional content warning feature, which requires readers to click on the hidden main body of the message to reveal it. Mastodon servers have used this feature to hide spoilers, trigger warnings, and not safe for work (NSFW) content, though some accounts use the feature to hide links and thoughts others might not want to read. Mastodon aggregates messages in local and federated timelines in real time. The local timeline shows messages from users on a singular server, while the federated timeline shows messages across all participating Mastodon servers. === Content moderation === In early 2017, journalists like Sarah Jeong distinguished Mastodon from Twitter for its approach to combating harassment. Mastodon uses community-based moderation, in which each server can limit or filter out undesirable types of content, while Twitter uses a single, global policy on content moderation. Servers can choose to limit or filter out messages with disparaging content. The founder of Mastodon, Eugen Rochko, believes that small, closely related communities deal with unwanted behavior more effectively than a large company's small safety team. In Move Slowly and Build Bridges, Robert W. Gehl argues that predominantly white participation has shaped Mastodon in ways that affect how reports of racism are received and limit its ability to replicate Black Twitter on Twitter. Users can also block and report others to administrators, much like on Twitter. Instance administrators can block other instances from interacting with their own, an action called defederation. By posting toots hashtagged with #fediblock, some instance administrators and users alert others of issues requiring moderation. === Searching === Mastodon by default allows searching for hashtags and mentioned accounts in the Fediverse. Server administrators can optionally enable Elasticsearch to search the full-text of public posts that have opted in to being indexed. == Versions == In September 2018, with the release of version 2.5 with redesigned public profile pages, Mastodon marked its 100th release. Mastodon 2.6 was released in October 2018, introducing the possibilities of verified profiles and live, in-stream link previews for images and videos. Version 2.7, in January 2019, made it possible to search for multiple hashtags at once, instead of searching for just a single hashtag, with more robust moderation capabilities for server administrators and moderators, while accessibility, such as contrast for users with sight issues, was improved. The ability for users to create and vote in polls, as well as a new invitation system to manage registrations was integrated in April 2019. Mastodon 2.8.1, released in May 2019, made images with content warnings blurred instead of completely hidden. In version 2.9 in June 2019, an optional single-column view was added. This view became the default displayed to new users, with a user "preferences" option to switch to a multiple-column-based view. In August 2020, Mastodon 3.2 was released. It included a redesigned audio player with custom thumbnails and the ability to add personal notes to one's profile. In July 2021, an official client for iOS devices was released. According to the project's then CEO, Eugen Rochko, the release was part of an effort to attract new users. Mastodon 4.0 was released in November 2022, including language support for translating posts, editing posts and following hashtags. Mastodon 4.5 was released in November 2025. Among other features it introduced quote posts, which were previously rejected from being implemented due to concerns about toxicity and harassment. To mitigate these issues Mastodon's quote post feature has been designed in a way that lets users decide if and by whom their posts can be quoted. == Software == Mastodon is published as free and open-source software under the Affero GPL license, allowing anyone to use the software or modify it as they wish. Servers can be run by any individual or organization, and users can join these servers as they wish. The server software itself is powered by Ruby on Rails and Node.js, with its web client being written in React.js and Redux. The only database software supported is PostgreSQL, with Redis being used for job processing and various actions that Mastodon needs to process. The service is interoperable with the fediverse, a collection of social networking services which use the ActivityPub protocol for communication between each other, with previous versions containing support for OStatus. Client apps for interacting with the Mastodon API are available for desktop computer operating systems, including Windows, macOS and the Linux family of operating systems, as well as mobile phones running iOS and Android. The API is open for anyone to utilize, allowing clients to be built for any operating system that can connect to the internet. === Integration with Fediverse === Mastodon uses the ActivityPub protocol for federation; this allows users to communicate between independent Mastodon instances and other ActivityPub compatible services. Thus, Mastodon is generally considered to be a part of the Fediverse. Services utilizing the ActivityPub protocol exist which allow for searching all posts on all instances as long as users opt-in. For similar reasons, only hashtags can appear in a Mastodon instance's trending topics, not arbitrary popular words. Trending topics vary between instances, since individual instances are aware of different subsets of posts from the whole fediverse. === Security concerns === While Mastodon's decentralized structure is one of its most distinctive features, it also poses additional security challenges. Since many Mastodon instances are run by volunteers, some security experts are concerned about data security and responsiveness to new threats and vulnerabilities across the network, considering the difficulty of configuring and maintaining an instance as well as uneven skill levels among administrators. Administrators of an instance also have access to the private information of any users that are either registered with that instance or have federated

    Read more →
  • Enterprise Objects Framework

    Enterprise Objects Framework

    The Enterprise Objects Framework, or simply EOF, was introduced by NeXT in 1994 as a pioneering object-relational mapping product for its NeXTSTEP and OpenStep development platforms. EOF abstracts the process of interacting with a relational database by mapping database rows to Java or Objective-C objects. This largely relieves developers from writing low-level SQL code. EOF enjoyed some niche success in the mid-1990s among financial institutions who were attracted to the rapid application development advantages of NeXT's object-oriented platform. Since Apple Inc's merger with NeXT in 1996, EOF has evolved into a fully integrated part of WebObjects, an application server also originally from NeXT. Many of the core concepts of EOF re-emerged as part of Core Data, which further abstracts the underlying data formats to allow it to be based on non-SQL stores. == History == In the early 1990s NeXT Computer recognized that connecting to databases was essential to most businesses and yet also potentially complex. Every data source has a different data-access language (or API), driving up the costs to learn and use each vendor's product. The NeXT engineers wanted to apply the advantages of object-oriented programming, by getting objects to "talk" to relational databases. As the two technologies are very different, the solution was to create an abstraction layer, insulating developers from writing the low-level procedural code (SQL) specific to each data source. The first attempt came in 1992 with the release of Database Kit (DBKit), which wrapped an object-oriented framework around any database. Unfortunately, NEXTSTEP at the time was not powerful enough and DBKit had serious design flaws. NeXT's second attempt came in 1994 with the Enterprise Objects Framework (EOF) version 1, a complete rewrite that was far more modular and OpenStep compatible. EOF 1.0 was the first product released by NeXT using the Foundation Kit and introduced autoreleased objects to the developer community. The development team at the time was only four people: Jack Greenfield, Rich Williamson, Linus Upson and Dan Willhite. EOF 2.0, released in late 1995, further refined the architecture, introducing the editing context. At that point, the development team consisted of Dan Willhite, Craig Federighi, Eric Noyau and Charly Kleissner. EOF achieved a modest level of popularity in the financial programming community in the mid-1990s, but it would come into its own with the emergence of the World Wide Web and the concept of web applications. It was clear that EOF could help companies plug their legacy databases into the Web without any rewriting of that data. With the addition of frameworks to do state management, load balancing and dynamic HTML generation, NeXT was able to launch the first object-oriented Web application server, WebObjects, in 1996, with EOF at its core. In 2000, Apple Inc. (which had merged with NeXT) officially dropped EOF as a standalone product, meaning that developers would be unable to use it to create desktop applications for the forthcoming Mac OS X. It would, however, continue to be an integral part of a major new release of WebObjects. WebObjects 5, released in 2001, was significant for the fact that its frameworks had been ported from their native Objective-C programming language to the Java language. Critics of this change argue that most of the power of EOF was a side effect of its Objective-C roots, and that EOF lost the beauty or simplicity it once had. Third-party tools, such as EOGenerator, help fill the deficiencies introduced by Java (mainly due to the loss of categories). The Objective-C code base was re-introduced with some modifications to desktop application developers as Core Data, part of Apple's Cocoa API, with the release of Mac OS X Tiger in April 2005. == How EOF works == Enterprise Objects provides tools and frameworks for object-relational mapping. The technology specializes in providing mechanisms to retrieve data from various data sources, such as relational databases via JDBC and JNDI directories, and mechanisms to commit data back to those data sources. These mechanisms are designed in a layered, abstract approach that allows developers to think about data retrieval and commitment at a higher level than a specific data source or data source vendor. Central to this mapping is a model file (an "EOModel") that you build with a visual tool — either EOModeler, or the EOModeler plug-in to Xcode. The mapping works as follows: Database tables are mapped to classes. Database columns are mapped to class attributes. Database rows are mapped to objects (or class instances). You can build data models based on existing data sources or you can build data models from scratch, which you then use to create data structures (tables, columns, joins) in a data source. The result is that database records can be transposed into Java objects. The advantage of using data models is that applications are isolated from the idiosyncrasies of the data sources they access. This separation of an application's business logic from database logic allows developers to change the database an application accesses without needing to change the application. EOF provides a level of database transparency not seen in other tools and allows the same model to be used to access different vendor databases and even allows relationships across different vendor databases without changing source code. Its power comes from exposing the underlying data sources as managed graphs of persistent objects. In simple terms, this means that it organizes the application's model layer into a set of defined in-memory data objects. It then tracks changes to these objects and can reverse those changes on demand, such as when a user performs an undo command. Then, when it is time to save changes to the application's data, it archives the objects to the underlying data sources. === Using Inheritance === In designing Enterprise Objects developers can leverage the object-oriented feature known as inheritance. A Customer object and an Employee object, for example, might both inherit certain characteristics from a more generic Person object, such as name, address, and phone number. While this kind of thinking is inherent in object-oriented design, relational databases have no explicit support for inheritance. However, using Enterprise Objects, you can build data models that reflect object hierarchies. That is, you can design database tables to support inheritance by also designing enterprise objects that map to multiple tables or particular views of a database table. == Enterprise Objects (EOs) == An Enterprise Object is analogous to what is often known in object-oriented programming as a business object — a class which models a physical or conceptual object in the business domain (e.g. a customer, an order, an item, etc.). What makes an EO different from other objects is that its instance data maps to a data store. Typically, an enterprise object contains key-value pairs that represent a row in a relational database. The key is basically the column name, and the value is what was in that row in the database. So it can be said that an EO's properties persist beyond the life of any particular running application. More precisely, an Enterprise Object is an instance of a class that implements the com.webobjects.eocontrol.EOEnterpriseObject interface. An Enterprise Object has a corresponding model (called an EOModel) that defines the mapping between the class's object model and the database schema. However, an enterprise object doesn't explicitly know about its model. This level of abstraction means that database vendors can be switched without it affecting the developer's code. This gives Enterprise Objects a high degree of reusability. == EOF and Core Data == Despite their common origins, the two technologies diverged, with each technology retaining a subset of the features of the original Objective-C code base, while adding some new features. === Features Supported Only by EOF === EOF supports custom SQL; shared editing contexts; nested editing contexts; and pre-fetching and batch faulting of relationships, all features of the original Objective-C implementation not supported by Core Data. Core Data also does not provide the equivalent of an EOModelGroup—the NSManagedObjectModel class provides methods for merging models from existing models, and for retrieving merged models from bundles. === Features Supported Only by Core Data === Core Data supports fetched properties; multiple configurations within a managed object model; local stores; and store aggregation (the data for a given entity may be spread across multiple stores); customization and localization of property names and validation warnings; and the use of predicates for property validation. These features of the original Objective-C implementation are not supported by the Java implementation.

    Read more →
  • Algorithm IMED

    Algorithm IMED

    In multi-armed bandit problems, IMED (for Indexed Minimum Empirical Divergence) is an algorithm developed in 2015 by Junya Honda and Akimichi Takemura. It is the first algorithm proved to be asymptotically optimal respect to the problem-dependant Lai–Robbins lower bound for distributions in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} . == Multi-armed bandit problem == The Multi-armed bandit problem is a sequential game where one player has to choose at each turn between K {\displaystyle K} actions (arms). Behind every arm a {\displaystyle a} there is an unknown distribution ν a {\displaystyle \nu _{a}} that lies in a set D {\displaystyle {\mathcal {D}}} known by the player (for example, D {\displaystyle {\mathcal {D}}} can be the set of Gaussian distributions or Bernoulli distributions). At each turn t {\displaystyle t} the player chooses (pulls) an arm a t {\displaystyle a_{t}} , he then gets an observation X t {\displaystyle X_{t}} of the distribution ν a t {\displaystyle \nu _{a_{t}}} . === Regret minimization === The goal is to minimize the regret at time T {\displaystyle T} that is defined as R T := ∑ a = 1 K Δ a E [ N a ( T ) ] {\displaystyle R_{T}:=\sum _{a=1}^{K}\Delta _{a}\mathbb {E} [N_{a}(T)]} where μ a := E [ ν a ] {\displaystyle \mu _{a}:=\mathbb {E} [\nu _{a}]} is the mean of arm a {\displaystyle a} μ ∗ := max a μ a {\displaystyle \mu ^{}:=\max _{a}\mu _{a}} is the highest mean Δ a := μ ∗ − μ a {\displaystyle \Delta _{a}:=\mu ^{}-\mu _{a}} N a ( t ) {\displaystyle N_{a}(t)} is the number of pulls of arm a {\displaystyle a} up to turn t {\displaystyle t} The player has to find an algorithm that chooses at each turn t {\displaystyle t} which arm to pull based on the previous actions and observations ( a s , X s ) s < t {\displaystyle (a_{s},X_{s})_{s μ } {\displaystyle {\mathcal {K}}_{inf}(\nu ,\mu ,{\mathcal {D}}):=\inf \left\{\mathrm {KL} (\nu ,{\tilde {\nu }})\ |\ {\tilde {\nu }}\in {\mathcal {P}}([-\infty ,1]),\ \mathbb {E} [{\tilde {\nu }}]>\mu \right\}} K L {\displaystyle \mathrm {KL} } is the Kullback–Leibler divergence P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} is the set of distribution in [ − ∞ , 1 ] {\displaystyle [-\infty ,1]} ν ^ a ( t ) {\displaystyle {\hat {\nu }}_{a}(t)} is the empirical distribution of arm a {\displaystyle a} at turn t {\displaystyle t} μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}^{}(t)} is the highest empirical mean of turn t {\displaystyle t} Remark : For arms a {\displaystyle a} that verify μ ^ a ( t ) = μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}_{a}(t)={\hat {\mu }}^{}(t)} we have K i n f ( ν ^ a ( t ) , μ ^ ∗ ( t ) ) = 0 {\displaystyle K_{inf}({\hat {\nu }}_{a}(t),{\hat {\mu }}^{}(t))=0} . Then there index is equal to ln ⁡ ( N a ( t ) ) {\displaystyle \ln(N_{a}(t))} === Pseudocode === for each arm i do: n[i] ← 1; nu[i] ← None; mu[i] ← None for t from 1 to K do: select arm t observe reward r n[t] ← n[t] + 1 nu[t] ← update empirical distribution mu[t] ← update empirical mean for t from K+1 to T do: mu ← highest mu for each arm i do: scoreK[i] ← n[i] K_inf(nu[i],mu) scoreN[i] ← ln(n[i]) index[i] ← scoreK[i] + scoreN[i] select arm a with smallest index[a] observe reward r n[a] ← n[a] + 1 nu[a] ← update empirical distribution mu[a] ← update empirical mean == Theoretical results == In the multi-armed bandit problem we have the asymptotic Lai–Robbins lower bound asymptotic lower bound on regret. The algorithm IMED is the first algorithm that matches this lower bound for distribution in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} in the first order. If the distribution are also bounded then it also match the second order. It is the first algorithm that match the second under of this lower bound. === Lai–Robbins lower bound === In 1985 Lai and Robbins proved an asymptotic, problem-dependent lower bound on regret. In 2018, Aurelien Garivier, Pierre Menard and Gilles Stoltz proved a refined lower bound that gives the second order It states that for every consistent algorithm on the set P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} — that is, an algorithm for which, for every ( ν 1 , … , ν K ) ∈ P ( [ − ∞ , 1 ] ) K {\displaystyle (\nu _{1},\dots ,\nu _{K})\in {\mathcal {P}}([-\infty ,1])^{K}} , the regret R T {\displaystyle R_{T}} is subpolynomial (i.e. R T = o T → + ∞ ( T α ) {\displaystyle R_{T}=o_{T\to +\infty }(T^{\alpha })} for all α > 0 {\displaystyle \alpha >0} ) — we have: R T ≥ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − Ω T → + ∞ ( ln ⁡ ln ⁡ T ) . {\displaystyle R_{T}\geq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-\Omega _{T\to +\infty }(\ln \ln T).} This bound is asymptotic (as T → + ∞ {\displaystyle T\to +\infty } ) and gives a first-order lower bound of order ln ⁡ T {\displaystyle \ln T} with the optimal constant in front of it and the second order in − Ω ( ln ⁡ ln ⁡ T ) {\displaystyle -\Omega (\ln \ln T)} . === Regret bound for IMED === If the distribution of every arm a {\displaystyle a} is ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} ( i.e. ν a ∈ P ( [ − ∞ , 1 ] ) ) {\displaystyle \nu _{a}\in {\mathcal {P}}([-\infty ,1]))} then the regret of the algorithm IMED verify R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T + O ( 1 ) {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T+O(1)} If all the distribution ν a {\displaystyle \nu _{a}} are bounded then it exists a constant C > 0 {\displaystyle C>0} such that for T {\displaystyle T} large enough the regret of IMED is upper bounded by R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − C ln ⁡ ln ⁡ T {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-C\ln \ln T} == Computation time == The algorithm only requiere to compute the K i n f {\displaystyle K_{inf}} for suboptimal arms who are pulled O ( ln ⁡ T ) {\displaystyle O(\ln T)} times, which make it a lot faster than KL-UCB. A faster version of IMED was developed in 2023 to make it even faster, using a Taylor development of the K i n f {\displaystyle K_{inf}} in the first order .

    Read more →