AI Video Tools

Explore the best AI Video Tools — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • List of Fortran software and tools

    List of Fortran software and tools

    This is a list of Fortran software and tools, including IDEs, compilers, libraries, debugging tools, numerical and scientific computing tools, and related projects. == Fortran compilers == Absoft Pro Fortran — Absoft Pro Fortran is discontinued and ran on Linux and macOS AOCC — from AMD Classic Flang — part of the LLVM Project LLVM Flang — part of the LLVM Project Fortran 77 — Fortran 77 was developed by Digital Equipment Corporation, it is discontinued. G95 – portable open-source Fortran 95 compiler GCC (GNU Fortran) PGI compilers – NVIDIA developed compilers after acquiring The Portland Group IBM XL Fortran — IBM XL Fortran is current and runs on Linux (Power/AIX) and integrates with Eclipse Intel Fortran Compiler – part of Intel OneAPI HPC toolkit LFortran — LFortran is current, cross-platform, and has IDE support. MinGW – cross compiler and forked into Mingw-w64 nAG Fortran Compiler - from nAG Open64 — Open64 is an open-source compiler that has been terminated and ran on Linux Open Watcom — Open Watcom is current, runs on MS-DOS and OS/2, and has IDE support. Oracle Fortran — Oracle Fortran is discontinued, ran on Linux and Solaris. ROSE — source-to-source compiler framework developed at Lawrence Livermore National Laboratory Silverfrost FTN95 — FTN95 from Silverfrost is current, runs on Windows, and has IDE support. == Integrated development environments (IDEs) and editors == Code::Blocks — supports Fortran with plugins Eclipse IDE — with Fortran support via Photran Emacs — extensible text editor with built-in Fortran modes and support for modern tooling via language servers Geany — lightweight cross-platform IDE based on GTK IntelliJ IDEA — cross-platform IDE by JetBrains with Fortran pluggin KDevelop — KDE-based IDE NetBeans — Apache software foundation IDE with Fortran configuration OpenWatcom — IDE and compiler suite for C, C++, and Fortran Simply Fortran — standalone Fortran IDE for Windows, Linux, and macOS Vim — modal text editor with native Fortran syntax support and extensive plugin-based development features Visual Studio — with Intel Fortran integration Visual Studio Code — supports Fortran via extensions == Mathematical libraries == == Scientific libraries == ABINIT — software suite to calculate optical, mechanical, vibrational, and other observable properties of materials Cantera — chemical kinetics, thermodynamics, and transport tool suite CERN Program Library — collection of Fortran libraries for physics applications from CERN CP2K — quantum chemistry and solid-state physics software package for atomistic simulations Dalton — molecular electronic structure program FFTPACK — subroutines for the fast Fourier transform Kinetic PreProcessor – open-source software tool used in atmospheric chemistry MESA — Modules for Experiments in Stellar Astrophysics Nek5000 — MPI parallel higher-order spectral element CFD solver NWChem — open-source high-performance computational chemistry software Octopus — real-space Time-Dependent Density Functional Theory code MODTRAN – model atmospheric propagation of electromagnetic radiation MOLCAS — quantum chemistry software package for multiconfigurational electronic structure calculations NOVAS – software library for astrometry-related numerical computations Physics Analysis Workstation – data analysis and graphical presentation in high-energy physics Quantum ESPRESSO — integrated suite for electronic-structure calculations and materials modeling SIESTA — first-principles materials simulation code using density functional theory Tinker — software tools for molecular design == Debugging and performance tools == GDB — GNU Debugger with Fortran support Valgrind — memory debugging and profiling tool VTune Profiler — performance analysis tool Allinea Forge — debugger and profiler for HPC applications == Build and package management == Autotools — build system supporting Fortran projects CMake — cross-platform build system supporting Fortran Make — build automation tool Spack — package manager for HPC software including Fortran libraries == Machine learning and AI libraries == Athena Fiats (Functional Inference And Training for Surrogates) FNN (Fortran Neural Network) FortNN Fortran-TF-lib (Fortran interface to TensorFlow) FTorch (Fortran interface to PyTorch) MlFortran RoseNNa == Parallel and high-performance computing tools == MPI Fortran bindings — standard interface for distributed-memory parallelism OpenMP — shared-memory parallel programming support through compiler directives Coarray Fortran — parallel programming model introduced in Fortran 2008 ScaLAPACK — parallel linear algebra package built on top of LAPACK == Testing frameworks == FUnit — open-source unit testing framework developed at NASA’s Langley Research Center, for Fortran 90, 95, and 2003. pFUnit — unit testing framework for Fortran, modeled after JUnit == Documentation and code analysis tools == FORD — automatic documentation generator for modern Fortran projects SQuORE — software quality and management platform with code analysis support Understand — static analysis and code comprehension tool for large Fortran projects

    Read more →
  • Artificial intelligence industry in Italy

    Artificial intelligence industry in Italy

    The artificial intelligence industry in Italy is growing and supports industrial development. In 2024 it reached a new record, reaching 1.2 billion euros with a growth of +58% compared to 2023. While in 2025, the growth of artificial intelligence in the industrial application was even greater than in 2024 both in terms of value and application to industrial sectors. == History == The roots of AI research in Italy extend back to the 1970s, when Italian scholars began exploring automated reasoning, programming language semantics, and pattern recognition. Researchers such as those involved in early projects at the National Research Council and various universities laid the groundwork for subsequent academic and industrial developments in the field. During this period, the focus was predominantly on developing algorithms for automated theorem proving and building systems to reason about complex mathematical problems. This era witnessed the birth of methodologies that would later influence numerous AI subfields, from natural language processing (NLP) to robotics. === Institutional milestones and academic contributions === A turning point in the Italian AI landscape was the formation of the Italian Association for Artificial Intelligence (AIxIA) in 1988. Founded by academics, including Luigia Carlucci Aiello, the association established a platform for collaboration between universities, research centers, and industry. Led by Aiello, AIIA played a role in promoting research, organizing national conferences, and fostering international partnerships that connected Italy's AI community to global networks. At the same time, professors such as Roberto Navigli and numerous practitioners contributed to the advancement of AI in Italy. Navigli has worked in multilingual NLP, including the creation of BabelNet, and led the Minerva project. === Industrial AI === Over recent decades, numerous national and European initiatives supported by funding from programs such as the National Recovery and Resilience Plan (PNRR) have spurred the transition from theoretical research to practical applications. Industrial sectors including manufacturing, banking, and healthcare increasingly embraced AI-driven automation, while research institutions collaborated with industrial partners to deploy cutting-edge solutions. In recent years, Italy has also seen the establishment of specialized research centers and institutes aimed at bridging the gap between academic innovation and industrial application. These initiatives indicate a broader national commitment to integrating AI into the fabric of Italian industry. == Recent developments == === Emergence of generative AI === A landmark in Italy's modern AI evolution is the development of Minerva AI. Developed by the Sapienza NLP research group at Sapienza University of Rome and led by Professor Roberto Navigli, Minerva represents the first family of large language models (LLMs) trained from scratch with a primary focus on the Italian language. ==== Minerva 7B ==== The latest iteration, Minerva 7B, has 7 billion parameters and has been trained on an extensive corpus of over 1.5 trillion words. By using advanced instruction tuning techniques, Minerva 7B is able to produce highly accurate, coherent, and contextually sensitive responses addressing common issues such as hallucinations and inappropriate content generation. This breakthrough sets a benchmark for transparent, open-source AI development in the country. Minerva's development, carried out within the FAIR (Future Artificial Intelligence Research) project in collaboration with CINECA and supported by supercomputing resources like the Leonardo (supercomputer), aligns closely with Italy's cultural and linguistic heritage. === Establishment of AI4I === The recent establishment of the Istituto Italiano per l’Intelligenza Artificiale (AI4I) is part of Italy's strategy to improve its industrial competitiveness in AI. This dedicated institute aims to bridge the gap between research institutions and industrial enterprises; promote training and R&D support to nurture the next generation of Italian AI experts; and enhance national competitiveness. This initiative is expected to serve as a hub for applied AI research, driving innovations that are tailored to the specific needs of Italian industry and public administration. === Benefits of InvestAI === Italy's AI industry stands to benefit from the European InvestAI initiative, a plan unveiled at the recent AI Action Summit in Paris. InvestAI is an effort by the European Commission to mobilize €200 billion for AI investments, with a dedicated €20 billion fund earmarked for building AI gigafactories. These gigafactories are planned as large-scale hubs for training advanced, complex AI models using approximately 100,000 last-generation AI chips. For Italy, this investment presents several major opportunities: Access to State-of-the-Art Infrastructure: Italian companies, research institutions, and start-ups can leverage the gigafactories’ immense computational resources, enabling them to train highly sophisticated language models and other AI systems. Enhanced Competitiveness and Collaboration: With InvestAI's layered funding model where EU funds help de-risk private investments Italian firms can access capital more readily. This will bolster public–private partnerships and create a more dynamic AI ecosystem that spans from academic research to industrial applications. Alignment with National and Regional Initiatives: The Istituto Italiano per l’Intelligenza Artificiale (AI4I), based in Turin, is already recognized as a strategic asset by both Italy and the European Union. As the main recipient of InvestAI funds in Italy, AI4I will play a pivotal role in implementing these investments locally, fostering innovation in sectors like manufacturing, healthcare and aerospace. Commission President Ursula von der Leyen emphasized that InvestAI is designed to democratize AI innovation throughout Europe by ensuring that even smaller companies have access to high-performance computing power. For Italy, this means not only keeping pace with global leaders but also harnessing European-scale investments to transform its AI industry and drive economic growth.

    Read more →
  • Automatic image annotation

    Automatic image annotation

    Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database. This method can be regarded as a type of multi-class image classification with a very large number of classes - as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors and the training annotation words are used by machine learning techniques to attempt to automatically apply annotations to new images. The first methods learned the correlations between image features and training annotations. Subsequently, techniques were developed using machine translation to attempt to translate the textual vocabulary into the 'visual vocabulary,' represented by clustered regions known as blobs. Subsequent work has included classification approaches, relevance models, and other related methods. The advantages of automatic image annotation versus content-based image retrieval (CBIR) are that queries can be more naturally specified by the user. At present, Content-Based Image Retrieval (CBIR) generally requires users to search by image concepts such as color and texture or by finding example queries. However, certain image features in example images may override the concept that the user is truly focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images, which is expensive and time-consuming, especially given the large and constantly growing image databases in existence.

    Read more →
  • Ontology-based data integration

    Ontology-based data integration

    Ontology-based data integration involves the use of one or more ontologies to effectively combine data or information from multiple heterogeneous sources. It is one of the multiple data integration approaches and may be classified as Global-As-View (GAV). The effectiveness of ontology‑based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process. == Background == Data from multiple sources are characterized by multiple types of heterogeneity. The following hierarchy is often used: Syntactic heterogeneity: is a result of differences in representation format of data Schematic or structural heterogeneity: the native model or structure to store data differ in data sources leading to structural heterogeneity. Schematic heterogeneity that particularly appears in structured databases is also an aspect of structural heterogeneity. Semantic heterogeneity: differences in interpretation of the 'meaning' of data are source of semantic heterogeneity System heterogeneity: use of different operating system, hardware platforms lead to system heterogeneity Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address the issue of semantic heterogeneity in data sources. In domains like bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies [1] Archived 2007-06-16 at the Wayback Machine has made it possible for the data integration community to leverage them for semantic integration of data and information. == The role of ontologies == Ontologies enable the unambiguous identification of entities in heterogeneous information systems and assertion of applicable named relationships that connect these entities together. Specifically, ontologies play the following roles: Content Explication The ontology enables accurate interpretation of data from multiple sources through the explicit definition of terms and relationships in the ontology. Query Model In some systems like SIMS, the query is formulated using the ontology as a global query schema. Verification The ontology verifies the mappings used to integrate data from multiple sources. These mappings may either be user specified or generated by a system. == Approaches using ontologies for data integration == There are three main architectures that are implemented in ontology‑based data integration applications, namely, Single ontology approach A single ontology is used as a global reference model in the system. This is the simplest approach as it can be simulated by other approaches. SIMS is a prominent example of this approach. The Structured Knowledge Source Integration component of Research Cyc is another prominent example of this approach. (Title = Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries). The Gellish Taxonomic Dictionary-Ontology follows this approach as well. Multiple ontologies Multiple ontologies, each modeling an individual data source, are used in combination for integration. Though, this approach is more flexible than the single ontology approach, it requires creation of mappings between the multiple ontologies. Ontology mapping is a challenging issue and is focus of large number of research efforts in computer science [2]. The OBSERVER system is an example of this approach. Hybrid approaches The hybrid approach involves the use of multiple ontologies that subscribe to a common, top-level vocabulary. The top-level vocabulary defines the basic terms of the domain. Thus, the hybrid approach makes it easier to use multiple ontologies for integration in presence of the common vocabulary.

    Read more →
  • Sprite (computer graphics)

    Sprite (computer graphics)

    In computer graphics, a sprite is a two-dimensional bitmap that is integrated into a larger scene, most often in a 2D video game. Originally, the term sprite referred to fixed-sized objects composited together, by hardware, with a background. Use of the term has since become more general. Systems with hardware sprites include arcade video games of the 1970s and 1980s; game consoles including as the Atari VCS (1977), ColecoVision (1982), Famicom (1983), Genesis/Mega Drive (1988); and home computers such as the TI-99/4 (1979), Atari 8-bit computers (1979), Commodore 64 (1982), MSX (1983), Amiga (1985), and X68000 (1987). Hardware varies in the number of sprites supported, the size and colors of each sprite, and special effects such as scaling or reporting pixel-precise overlap. Hardware composition of sprites occurs as each scan line is prepared for the video output device, such as a cathode-ray tube, without involvement of the main CPU and without the need for a full-screen frame buffer. Sprites can be positioned or altered by setting attributes used during the hardware composition process. The number of sprites which can be displayed per scan line is often lower than the total number of sprites a system supports. For example, the Texas Instruments TMS9918 chip supports 32 sprites, but only four can appear on the same scan line. The CPUs in modern computers, video game consoles, and mobile devices are fast enough that bitmaps can be drawn into a frame buffer without special hardware assistance. Beyond that, GPUs can render vast numbers of scaled, rotated, anti-aliased, partially translucent, very high resolution images in parallel with the CPU. == Etymology == According to Karl Guttag, one of two engineers for the 1979 Texas Instruments TMS9918 video display processor, this use of the word sprite came from David Ackley, a manager at TI. It was also used by Danny Hillis at Texas Instruments in the late 1970s. The term was derived from the fact that sprites "float" on top of the background image without overwriting it, much like a ghost or mythological sprite. Some hardware manufacturers used different terms, especially before sprite became common: Player/Missile Graphics was a term used by Atari, Inc. for hardware sprites in the Atari 8-bit computers (1979) and Atari 5200 console (1982). The term reflects the use for both characters ("players") and smaller associated objects ("missiles") that share the same color. The earlier Atari Video Computer System and some Atari arcade games used player, missile, and ball. Stamp was used in some arcade hardware in the early 1980s, including Ms. Pac-Man. Movable Object Block, or MOB, was used in MOS Technology's graphics chip literature. Commodore, the main user of MOS chips and the owner of MOS for most of the chip maker's lifetime, instead used the term sprite for the Commodore 64. OBJs (short for objects) is used in the developer manuals for the NES, Super NES, and Game Boy. The region of video RAM used to store sprite attributes and coordinates is called OAM (Object Attribute Memory). This also applies to the Game Boy Advance and Nintendo DS. == History == === Arcade video games === The use of sprites originated with arcade video games. Nolan Bushnell came up with the original concept when he developed the first arcade video game, Computer Space (1971). Technical limitations made it difficult to adapt the early mainframe game Spacewar! (1962), which performed an entire screen refresh for every little movement, so he came up with a solution to the problem: controlling each individual game element with a dedicated transistor. The rockets were essentially hardwired bitmaps that moved around the screen independently of the background, an important innovation for producing screen images more efficiently and providing the basis for sprite graphics. The earliest video games to represent player characters as human player sprites were arcade sports video games, beginning with Taito's TV Basketball, released in April 1974 and licensed to Midway Manufacturing for release in North America. Designed by Tomohiro Nishikado, he wanted to move beyond simple Pong-style rectangles to character graphics, by rearranging the rectangle shapes into objects that look like basketball players and basketball hoops. Ramtek released another sports video game in October 1974, Baseball, which similarly displayed human-like characters. The Namco Galaxian arcade system board, for the 1979 arcade game Galaxian, displays animated, multi-colored sprites over a scrolling background. It became the basis for Nintendo's Radar Scope and Donkey Kong arcade hardware and home consoles such as the Nintendo Entertainment System. According to Steve Golson from General Computer Corporation, the term "stamp" was used instead of "sprite" at the time. === Home systems === Signetics devised the first chips capable of generating sprite graphics (referred to as objects by Signetics) for home systems. The Signetics 2636 video processors were first used in the 1978 1292 Advanced Programmable Video System and later in the 1979 Elektor TV Games Computer. The Atari VCS, released in 1977, has a hardware sprite implementation where five graphical objects can be moved independently of the game playfield. The term sprite was not in use at the time. The VCS's sprites are called movable objects in the programming manual, further identified as two players, two missiles, and one ball. These each consist of a single row of pixels that are displayed on a scan line. To produce a two-dimensional shape, the sprite's single-row bitmap is altered by software from one scan line to the next. The 1979 Atari 400 and 800 home computers have similar, but more elaborate, circuitry capable of moving eight single-color objects per scan line: four 8-bit wide players and four 2-bit wide missiles. Each is the full height of the display—a long, thin strip. DMA from a table in memory automatically sets the graphics pattern registers for each scan line. Hardware registers control the horizontal position of each player and missile. Vertical motion is achieved by moving the bitmap data within a player or missile's strip. The feature was called player/missile graphics by Atari. Texas Instruments developed the TMS9918 chip with sprite support for its 1979 TI-99/4 home computer. An updated version is used in the 1981 TI-99/4A. === In 2.5D and 3D games === Sprites remained popular with the rise of 2.5D games (those which recreate a 3D game space from a 2D map) in the late 1980s and early 1990s. A technique called billboarding allows 2.5D games to keep onscreen sprites rotated toward the player view at all times. Some 2.5D games, such as 1993's Doom, allow the same entity to be represented by different sprites depending on its rotation relative to the viewer, furthering the illusion of 3D. Fully 3D games usually present world objects as 3D models, but sprites are supported in some 3D game engines, such as GoldSrc and Unreal, and may be billboarded or locked to fixed orientations. Sprites remain useful for small details, particle effects, and other applications where the lack of a third dimension is not a major detriment. == Systems with hardware sprites == These are base hardware specs and do not include additional programming techniques, such as using raster interrupts to repurpose sprites mid-frame.

    Read more →
  • SQL/PSM

    SQL/PSM

    SQL/PSM (SQL/Persistent Stored Modules) is an ISO standard mainly defining an extension of SQL with a procedural language for use in stored procedures. Initially published in 1996 as an extension of SQL-92 (ISO/IEC 9075-4:1996, a version sometimes called PSM-96 or even SQL-92/PSM), SQL/PSM was later incorporated into the multi-part SQL:1999 standard, and has been part 4 of that standard since then, most recently in SQL:2023. The SQL:1999 part 4 covered less than the original PSM-96 because the SQL statements for defining, managing, and invoking routines were actually incorporated into part 2 SQL/Foundation, leaving only the procedural language itself as SQL/PSM. The SQL/PSM facilities are still optional as far as the SQL standard is concerned; most of them are grouped in Features P001-P008. SQL/PSM standardizes syntax and semantics for control flow, exception handling (called "condition handling" in SQL/PSM), local variables, assignment of expressions to variables and parameters, and (procedural) use of cursors. It also defines an information schema (metadata) for stored procedures. SQL/PSM is one language in which methods for the SQL:1999 structured types can be defined. The other is Java, via SQL/JRT. SQL/PSM is derived, seemingly directly, from Oracle's PL/SQL. Oracle developed PL/SQL and released it in 1991, basing the language on the US Department of Defense's Ada programming language. However, Oracle has maintained a distance from the standard in its documentation. IBM's SQL PL (used in DB2) and Mimer SQL's PSM were the first two products officially implementing SQL/PSM. It is commonly thought that these two languages, and perhaps also MySQL/MariaDB's procedural language, are closest to the SQL/PSM standard. However, a PostgreSQL addon implements SQL/PSM (alongside its other procedural languages like the PL/SQL-derived plpgsql), although it is not part of the core product. RDF functionality in OpenLink Virtuoso was developed entirely through SQL/PSM, combined with custom datatypes (e.g., ANY for handling URI and Literal relation objects), sophisticated indexing, and flexible physical storage choices (column-wise or row-wise).

    Read more →
  • Source criticism

    Source criticism

    Source criticism (or information evaluation) is the process of evaluating an information source, i.e.: a document, a person, a speech, a fingerprint, a photo, an observation, or anything used in order to obtain knowledge. In relation to a given purpose, a given information source may be more or less valid, reliable or relevant. Broadly, "source criticism" is the interdisciplinary study of how information sources are evaluated for given tasks. == Meaning == Problems in translation: The Danish word kildekritik, like the Norwegian word kildekritikk and the Swedish word källkritik, derived from the German Quellenkritik and is closely associated with the German historian Leopold von Ranke (1795–1886). Historian Wolfgang Hardtwig wrote: His [Ranke's] first work Geschichte der romanischen und germanischen Völker von 1494–1514 (History of the Latin and Teutonic Nations from 1494 to 1514) (1824) was a great success. It already showed some of the basic characteristics of his conception of Europe, and was of historiographical importance particularly because Ranke made an exemplary critical analysis of his sources in a separate volume, Zur Kritik neuerer Geschichtsschreiber (On the Critical Methods of Recent Historians). In this work he raised the method of textual criticism used in the late eighteenth century, particularly in classical philology to the standard method of scientific historical writing. (Hardtwig, 2001, p. 12739) Historical theorist Chris Lorenz wrote: The larger part of the nineteenth and twentieth centuries would be dominated by the research-oriented conception of historical method of the so-called Historical School in Germany, led by historians as Leopold Ranke and Berthold Niebuhr. Their conception of history, long been regarded as the beginning of modern, 'scientific' history, harked back to the 'narrow' conception of historical method, limiting the methodical character of history to source criticism. (Lorenz, 2001) In the early 21st century, source criticism is a growing field in, among other fields, library and information science. In this context source criticism is studied from a broader perspective than just, for example, history, classical philology, or biblical studies (but there, too, it has more recently received new attention). == Principles == The following principles are from two Scandinavian textbooks on source criticism, written by the historians Olden-Jørgensen (1998) and Thurén (1997): Human sources may be relics (e.g. a fingerprint) or narratives (e.g. a statement or a letter). Relics are more credible sources than narratives. A given source may be forged or corrupted; strong indications of the originality of the source increases its reliability. The closer a source is to the event which it purports to describe, the more one can trust it to give an accurate description of what really happened A primary source is more reliable than a secondary source, which in turn is more reliable than a tertiary source and so on. If a number of independent sources contain the same message, the credibility of the message is strongly increased. The tendency of a source is its motivation for providing some kind of bias. Tendencies should be minimized or supplemented with opposite motivations. If it can be demonstrated that the witness (or source) has no direct interest in creating bias, the credibility of the message is increased. Two other principles are: Knowledge of source criticism cannot substitute for subject knowledge: "Because each source teaches you more and more about your subject, you will be able to judge with ever-increasing precision the usefulness and value of any prospective source. In other words, the more you know about the subject, the more precisely you can identify what you must still find out". (Bazerman, 1995, p. 304). The reliability of a given source is relative to the questions put to it. "The empirical case study showed that most people find it difficult to assess questions of cognitive authority and media credibility in a general sense, for example, by comparing the overall credibility of newspapers and the Internet. Thus these assessments tend to be situationally sensitive. Newspapers, television and the Internet were frequently used as sources of orienting information, but their credibility varied depending on the actual topic at hand" (Savolainen, 2007). The following questions are often good ones to ask about any source according to the American Library Association (1994) and Engeldinger (1988): How was the source located? What type of source is it? Who is the author and what are the qualifications of the author in regard to the topic that is discussed? When was the information published? In which country was it published? What is the reputation of the publisher? Does the source show a particular cultural or political bias? For literary sources complementing criteria are: Does the source contain a bibliography? Has the material been reviewed by a group of peers, or has it been edited? How does the article/book compare with similar articles/books? == Levels of generality == Some principles of source criticism are universal, other principles are specific for certain kinds of information sources. There is today no consensus about the similarities and differences between source criticism in the natural science and humanities. Logical positivism claimed that all fields of knowledge were based on the same principles. Much of the criticism of logical positivism claimed that positivism is the basis of the sciences, whereas hermeneutics is the basis of the humanities. This was, for example, the position of Jürgen Habermas. A newer position, in accordance with, among others, Hans-Georg Gadamer and Thomas Kuhn, understands both science and humanities as determined by researchers' preunderstanding and paradigms. Hermeneutics is thus a universal theory. The difference is, however, that the sources of the humanities are themselves products of human interests and preunderstanding, whereas the sources of the natural sciences are not. Humanities are thus "doubly hermeneutic". Natural scientists, however, are also using human products (such as scientific papers) which are products of preunderstanding (and can lead to, for example, academic fraud). == Contributing fields == === Epistemology === Epistemological theories are the basic theories about how knowledge is obtained and are thus the most general theories about how to evaluate information sources. Empiricism evaluates sources by considering the observations (or sensations) on which they are based. Sources without basis in experience are not seen as valid. Rationalism provides low priority to sources based on observations. In order to be meaningful, observations must be explained by clear ideas or concepts. It is the logical structure and the well definedness that is in focus in evaluating information sources from the rationalist point of view. Historicism evaluates information sources on the basis of their reflection of their sociocultural context and their theoretical development. Pragmatism evaluate sources on the basis of how their values and usefulness to accomplish certain outcomes. Pragmatism is skeptical about claimed neutral information sources. The evaluation of knowledge or information sources cannot be more certain than is the construction of knowledge. If one accepts the principle of fallibilism then one also has to accept that source criticism can never 100% verify knowledge claims. As discussed in the next section, source criticism is intimately linked to scientific methods. The presence of fallacies of argument in sources is another kind of philosophical criterion for evaluating sources. Fallacies are presented by Walton (1998). Among the fallacies are the ad hominem fallacy (the use of personal attack to try to undermine or refute a person's argument) and the straw man fallacy (when one arguer misrepresents another's position to make it appear less plausible than it really is, in order more easily to criticize or refute it.) === Research methodology === Research methods are methods used to produce scholarly knowledge. The methods that are relevant for producing knowledge are also relevant for evaluating knowledge. An example of a book that turns methodology upside-down and uses it to evaluate produced knowledge is Katzer; Cook & Crouch (1998). === Science studies === Studies of quality evaluation processes such as peer review, book reviews and of the normative criteria used in evaluation of scientific and scholarly research. Another field is the study of scientific misconduct. Harris (1979) provides a case study of how a famous experiment in psychology, Little Albert, has been distorted throughout the history of psychology, starting with the author (Watson) himself, general textbook authors, behavior therapists, and a prominent learning theorist. Harris proposes possible causes for these distortions and analyzes the Albert study as an ex

    Read more →
  • Emotion-sensitive software

    Emotion-sensitive software

    Emotion-sensitive software (ESS) is software specifically designed to target and monitor emotional response in a human being. Some software measures anger by comparing the pitch of a voice to a regular, or calm, pitch. Another approach is the measurement of physical appearance. If a camera or similar recording device picks up a certain amount of red pigmentation in the skin the system can be alerted that this person is angered. The competitive landscape in the Electronic Surveillance Software (ESS) industry is marked by a high level of secrecy regarding the operational details of these software systems. Many producers deliberately withhold information about the inner workings of their ESS products, a strategy that serves dual purposes: firstly, it intensifies competition among companies in the sector, as each strives to maintain a unique edge without revealing trade secrets that could be leveraged by competitors; secondly, this secrecy acts as a deterrent against individuals or entities who might try to circumvent the surveillance mechanisms. One application of ESS was developed by University of Notre Dame Assistant Professor of Psychology Sidney D'Mello, Art Graesser from the University of Memphis and a colleague from Massachusetts Institute of Technology. They used the technology to create an electronic tutor that could assess a student's level of boredom and frustration based on facial expression and body language, and react accordingly.

    Read more →
  • Rclone

    Rclone

    Rclone is an open source, multi threaded, command line computer program to manage or migrate content on cloud and other high latency storage. Its capabilities include sync, transfer, crypt, cache, union, compress and mount. The rclone website lists supported backends including S3 and Google Drive. Descriptions of rclone often carry the strapline "Rclone syncs your files to cloud storage". Those prior to 2020 include the alternative "Rsync for Cloud Storage". Rclone is well known for its rclone sync and rclone mount commands. It provides further management functions analogous to those ordinarily used for files on local disks, but which tolerate some intermittent and unreliable service. Rclone is commonly used with media servers such as Plex, Emby or Jellyfin to stream content direct from consumer file storage services. Official Ubuntu, Debian, Fedora, Gentoo, Arch, Brew, Chocolatey, and other package managers include rclone. == History == Nick Craig-Wood was inspired by rsync. Concerns about the noise and power costs arising from home computer servers prompted him to embrace cloud storage and he began developing rclone as open source software in 2012 under the name swiftsync. Rclone was promoted to stable version 1.00 in July 2014. In May 2017, Amazon Drive barred new users of rclone and other upload utilities, citing security concerns. Amazon Drive had been advertised as offering unlimited storage for £55 per year. Amazon's AWS S3 service continues to support new rclone users. The original rclone logo was updated in September 2018. In March 2020, Nick Craig-Wood resigned from Memset Ltd, a cloud hosting company he founded, to focus on open source software. Amazon's AWS April 2020 public sector blog explained how the Fred Hutch Cancer Research Center were using rclone in their Motuz tool to migrate very large biomedical research datasets in and out of AWS S3 object stores. In November 2020, rclone was updated to correct a weakness in the way it generated passwords. Passwords for encrypted remotes can be generated randomly by rclone or supplied by the user. In all versions of rclone from 1.49.0 to 1.53.2 the seed value for generated passwords was based on the number of seconds elapsed in the day, and therefore not truly random. CVE-2020-28924 recommended users upgrade to the latest version of rclone and check the passwords protecting their encrypted remotes. Release 1.55 of rclone in March 2021 included features sponsored by CERN and their CS3MESH4EOSC project. The work was EU funded to promote vendor-neutral application programming interfaces and protocols for synchronisation and sharing of academic data on cloud storage. == Backends and commands == Rclone supports the following services as backends. There are others, built on standard protocols such as WebDAV or S3, that work. WebDAV backends do not support rclone functionality dependent on server side checksum or modtime. Remotes are usually defined interactively from these backends, local disk, or memory (as S3), with rclone config. Rclone can further wrap those remotes with one or more of alias, chunk, compress, crypt or union, remotes. Once defined, the remotes are referenced by other rclone commands interchangeably with the local drive. Remote names are followed by a colon to distinguish them from local drives. For example, a remote example_remote containing a folder, or pseudofolder, myfolder is referred to within a command as a path example_remote:/myfolder. Rclone commands directly apply to remotes, or mount them for file access or streaming. With appropriate cache options the mount can be addressed as if a conventional, block level disk. Commands are provided to serve remotes over SFTP, HTTP, WebDAV, FTP and DLNA. Commands can have sub-commands and flags. Filters determine which files on a remote that rclone commands are applied to. rclone rc passes commands or new parameters to existing rclone sessions and has an experimental web browser interface. === Crypt remotes === Rclone's crypt implements encryption of files at rest in cloud storage. It layers an encrypted remote over a pre-existing, cloud or other remote. Crypt is commonly used to encrypt / decrypt media, for streaming, on consumer storage services such as Google Drive. Rclone's configuration file contains the crypt password. The password can be lightly obfuscated, or the whole rclone.conf file can be encrypted. Crypt can either encrypt file content and name, or additionally full paths. In the latter case there is a potential clash with encryption for cloud backends, such as Microsoft OneDrive, having limited path lengths. Crypt remotes do not encrypt object modification time or size. The encryption mechanism for content, name and path is available, for scrutiny, on the rclone website. Key derivation is with scrypt. === Example syntax (Linux) === These examples describe paths and file names but object keys behave similarly. To recursively copy files from directory remote_stuff, at the remote xmpl, to directory stuff in the home folder:- -v enables logging and -P, progress information. By default rclone checks the file integrity (hash) after copy; can retry each file up to three times if the operation is interrupted; uses up to four parallel transfer threads, and does not apply bandwidth throttling. Running the above command again copies any new or changed files at the remote to the local folder but, like default rsync behaviour, will not delete from the local directory, files which have been removed from the remote. To additionally delete files from the local folder which have been removed from the remote - more like the behaviour of rsync with a --delete flag:- And to delete files from the source after they have been transferred to the local directory - more like the behaviour of rsync with a --remove-source-file flag:- To mount the remote directory at a mountpoint in the pre-existing, empty stuff directory in the home directory (the ampersand at the end makes the mount command run as a background process):- Default rclone syntax can be modified. Alternative transfer, filter, conflict and backend specific flags are available. Performance choices include number of concurrent transfer threads; chunk size; bandwidth limit profiling, and cache aggression. == Academic evaluation == In 2018, University of Kentucky researchers published a conference paper comparing use of rclone and other command line, cloud data transfer agents for big data. The paper was published as a result of funding by the National Science Foundation. Later that year, University of Utah's Center for High Performance Computing examined the impact of rclone options on data transfer rates. == Rclone use at HPC research sites == Examples are University of Maryland, Iowa State University, Trinity College Dublin, NYU, BYU, Indiana University, CSC Finland, Utrecht University, University of Nebraska, University of Utah, North Carolina State University, Stony Brook, Tulane University, Washington State University, Georgia Tech, National Institutes of Health, Wharton, Yale, Harvard, Minnesota, Michigan State, Case Western Reserve University, University of South Dakota, Northern Arizona University, University of Pennsylvania, Stanford, University of Southern California, UC Santa Barbara, UC Irvine, UC Berkeley, and SURFnet. == Rclone and cybercrime == May 2020 reports stated rclone had been used by hackers to exploit Diebold Nixdorf ATMs with ProLock ransomware. The FBI issued a Flash Alert MI-000125-MW on May 4, 2020, in relation to the compromise. They issued a further, related alert 20200901–001 in September 2020. Attackers had exfiltrated / encrypted data from organisations involved in healthcare, construction, finance, and legal services. Multiple US government agencies, and industrial entities were affected. Researchers established the hackers spent about a month exploring the breached networks, using rclone to archive stolen data to cloud storage, before encrypting the target system. Reported targets included LaSalle County, and the city of Novi Sad. The FBI warned January 2021, in Private Industry Notification 20210106–001, of extortion activity using Egregor ransomware and rclone. Organisations worldwide had been threatened with public release of exfiltrated data. In some cases rclone had been disguised under the name svchost. Bookseller Barnes & Noble, US retailer Kmart, games developer Ubisoft and the Vancouver metro system have been reported as victims. An April 2021, cybersecurity investigation into SonicWall VPN zero-day vulnerability SNWLID-2021-0001 by FireEye's Mandiant team established attackers UNC2447 used rclone for reconnaissance and exfiltration of victims' files. Cybersecurity and Infrastructure Security Agency Analysis Report AR21-126A confirmed this use of rclone in FiveHands ransomware attacks. A June 2021, Microsoft Security Intelligence Twitter post identified use of rclone in BazaCall cyber attacks. The attackers sent emails e

    Read more →
  • Online public access catalog

    Online public access catalog

    The online public access catalog (OPAC), now frequently synonymous with library catalog, is an online database of materials held by a library or group of libraries. Online catalogs have largely replaced the analog card catalogs previously used in libraries. == History == === Early online === Although a handful of experimental systems existed as early as the 1960s, the first large-scale online catalogs were developed at Ohio State University in 1975 and the Dallas Public Library in 1978. These and other early online catalog systems tended to closely reflect the card catalogs that they were intended to replace. Using a dedicated terminal or telnet client, users could search a handful of pre-coordinate indexes and browse the resulting display in much the same way they had previously navigated the card catalog. Throughout the 1980s, the number and sophistication of online catalogs grew. The first commercial systems appeared, and would by the end of the decade largely replace systems built by libraries themselves. Library catalogs began providing improved search mechanisms, including Boolean and keyword searching, as well as ancillary functions, such as the ability to place holds on items that had been checked-out. At the same time, libraries began to develop applications to automate the purchase, cataloging, and circulation of books and other library materials. These applications, collectively known as an integrated library system (ILS) or library management system, included an online catalog as the public interface to the system's inventory. Most library catalogs are closely tied to their underlying ILS system. === Stagnation and dissatisfaction === The 1990s saw a relative stagnation in the development of online catalogs. Although the earlier character-based interfaces were replaced with ones for the Web, both the design and the underlying search technology of most systems did not advance much beyond that developed in the late 1980s. At the same time, organizations outside of libraries began developing more sophisticated information retrieval systems. Web search engines like Google and popular e-commerce websites such as Amazon.com provided simpler to use (yet more powerful) systems that could provide relevancy ranked search results using probabilistic and vector-based queries. Prior to the widespread use of the Internet, the online catalog was often the first information retrieval system library users ever encountered. Now accustomed to web search engines, newer generations of library users have grown increasingly dissatisfied with the complex (and often arcane) search mechanisms of older online catalog systems. This has, in turn, led to vocal criticisms of these systems within the library community itself, and in recent years to the development of newer (often termed 'next-generation') catalogs. === Next-generation catalogs === Newer generations of library catalog systems, typically called discovery systems (or a discovery layer), are distinguished from earlier OPACs by their use of more sophisticated search technologies, including relevancy ranking and faceted search, as well as features aimed at greater user interaction and participation with the system, including tagging and reviews. These new features rely heavily on existing metadata which may be poor or inconsistent, particularly for older records. Newer catalog platforms may be independent of the organization's integrated library system (ILS), instead providing drivers that allow for the synchronization of data between the two systems. While the original online catalog interfaces were almost exclusively built by ILS vendors, libraries have increasingly sought next-generation catalogs built by enterprise search companies and open-source software projects, often led by libraries themselves. == Union catalogs == Although library catalogs typically reflect the holdings of a single library, they can also contain the holdings of a group or consortium of libraries. These systems, known as union catalogs, are usually designed to aid the borrowing of books and other materials among the member institutions via interlibrary loan. Examples of this type of catalogs include COPAC, SUNCAT, NLA Trove, and WorldCat—the last catalogs the collections of libraries worldwide. == Related systems == There are a number of systems that share much in common with library catalogs, but have traditionally been distinguished from them. Libraries utilize these systems to search for items not traditionally covered by a library catalog, although these systems are sometimes integrated into a more comprehensive discovery system. Bibliographic databases—such as Medline, ERIC, PsycINFO, Scopus, Web of Science, and many others—index journal articles and other research data. There are also a number of applications aimed at managing documents, photographs, and other digitized or born-digital items such as Digital Commons and DSpace. Particularly in academic libraries, these systems (often known as digital library systems or institutional repository systems) assist with efforts to preserve documents created by faculty and students. Electronic resource management helps librarians to track selection, acquisition, and licensing of a library's electronic information resources.

    Read more →
  • Taxonomic database

    Taxonomic database

    A taxonomic database is a database created to hold information on biological taxa – for example groups of organisms organized by species name or other taxonomic identifier – for efficient data management and information retrieval. Taxonomic databases are routinely used for the automated construction of biological checklists such as floras and faunas, both for print publication and online; to underpin the operation of web-based species information systems; as a part of biological collection management (for example in museums and herbaria); as well as providing, in some cases, the taxon management component of broader science or biology information systems. They are also a fundamental contribution to the discipline of biodiversity informatics. == Goals == Taxonomic databases digitize scientific biodiversity data and provide access to taxonomic data for research. Taxonomic databases vary in breadth of the groups of taxa and geographical space they seek to include, for example: beetles in a defined region, mammals globally, or all described taxa in the tree of life. A taxonomic database may incorporate organism identifiers (scientific name, author, and – for zoological taxa – year of original publication), synonyms, taxonomic opinions, literature sources or citations, illustrations or photographs, and biological attributes for each taxon (such as geographic distribution, ecology, descriptive information, threatened or vulnerable status, etc.). Some databases, such as the Global Biodiversity Information Facility(GBIF) database and the Barcode of Life Data System, store the DNA barcode of a taxon if one exists (also called the Barcode Index Number (BIN) which may be assigned, for example, by the International Barcode of Life project (iBOL) or UNITE, a database for fungal DNA barcoding). A taxonomic database aims to accurately model the characteristics of interest that are relevant to the organisms which are in scope for the intended coverage and usage of the system. For example, databases of fungi, algae, bryophytes and vascular plants ("higher plants") encode conventions from the International Code of Botanical Nomenclature while their counterparts for animals and most protists encode equivalent rules from the International Code of Zoological Nomenclature. Modelling the relevant taxonomic hierarchy for any taxon is a natural fit with the relational model employed in almost all database systems. Scientific consensus is not reached for all taxon groups, and new species continue to be described; therefore, another goal of taxonomic databases is to aid in resolving conflicts of scientific opinion and unify taxonomy. == History == Possibly the earliest documented management of taxonomic information in computerised form comprised the taxonomic coding system developed by Richard Swartz et al. at the Virginia Institute of Marine Science for the Biota of Chesapeake Bay and described in a published report in 1972. This work led directly or indirectly to other projects with greater profile including the NODC Taxonomic Code system which went through 8 versions before being discontinued in 1996, to be subsumed and transformed into the still current Integrated Taxonomic Information System (ITIS). A number of other taxonomic databases specializing in particular groups of organisms that appeared in the 1970s through to the present jointly contribute to the Species 2000 project, which since 2001 has been partnering with ITIS to produce a combined product, the Catalogue of Life. While the Catalogue of Life currently concentrates on assembling basic name information as a global species checklist, numerous other taxonomic database projects such as Fauna Europaea, the Australian Faunal Directory, and more supply rich ancillary information including descriptions, illustrations, maps, and more. Many taxonomic database projects are currently listed at the TDWG "Biodiversity Information Projects of the World" site. == Issues == The representation of taxonomic information in machine-encodable form raises a number of issues not encountered in other domains, such as variant ways to cite the same species or other taxon name, the same name used for multiple taxa (homonyms), multiple non-current names for the same taxon (synonyms), changes in name and taxon concept definition through time, and more. Non-standardized categories and metadata in taxonomic databases hampers the ability for researchers to analyze the data. One forum that has promoted discussion and possible solutions to these and related problems since 1985 is the Biodiversity Information Standards (TDWG), originally called the Taxonomic Database Working Group. While online databases have great benefits (for example, increased access to taxonomic information), they also have issues such as data integrity risks due to on- and off-line versions and continuous updates, technical access issues due to server or internet outage, and differing capacities for complex queries to extract taxonomic data into lists. As the quantity of information in online taxonomic databases rapidly expands, data aggregation, and the integration and alignment of non-standardized data across databases, is a big challenge in taxonomy and biodiversity informatics.

    Read more →
  • Algorithm IMED

    Algorithm IMED

    In multi-armed bandit problems, IMED (for Indexed Minimum Empirical Divergence) is an algorithm developed in 2015 by Junya Honda and Akimichi Takemura. It is the first algorithm proved to be asymptotically optimal respect to the problem-dependant Lai–Robbins lower bound for distributions in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} . == Multi-armed bandit problem == The Multi-armed bandit problem is a sequential game where one player has to choose at each turn between K {\displaystyle K} actions (arms). Behind every arm a {\displaystyle a} there is an unknown distribution ν a {\displaystyle \nu _{a}} that lies in a set D {\displaystyle {\mathcal {D}}} known by the player (for example, D {\displaystyle {\mathcal {D}}} can be the set of Gaussian distributions or Bernoulli distributions). At each turn t {\displaystyle t} the player chooses (pulls) an arm a t {\displaystyle a_{t}} , he then gets an observation X t {\displaystyle X_{t}} of the distribution ν a t {\displaystyle \nu _{a_{t}}} . === Regret minimization === The goal is to minimize the regret at time T {\displaystyle T} that is defined as R T := ∑ a = 1 K Δ a E [ N a ( T ) ] {\displaystyle R_{T}:=\sum _{a=1}^{K}\Delta _{a}\mathbb {E} [N_{a}(T)]} where μ a := E [ ν a ] {\displaystyle \mu _{a}:=\mathbb {E} [\nu _{a}]} is the mean of arm a {\displaystyle a} μ ∗ := max a μ a {\displaystyle \mu ^{}:=\max _{a}\mu _{a}} is the highest mean Δ a := μ ∗ − μ a {\displaystyle \Delta _{a}:=\mu ^{}-\mu _{a}} N a ( t ) {\displaystyle N_{a}(t)} is the number of pulls of arm a {\displaystyle a} up to turn t {\displaystyle t} The player has to find an algorithm that chooses at each turn t {\displaystyle t} which arm to pull based on the previous actions and observations ( a s , X s ) s < t {\displaystyle (a_{s},X_{s})_{s μ } {\displaystyle {\mathcal {K}}_{inf}(\nu ,\mu ,{\mathcal {D}}):=\inf \left\{\mathrm {KL} (\nu ,{\tilde {\nu }})\ |\ {\tilde {\nu }}\in {\mathcal {P}}([-\infty ,1]),\ \mathbb {E} [{\tilde {\nu }}]>\mu \right\}} K L {\displaystyle \mathrm {KL} } is the Kullback–Leibler divergence P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} is the set of distribution in [ − ∞ , 1 ] {\displaystyle [-\infty ,1]} ν ^ a ( t ) {\displaystyle {\hat {\nu }}_{a}(t)} is the empirical distribution of arm a {\displaystyle a} at turn t {\displaystyle t} μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}^{}(t)} is the highest empirical mean of turn t {\displaystyle t} Remark : For arms a {\displaystyle a} that verify μ ^ a ( t ) = μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}_{a}(t)={\hat {\mu }}^{}(t)} we have K i n f ( ν ^ a ( t ) , μ ^ ∗ ( t ) ) = 0 {\displaystyle K_{inf}({\hat {\nu }}_{a}(t),{\hat {\mu }}^{}(t))=0} . Then there index is equal to ln ⁡ ( N a ( t ) ) {\displaystyle \ln(N_{a}(t))} === Pseudocode === for each arm i do: n[i] ← 1; nu[i] ← None; mu[i] ← None for t from 1 to K do: select arm t observe reward r n[t] ← n[t] + 1 nu[t] ← update empirical distribution mu[t] ← update empirical mean for t from K+1 to T do: mu ← highest mu for each arm i do: scoreK[i] ← n[i] K_inf(nu[i],mu) scoreN[i] ← ln(n[i]) index[i] ← scoreK[i] + scoreN[i] select arm a with smallest index[a] observe reward r n[a] ← n[a] + 1 nu[a] ← update empirical distribution mu[a] ← update empirical mean == Theoretical results == In the multi-armed bandit problem we have the asymptotic Lai–Robbins lower bound asymptotic lower bound on regret. The algorithm IMED is the first algorithm that matches this lower bound for distribution in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} in the first order. If the distribution are also bounded then it also match the second order. It is the first algorithm that match the second under of this lower bound. === Lai–Robbins lower bound === In 1985 Lai and Robbins proved an asymptotic, problem-dependent lower bound on regret. In 2018, Aurelien Garivier, Pierre Menard and Gilles Stoltz proved a refined lower bound that gives the second order It states that for every consistent algorithm on the set P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} — that is, an algorithm for which, for every ( ν 1 , … , ν K ) ∈ P ( [ − ∞ , 1 ] ) K {\displaystyle (\nu _{1},\dots ,\nu _{K})\in {\mathcal {P}}([-\infty ,1])^{K}} , the regret R T {\displaystyle R_{T}} is subpolynomial (i.e. R T = o T → + ∞ ( T α ) {\displaystyle R_{T}=o_{T\to +\infty }(T^{\alpha })} for all α > 0 {\displaystyle \alpha >0} ) — we have: R T ≥ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − Ω T → + ∞ ( ln ⁡ ln ⁡ T ) . {\displaystyle R_{T}\geq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-\Omega _{T\to +\infty }(\ln \ln T).} This bound is asymptotic (as T → + ∞ {\displaystyle T\to +\infty } ) and gives a first-order lower bound of order ln ⁡ T {\displaystyle \ln T} with the optimal constant in front of it and the second order in − Ω ( ln ⁡ ln ⁡ T ) {\displaystyle -\Omega (\ln \ln T)} . === Regret bound for IMED === If the distribution of every arm a {\displaystyle a} is ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} ( i.e. ν a ∈ P ( [ − ∞ , 1 ] ) ) {\displaystyle \nu _{a}\in {\mathcal {P}}([-\infty ,1]))} then the regret of the algorithm IMED verify R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T + O ( 1 ) {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T+O(1)} If all the distribution ν a {\displaystyle \nu _{a}} are bounded then it exists a constant C > 0 {\displaystyle C>0} such that for T {\displaystyle T} large enough the regret of IMED is upper bounded by R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − C ln ⁡ ln ⁡ T {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-C\ln \ln T} == Computation time == The algorithm only requiere to compute the K i n f {\displaystyle K_{inf}} for suboptimal arms who are pulled O ( ln ⁡ T ) {\displaystyle O(\ln T)} times, which make it a lot faster than KL-UCB. A faster version of IMED was developed in 2023 to make it even faster, using a Taylor development of the K i n f {\displaystyle K_{inf}} in the first order .

    Read more →
  • History of natural language processing

    History of natural language processing

    The history of natural language processing describes the advances of natural language processing. There is some overlap with the history of machine translation, the history of speech recognition, and the history of artificial intelligence. == Early history == The history of machine translation dates back to the seventeenth century, when philosophers such as Leibniz and Descartes put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine. The first patents for "translating machines" were applied for in the mid-1930s. One proposal, by Georges Artsrouni, was simply an automatic bilingual dictionary using paper tape. The other proposal, by Peter Troyanskii, a Russian, was more detailed. Troyanskii’s proposal included both the bilingual dictionary and a method for dealing with grammatical roles between languages, based on Esperanto. == Logical period == In 1950, Alan Turing published his famous article "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence. This criterion depends on the ability of a computer program to impersonate a human in a real-time written conversation with a human judge, sufficiently well that the judge is unable to distinguish reliably — on the basis of the conversational content alone — between the program and a real human. In 1957, Noam Chomsky’s Syntactic Structures revolutionized Linguistics with 'universal grammar', a rule-based system of syntactic structures. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem. However, real progress was much slower, and after the ALPAC report in 1966, which found that ten years long research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted until the late 1980s, when the first statistical machine translation systems were developed. Some notably successful NLP systems developed in the 1960s were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted vocabularies. In 1969 Roger Schank introduced the conceptual dependency theory for natural language understanding. This model, partially influenced by the work of Sydney Lamb, was extensively used by Schank's students at Yale University, such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner. In 1970, William A. Woods introduced the augmented transition network (ATN) to represent natural language input. Instead of phrase structure rules ATNs used an equivalent set of finite-state automata that were called recursively. ATNs and their more general format called "generalized ATNs" continued to be used for a number of years. During the 1970s many programmers began to write 'conceptual ontologies', which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this time, many chatterbots were written including PARRY, Racter, and Jabberwacky. == Statistical period == Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of machine learning algorithms for language processing. This was due both to the steady increase in computational power resulting from Moore's law and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing. Some of the earliest-used machine learning algorithms, such as decision trees, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks. === Datasets === The emergence of statistical approaches was aided by both increase in computing power and the availability of large datasets. At that time, large multilingual corpora were starting to emerge. Notably, some were produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. Many of the notable early successes occurred in the field of machine translation. In 1993, the IBM alignment models were used for statistical machine translation. Compared to previous machine translation systems, which were symbolic systems manually coded by computational linguists, these systems were statistical, which allowed them to automatically learn from large textual corpora. Though these systems do not work well in situations where only small corpora is available, so data-efficient methods continue to be an area of research and development. In 2001, a one-billion-word large text corpus, scraped from the Internet, referred to as "very very large" at the time, was used for word disambiguation. To take advantage of large, unlabelled datasets, algorithms were developed for unsupervised and self-supervised learning. Generally, this task is much more difficult than supervised learning, and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results. == Neural period == Neural language models were developed in 1990s. In 1990, the Elman network, using a recurrent neural network, encoded each word in a training set as a vector, called a word embedding, and the whole vocabulary as a vector database, allowing it to perform such tasks as sequence-predictions that are beyond the power of a simple multilayer perceptron. A shortcoming of the static embeddings was that they didn't differentiate between multiple meanings of homonyms. Yoshua Bengio developed the first neural probabilistic language model in 2000. Novel algorithms, availability of larger datasets and higher processing power made possible training of larger and larger language models. Attention mechanism was introduced by Bahdanau et al. in 2014. This work laid the foundations for the famous "Attention Is All You Need" paper that introduced the Transformer architecture in 2017. The concept of large language model (LLM) emerged in late 2010s. LLM is a language model trained with self-supervised learning on vast amount of text. Earliest public LLMs had hundreds of millions of parameters, but this number quickly rose to billion and even trillions. In recent years, advancements in deep learning and large language models have significantly enhanced the capabilities of natural language processing, leading to widespread applications in areas such as healthcare, customer service, and content generation. == Software ==

    Read more →
  • Transliteracy

    Transliteracy

    Transliteracy is "a fluidity of movement across a range of technologies, media and contexts". It is an ability to use diverse techniques to collaborate across different social groups. Transliteracy combines a range of capabilities required to move across a range of contexts, media, technologies and genres. Conceptually, transliteracy is situated across five capabilities: information capabilities (see information literacy), ICT (information and communication technologies), communication and collaboration, creativity and critical thinking. It is underpinned by literacy and numeracy. (See figure below) The concept of transliteracy is impacting the system of education and libraries. == History == While the term appears to come from the prefix trans- ('across') and the word literacy, the scholars who coined it say they developed it from the practice of transliteration, which means to use the letters of one language to write down a different language. The study of transliteracy was first developed in 2005 by the Transliteracies Research Project, directed by University of California at Santa Barbara Professor Alan Liu. The concept of 'transliteracies' was developed as part of research into online reading. It was shared and refined at the Transliteracies conference, held at UC Santa Barbara in 2005. The conference inspired the at the time De Montfort University Professor, Sue Thomas, to create the Production in Research and Transliteracy (PART) group, which evolved into the Transliteracy Research Group. The current meaning of transliteracy was defined in the group's seminal paper Transliteracy: crossing divides as "the ability to read, write, and interact across a range of platforms, tools, and media from signing and orality through handwriting, print, TV, radio, and film, to digital social networks." The concept was enthusiastically adopted by a number of professional groups, notably in the library and information field. Transliteracy Research Group Archive 2006–2013 curates numerous resources from this period. For a number of years, there was a gap between significant interest in transliteracy among professional groups and the scarcity of research. A group of academics from the University of Bordeaux considered transliteracy mainly in the school context. Freelance writer and consultant, Sue Thomas, studied transliteracy and creativity, while Suzana Sukovic, executive director of educational research and evidence-based practice at HETI, researched transliteracy in relation to digital storytelling. The first book on the topic, Transliteracy in complex information environment by Sukovic, is based on research and experience with practice-based projects. == Transliteracy in education == Transliteracy is making an impact on the classroom setting because of how technologically advanced younger generations are today. In 2012, Adam Marcus, a teacher and librarian at the New York City Department of Education (NYCDOE), decided to incorporate transliteracy into his school's public library summer reading program. He had a desire to enhance the experience of reading for his students by allowing them to connect to the text differently by using social media. He used a tool called VoiceThread in order to have his students "take part in conversations, formulate ideas, and share higher-order thinking through a variety of media channels: video, audio, text, images, and music". Students were also enabled to communicate with the book's author through blogs and websites, and were given multiple modes of media to comprehend and engage with the text on a deeper level. Some of these examples include an audio-video glossary and web links that aimed to bring the details of the text to life. The results of his experiment were deemed to have a positive effect on the program as students responded well to this interactive experience they were given. Marcus believes that it is important for educators and librarians to enhance storytelling for children by providing them with a modern and transliterate experience that one could not receive back then. The Agence nationale de la recherche funded a program at a French high school from 2013 to 2015, where the transliteracy skills of students were tested and observed. Students were placed in groups of three or four members and were required to use all sorts of media and tools in order to collect data for their projects. They were not allowed to only use digital sources, and were advised to use a diversity of sources. The focus of this experiment was to observe "the possible diversity of media and tools employed, on the ways of and reasons for switching from one to another, on how these different media and tools are distributed within contexts, according to the academic requirements and tasks individually and collectively performed by the students." The conclusions of the experiment dealt with physical space and organization being an issue for students and teachers to deal with. Spatially, it was challenging for students to navigate through different mediums when their space inside the classroom was limited. It was noticed that students were prone to use something that took up less space, rather than focusing on expanding their diversity of sources. Organizationally, it was challenging for students to organize all of the information they collected since everything was not being search and collected for digitally. In addition, students were not allotted a lot of time to complete their projects which also impacted their final product. == Transliteracy in libraries == In 2009, Dr. Susie Andretta, senior lecturer in Information Management at London Metropolitan University, conducted interviews with four different information professionals including an academic librarian, an outreach librarian, a content manager, and a scholar within the library science and information discipline. She was aiming to explore how transliteracy was colliding and combining with the print-world of libraries. Dr. Andretta defines transliteracy as "an umbrella term encompassing different literacies and multiple communication channels that require active participation with and across a range of platforms, and embracing both linear and non-linear messages (3)." The goals of these interviews ranged from the following: to test the information professional's awareness of transliteracy, to have them identify transliteracy and how it is integrated into their work, and to explain the impact transliteracy has had on they library they work at. Andretta found that out of all the information professionals interviewed, it was only the academic librarian who was vaguely familiar with the concept of transliteracy. Bernadette Daly Swanson, an Academic Librarian at UC Davis, expresses in her interview with Dr. Andretta how she would "like to think that the transliterate library is more of an environment where we do different things [...] I would take maybe about a third of the first floor of our library and transform it into a lab [...] where we can start to evolve [..] explore, and experiment in media development, content development, and do it not just with librarians; so open up the space for other people [...] so you don't get people working in isolation." Although the other three candidates that Dr. Andretta interviewed had not heard of the term transliteracy, they responded well to the concept once it was explained to them and agreed with its impact on the workplace. Dr. Michael Stephens, an assistant professor in the Graduate School of Library and Information Science at Dominican University, explains in his interview how the term transliteracy describes the courses he teaches on libraries and Web 2.0 technologies. Dr. Stephens states that students being educated in Web 2.0 technologies gives them "the opportunity to experience what the channel can be and the potential for that sharing learning, for asking questions, just for out loud thinking – I think it's incredibly valuable. [..] this is where this wonderful concept comes in, it was teaching them transliteracy and the fact that they can move across channels without getting worried about it." Dr. Andretta concluded from her interviews how although transliteracy may not be a very well-known term yet, it has nonetheless established itself into the intuition of libraries while also transforming the traditional library to a world of enhanced and expanded services. "Inherent in this transition are the challenges of having to adapt to a constantly changing technological landscape, the multiple literacies that this generates, and the need to establish a multifaceted library profession that can speak the multiple-media languages of its diverse users." Thomas Ipri, a librarian at the University of Nevada, advocates for libraries needing to make a change in their literary functions. He argues that the divide between digital and print makes it harder for libraries to accommodate their patrons and to share information. He f

    Read more →
  • Grid-oriented storage

    Grid-oriented storage

    Grid-oriented Storage (GOS) was a term used for data storage by a university project during the era when the term grid computing was popular. == Description == GOS was a successor of the term network-attached storage (NAS). GOS systems contained hard disks, often RAIDs (redundant arrays of independent disks), like traditional file servers. GOS was designed to deal with long-distance, cross-domain and single-image file operations, which is typical in Grid environments. GOS behaves like a file server via the file-based GOS-FS protocol to any entity on the grid. Similar to GridFTP, GOS-FS integrates a parallel stream engine and Grid Security Infrastructure (GSI). Conforming to the universal VFS (Virtual Filesystem Switch), GOS-FS can be pervasively used as an underlying platform to best utilize the increased transfer bandwidth and accelerate the NFS/CIFS-based applications. GOS can also run over SCSI, Fibre Channel or iSCSI, which does not affect the acceleration performance, offering both file level protocols and block level protocols for storage area network (SAN) from the same system. In a grid infrastructure, resources may be geographically distant from each other, produced by differing manufacturers, and have differing access control policies. This makes access to grid resources dynamic and conditional upon local constraints. Centralized management techniques for these resources are limited in their scalability both in terms of execution efficiency and fault tolerance. Provision of services across such platforms requires a distributed resource management mechanism and the peer-to-peer clustered GOS appliances allow a single storage image to continue to expand, even if a single GOS appliance reaches its capacity limitations. The cluster shares a common, aggregate presentation of the data stored on all participating GOS appliances. Each GOS appliance manages its own internal storage space. The major benefit of this aggregation is that clustered GOS storage can be accessed by users as a single mount point. GOS products fit the thin-server categorization. Compared with traditional “fat server”-based storage architectures, thin-server GOS appliances deliver numerous advantages, such as the alleviation of potential network/grid bottle-necks, CPU and OS optimized for I/O only, ease of installation, remote management and minimal maintenance, low cost and Plug and Play, etc. Examples of similar innovations include NAS, printers, fax machines, routers and switches. An Apache server has been installed in the GOS operating system, ensuring an HTTPS-based communication between the GOS server and an administrator via a Web browser. Remote management and monitoring makes it easy to set up, manage, and monitor GOS systems. == History == Frank Zhigang Wang and Na Helian proposed a funding proposal to the UK government titled “Grid-Oriented Storage (GOS): Next Generation Data Storage System Architecture for the Grid Computing Era” in 2003. The proposal was approved and granted one million pounds in 2004. The first prototype was constructed in 2005 at Centre for Grid Computing, Cambridge-Cranfield High Performance Computing Facility. The first conference presentation was at IEEE Symposium on Cluster Computing and Grid (CCGrid), 9–12 May 2005, Cardiff, UK. As one of the five best work-in-progress, it was included in the IEEE Distributed Systems Online. In 2006, the GOS architecture and its implementations was published in IEEE Transactions on Computers, titled “Grid-oriented Storage: A Single-Image, Cross-Domain, High-Bandwidth Architecture”. Starting in January 2007, demonstrations were presented at Princeton University, Cambridge University Computer Lab and others. By 2013, the Cranfield Centre still used future tense for the project. Peer-to-peer file sharings use similar techniques.

    Read more →