AI Chat Helper

AI Chat Helper — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Quantum machine learning

    Quantum machine learning

    Quantum machine learning (QML) is the study of quantum algorithms for machine learning. It often refers to quantum algorithms for machine learning tasks which analyze classical data, sometimes called quantum-enhanced machine learning. QML algorithms use qubits and quantum operations to try to improve the space and time complexity of classical machine learning algorithms. Hybrid QML methods involve both classical and quantum processing, where computationally difficult subroutines are outsourced to a quantum device. These routines can be more complex in nature and executed faster on a quantum computer. Furthermore, quantum algorithms can be used to analyze quantum states instead of classical data. The term "quantum machine learning" is sometimes used to refer classical machine learning methods applied to data generated from quantum experiments (i.e. machine learning of quantum systems), such as learning the phase transitions of a quantum system or creating new quantum experiments. QML also extends to a branch of research that explores methodological and structural similarities between certain physical systems and learning systems, in particular neural networks. For example, some mathematical and numerical techniques from quantum physics are applicable to classical deep learning and vice versa. Furthermore, researchers investigate more abstract notions of learning theory with respect to quantum information, sometimes referred to as "quantum learning theory". == Machine learning with quantum computers == Quantum-enhanced machine learning refers to quantum algorithms that solve tasks in machine learning, thereby improving and often expediting classical machine learning techniques. Such algorithms typically require one to encode the given classical data set into a quantum computer to make it accessible for quantum information processing. Subsequently, quantum information processing routines are applied and the result of the quantum computation is read out by measuring the quantum system. For example, the outcome of the measurement of a qubit reveals the result of a binary classification task. While many proposals of QML algorithms are still purely theoretical and require a full-scale universal quantum computer to be tested, others have been implemented on small-scale or special purpose quantum devices. === Quantum associative memories and quantum pattern recognition === Early work on quantum associative memories has been done by Dan Ventura and Tony Martinez and by Carlo A. Trugenberger in the late 1990s and early 2000s. Associative (or content-addressable) memories are able to recognize stored content on the basis of a similarity measure, while random access memories are accessed by the address of stored information and not its content. As such they must be able to retrieve both incomplete and corrupted patterns, the essential machine learning task of pattern recognition. Typical classical associative memories store p patterns in the O ( n 2 ) {\displaystyle O(n^{2})} interactions (synapses) of a real, symmetric energy matrix over a network of n artificial neurons. The encoding is such that the desired patterns are local minima of the energy functional and retrieval is done by minimizing the total energy, starting from an initial configuration. Unfortunately, classical associative memories are severely limited by the phenomenon of cross-talk. When too many patterns are stored, spurious memories appear which quickly proliferate, so that the energy landscape becomes disordered and no retrieval is anymore possible. The number of storable patterns is typically limited by a linear function of the number of neurons, p ≤ O ( n ) {\displaystyle p\leq O(n)} . Quantum associative memories (in their simplest realization) store patterns in a unitary matrix U acting on the Hilbert space of n qubits. Retrieval is realized by the unitary evolution of a fixed initial state to a quantum superposition of the desired patterns with probability distribution peaked on the most similar pattern to an input. By its very quantum nature, the retrieval process is thus probabilistic. Because quantum associative memories are free from cross-talk, however, spurious memories are never generated. Correspondingly, they have a superior capacity than classical ones. The number of parameters in the unitary matrix U is O ( p n ) {\displaystyle O(pn)} . One can thus have efficient, spurious-memory-free quantum associative memories for any polynomial number of patterns. If the matrix U is encoded as a unique operator (as opposed as to a sequence of gates as in the circuit model), e.g. by an optical interferometer, the retrieval becomes efficient even for an exponential number of patterns. === Linear algebra simulation with quantum amplitudes === A number of quantum algorithms for machine learning are based on the idea of amplitude encoding, that is, to associate the amplitudes of a quantum state with the inputs and outputs of computations. Since a state of n {\displaystyle n} qubits is described by 2 n {\displaystyle 2^{n}} complex amplitudes, this information encoding can allow for an exponentially compact representation. Intuitively, this corresponds to associating a discrete probability distribution over binary random variables with a classical vector. The goal of algorithms based on amplitude encoding is to formulate quantum algorithms whose resources grow polynomially in the number of qubits n {\displaystyle n} , which amounts to a logarithmic time complexity in the number of amplitudes and thereby the dimension of the input. Many QML algorithms in this category are based on variations of the quantum algorithm for linear systems of equations (colloquially called HHL, after the paper's authors) which, under specific conditions, performs a matrix inversion using an amount of physical resources growing only logarithmically in the dimensions of the matrix. One of these conditions is that a Hamiltonian which entry-wise corresponds to the matrix can be simulated efficiently, which is known to be possible if the matrix is sparse or low rank. For reference, any known classical algorithm for matrix inversion requires a number of operations that grows more than quadratically in the dimension of the matrix (e.g. O ( n 2.373 ) {\displaystyle O{\mathord {\left(n^{2.373}\right)}}} ), but they are not restricted to sparse matrices. Quantum matrix inversion can be applied to machine learning methods in which the training reduces to solving a linear system of equations, for example in least-squares linear regression, the least-squares version of support vector machines, and Gaussian processes. A crucial bottleneck of methods that simulate linear algebra computations with the amplitudes of quantum states is state preparation, which often requires one to initialise a quantum system in a state whose amplitudes reflect the features of the entire dataset. Although efficient methods for state preparation are known for specific cases, this step easily hides the complexity of the task. === Variational quantum algorithms (VQAs) === In a variational quantum algorithm, a classical computer optimizes the parameters used to prepare a quantum state, while a quantum computer is used to do the actual state preparation and measurement. VQAs are considered promising candidates for noisy intermediate-scale quantum computers. Variational quantum circuits (or parameterized quantum circuits) are a popular class of VQAs where the parameters are those used in a fixed quantum circuit. Researchers have studied VQCs to solve optimization problems and find the ground state energy of complex quantum systems, which were difficult to solve using a classical computer. === Quantum binary classifier === Pattern reorganization is one of the important tasks of machine learning, binary classification is one of the tools or algorithms to find patterns. Binary classification is used in supervised learning and in unsupervised learning. In QML, classical bits are converted to qubits and they are mapped to Hilbert space; complex value data are used in a quantum binary classifier to use the advantage of Hilbert space. By exploiting the quantum mechanic properties such as superposition, entanglement, interference the quantum binary classifier produces the accurate result in short period of time. === Quantum machine learning algorithms based on Grover search === Another approach to improving classical machine learning with quantum information processing uses amplitude amplification methods based on Grover's search algorithm, which has been shown to solve unstructured search problems with a quadratic speedup compared to classical algorithms. These quantum routines can be employed for learning algorithms that translate into an unstructured search task, as can be done, for instance, in the case of the k-medians and the k-nearest neighbors algorithms. Other applications include quadratic speedups in the training of perceptrons. An e

    Read more →
  • Pol.is

    Pol.is

    Polis (or Pol.is) is wiki survey software designed for large group collaborations. As a civic technology, Polis allows people to share their opinions and ideas, and its algorithm is intended to elevate ideas that can facilitate better decision-making, especially when there are lots of participants. Polis has been credited for assisting the passage of legislation in Taiwan. Pol.is has been used by governments in the United States, Canada, Singapore, Philippines, Finland, Spain and elsewhere. == History == Pol.is was founded by Colin Megill, Christopher Small, and Michael Bjorkegren after the Occupy Wall Street and Arab Spring movements. In Taiwan, pol.is has been "one of the key parts" of vTaiwan's suite of open-source tools for its citizen engagement efforts arising out of the Sunflower Student Movement. vTaiwan claims that of the 26 national issues related to technology discussed on the platform, 80% led to government action. Pol.is is also utilized by "Join," a national platform for online deliberation run by the Taiwanese government. In 2022, Wired reported that Polis was an influence on the Community Notes project at Twitter. In 2023, Megill advised OpenAI on how to facilitate deliberation at scale in a way that was more efficient than Polis, which still required significant human labor and analysis at the time. He helped to award $1 million in grants to teams working on solving the problem of deliberation at scale. In 2023, Anthropic was also exploring steering model behavior using Polis. In 2025, it helped the county that includes Bowling Green, Kentucky make a 25 year plan by facilitating the collection and review of ideas from thousands of residents, representing 10% of the county. 2,370 of 3,940 unique ideas were agreed-upon by over 80% of survey respondents. Ideas were screened by volunteers if they were redundant to an existing idea, off-topic or obscene. == How it works == Pol.is participants are anonymous and cannot reply directly to others posts, in an effort to avoid personal attacks for users. Its algorithms are designed not for engagement and scrolling, but to find areas of agreement to better understand the nuances of a wide range of opinions. Participants are prompted for ideas and vote on other participants' ideas. == Reception == Andrew Leonard, The Financial Times, and VentureBeat describe Pol.is as a possible antidote to the divisiveness of traditional internet discourse by gamifying consensus. Audrey Tang agreed saying, "Polis is quite well known in that it's a kind of social media that instead of polarizing people to drive so called engagement or addiction or attention, it automatically drives bridge making narratives and statements. So only the ideas that speak to both sides or to multiple sides will gain prominence in Polis." Niall Ferguson argues that the approach to utilize tools like Pol.is and Join in Taiwan empowers ordinary people instead of the elite and protects individual freedoms, providing a contrast to the AI-enhanced panopticon model seen in China. Carl Miller praised the technology as having "gamified finding consensus." Darshana Narayanan, in an op-ed in the Economist, argues that open-source machine-learning-based tools like Polis can help to bypass the influence of special interests or experts. Jamie Susskind cited polis and vTaiwan as a model for democracies, particularly around digital policy issues.

    Read more →
  • Browsing

    Browsing

    Browsing is a kind of orienting strategy. It is supposed to identify something of relevance for the browsing organism. In context of humans, it is a metaphor taken from the animal kingdom. It is used, for example, about people browsing open shelves in libraries, window shopping, or browsing databases or the Internet. In library and information science, it is an important subject, both purely theoretically and as applied science aiming at designing interfaces which support browsing activities for the user. == Definition == In 2011, Birger Hjørland provided the following definition: "Browsing is a quick examination of the relevance of a number of objects which may or may not lead to a closer examination or acquisition/selection of (some of) these objects. It is a kind of orienting strategy that is formed by our "theories", "expectations" and "subjectivity". == Controversies == As with any kind of human psychology, browsing can be understood in biological, behavioral, or cognitive terms on the one hand or in social, historical, and cultural terms on the other hand. In 2007, Marcia Bates researched browsing from "behavioural" approaches, while Hjørland (2011a+b) defended a social view. Bates found that browsing is rooted in our history as exploratory, motile animals hunting for food and nesting opportunities. According to Hjørland (2011a), on the other hand, Marcia Bates' browsing for information about browsing is governed by her behavioral assumptions, while Hjørland's browsing for information about browsing is governed by his socio-cultural understanding of human psychology. In short: Human browsing is based on our conceptions and interests. === Is browsing a random activity? === Browsing is often understood as a random activity. Dictionary.com, for example, has this definition: "to glance at random through a book, magazine, etc.". Hjørland suggests, however, that browsing is an activity that is governed by our metatheories. We may dynamically change our theories and conceptions but when we browse, the activity is governed by the interests, conceptions, priorities and metatheories that we have at that time. Therefore, browsing is not totally random. == Browsing versus analytical search strategies == In 1997, Gary Marchionini wrote: "A fundamental distinction is made between analytical and browsing strategies [...]. Analytical strategies depend on careful planning, the recall of query terms, and iterative query reformulations and examinations of results. Browsing strategies are heuristic and opportunistic and depend on recognizing relevant information. Analytic strategies are batch oriented and half duplex (turn talking) like human conversation, whereas browsing strategies are more interactive, real-time exchanges and collaborations between the information seeker and the information system. Browsing strategies demand a lower cognitive load in advance and a steadier attentional load throughout the information-seeking process. When it comes to Browsing, giblets are amazing." == Orienting strategies == Some sociologists, such as Berger and Zelditch in 1993, Wagner in 1984, and Wagner & Berger in 1985, have used the term "orienting strategies". They find that orienting strategies should be understood as metatheories: "Consider the very large proportion of sociological theory that is in the form of metatheory. It is discussion about theory: about what concepts it should include, about how those concepts should be linked, and about how theory should be studied. Similar to Kuhn’s paradigms, theories of this sort provide guidelines or strategies for understanding social phenomena and suggest the proper orientation of the theorist to these phenomena; they are orienting strategies. Textbooks in theory frequently focus on orienting strategies such as functionalism, exchange, or ethnomethodology." Sociologists thus use metatheories as orienting strategies. We may generalize and say that all people use metatheories as orienting strategies and that this is what direct our attention and also our browsing – also when we are not conscious about it.

    Read more →
  • Recommender system

    Recommender system

    A recommender system, also called a recommendation algorithm, recommendation engine, or recommendation platform, is a type of information filtering system that suggests items most relevant to a particular user. The value of these systems becomes particularly evident in scenarios where users must select from a large number of options, such as products, media, or content. Major social media platforms and streaming services rely on recommender systems that employ machine learning to analyze user behavior and preferences, thereby enabling personalized content feeds. Typically, the suggestions refer to a variety decision-making processes, including the selection of a product, musical selection, or online news source to read. The implementation of recommender systems is pervasive, with commonly recognised examples including the generation of playlist for video and music services, the provision of product recommendations for e-commerce platforms, and the recommendation of content on social media platforms and the open web. These systems can operate using a single type of input, such as music, or multiple inputs from diverse platforms, including news, books and search queries. Additionally, popular recommender systems have been developed for specific topics, such as restaurants and online dating services. Recommender systems have also been developed to explore research articles and experts, collaborators, and financial services. A content discovery platform is a software recommendation platform that employs recommender system tools. It utilizes user metadata in order to identify and suggest relevant content, whilst reducing ongoing maintenance and development costs. A content discovery platform delivers personalized content to websites, mobile devices, and set-top boxes. A large range of content discovery platforms currently exist for various forms of content ranging from news articles and academic journal articles to television. As operators compete to serve as the gateway to home entertainment, personalized television emerges as a key service differentiator. Academic content discovery has recently become another area of interest, the emergence of numerous companies dedicated to assisting academic researchers in keeping up to date with relevant academic content and facilitating serendipitous discovery of new content. == Overview == Recommender systems usually make use of either or both collaborative filtering and content-based filtering, as well as other systems such as knowledge-based systems. Collaborative filtering approaches build a model from a user's past behavior (e.g., items previously purchased or selected and/or numerical ratings given to those items) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may have an interest in. Content-based filtering approaches utilize a series of discrete, pre-tagged characteristics of an item in order to recommend additional items with similar properties. === Example === The differences between collaborative and content-based filtering can be demonstrated by comparing two early music recommender systems, Last.fm and Pandora Radio. We can also look at how these methods are applied in e-commerce, for example, on platforms like Amazon. Last.fm creates a "station" of recommended songs by observing what bands and individual tracks the user has listened to on a regular basis and comparing those against the listening behavior of other users. Last.fm will play tracks that do not appear in the user's library, but are often played by other users with similar interests. As this approach leverages the behavior of users, it is an example of a collaborative filtering technique. Pandora uses the properties of a song or artist (a subset of the 450 attributes provided by the Music Genome Project) to seed a "station" that plays music with similar properties. User feedback is used to refine the station's results, deemphasizing certain attributes when a user "dislikes" a particular song and emphasizing other attributes when a user "likes" a song. This is an example of a content-based approach. In e-commerce, Amazon's well-known "customers who bought X also bought Y" feature is a prime example of collaborative filtering. It also uses content-based filtering when it recommends a book by the same author you've previously read or a pair of shoes in a similar style to ones you've viewed. Each type of system has its strengths and weaknesses. In the above example, Last.fm requires a large amount of information about a user to make accurate recommendations. This is an example of the cold start problem, and is common in collaborative filtering systems. Whereas Pandora needs very little information to start, it is far more limited in scope (for example, it can only make recommendations that are similar to the original seed). === Alternative implementations === Recommender systems are a useful alternative to search algorithms since they help users discover items they might not have found otherwise. Of note, recommender systems are often implemented using search engines indexing non-traditional data. In some cases, like in the Gonzalez v. Google Supreme Court case, may argue that search and recommendation algorithms are different technologies. Recommender systems have been the focus of several granted patents, and there are more than 50 software libraries that support the development of recommender systems including LensKit, RecBole, ReChorus and RecPack. == History == Elaine Rich created the first recommender system in 1979, called Grundy. She looked for a way to recommend users books they might like. Her idea was to create a system that asks users specific questions and classifies them into classes of preferences, or "stereotypes", depending on their answers. Depending on users' stereotype membership, they would then get recommendations for books they might like. Another early recommender system, called a "digital bookshelf", was described in a 1990 technical report by Jussi Karlgren at Columbia University, and implemented at scale and worked through in technical reports and publications from 1994 onwards by Jussi Karlgren, then at SICS, and research groups led by Pattie Maes at MIT, Will Hill at Bellcore, and Paul Resnick, also at MIT, whose work with GroupLens was awarded the 2010 ACM Software Systems Award. Montaner provided the first overview of recommender systems from an intelligent agent perspective. Adomavicius provided a new, alternate overview of recommender systems. Herlocker provides an additional overview of evaluation techniques for recommender systems, and Beel et al. discussed the problems of offline evaluations. Beel et al. have also provided literature surveys on available research paper recommender systems and existing challenges. == Approaches == === Collaborative filtering === One approach to the design of recommender systems that has wide use is collaborative filtering. Collaborative filtering is based on the assumption that people who agreed in the past will agree in the future, and that they will like similar kinds of items as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. By locating peer users/items with a rating history similar to the current user or item, they generate recommendations using this neighborhood. This approach is a cornerstone for e-commerce sites that analyze the purchasing patterns of thousands of users to suggest what you might like. Collaborative filtering methods are classified as memory-based and model-based. A well-known example of memory-based approaches is the user-based algorithm, while that of model-based approaches is matrix factorization (recommender systems). A key advantage of the collaborative filtering approach is that it does not rely on machine analyzable content and therefore it is capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. Many algorithms have been used in measuring user similarity or item similarity in recommender systems. For example, the k-nearest neighbor (k-NN) approach and the Pearson Correlation as first implemented by Allen. When building a model from a user's behavior, a distinction is often made between explicit and implicit forms of data collection. Examples of explicit data collection include the following: Asking a user to rate an item on a sliding scale. Asking a user to search. Asking a user to rank a collection of items from favorite to least favorite. Presenting two items to a user and asking him/her to choose the better one of them. Asking a user to create a list of items that he/she likes (see Rocchio classification or other similar techniques). Examples of implicit data collection include the following: Observing the items that a user views in an online store, media library, or other repository of med

    Read more →
  • Distribution management system

    Distribution management system

    A distribution management system (DMS) is a collection of applications designed to monitor and control the electric power distribution networks efficiently and reliably. It acts as a decision support system to assist the control room and field operating personnel with the monitoring and control of the electric distribution system. Improving the reliability and quality of service in terms of reducing power outages, minimizing outage time, maintaining acceptable frequency and voltage levels are the key deliverables of a DMS. Given the complexity of distribution grids, such systems may involve communication and coordination across multiple components. For example, the control of active loads may require a complex chain of communication through different components as described in US patent 11747849B2 In recent years, utilization of electrical energy increased exponentially and customer requirement and quality definitions of power were changed enormously. As electric energy became an essential part of daily life, its optimal usage and reliability became important. Real-time network view and dynamic decisions have become instrumental for optimizing resources and managing demands, leading to the need for distribution management systems in large-scale electrical networks. == Overview == Most distribution utilities have been comprehensively using IT solutions through their Outage Management System (OMS) that makes use of other systems like Customer Information System (CIS), Geographical Information System (GIS) and Interactive Voice Response System (IVRS). An outage management system has a network component/connectivity model of the distribution system. By combining the locations of outage calls from customers with knowledge of the locations of the protection devices (such as circuit breakers) on the network, a rule engine is used to predict the locations of outages. Based on this, restoration activities are charted out and the crew is dispatched for the same. In parallel with this, distribution utilities began to roll out Supervisory Control and Data Acquisition (SCADA) systems, initially only at their higher voltage substations. Over time, use of SCADA has progressively extended downwards to sites at lower voltage levels. DMSs access real-time data and provide all information on a single console at the control centre in an integrated manner. Their development varied across different geographic territories. In the US, for example, DMSs typically grew by taking Outage Management Systems to the next level, automating the complete sequences and providing an end to end, integrated view of the entire distribution spectrum. In the UK, by contrast, the much denser and more meshed network topologies, combined with stronger Health & Safety regulation, had led to early centralisation of high-voltage switching operations, initially using paper records and schematic diagrams printed onto large wallboards which were 'dressed' with magnetic symbols to show the current running states. There, DMSs grew initially from SCADA systems as these were expanded to allow these centralised control and safety management procedures to be managed electronically. These DMSs required even more detailed component/connectivity models and schematics than those needed by early OMSs as every possible isolation and earthing point on the networks had to be included. In territories such as the UK, therefore, the network component/connectivity models were usually developed in the DMS first, whereas in the USA these were generally built in the GIS. The typical data flow in a DMS has the SCADA system, the Information Storage & Retrieval (ISR) system, Communication (COM) Servers, Front-End Processors (FEPs) & Field Remote Terminal Units (FRTUs). == Why DMS? == Reduce the duration of outages Improve the speed and accuracy of outage predictions. Reduce crew patrol and drive times through improved outage locating. Improve the operational efficiency Determine the crew resources necessary to achieve restoration objectives. Effectively utilize resources between operating regions. Determine when best to schedule mutual aid crews. Increased customer satisfaction A DMS incorporates IVR and other mobile technologies, through which there is an improved outage communications for customer calls. Provide customers with more accurate estimated restoration times. Improve service reliability by tracking all customers affected by an outage, determining electrical configurations of every device on every feeder, and compiling details about each restoration process. == DMS Functions == In order to support proper decision making and O&M activities, DMS solutions should support the following functions: Network visualization & support tools Applications for Analytical & Remedial Action Utility Planning Tools System Protection Schemes The various sub functions of the same, carried out by the DMS are listed below:- === Network Connectivity Analysis (NCA) === Distribution network usually covers over a large area and catering power to different customers at different voltage levels. So locating required sources and loads on a larger GIS/Operator interface is often very difficult. Panning & zooming provided with normal SCADA system GUI does not cover the exact operational requirement. Network connectivity analysis is an operator specific functionality which helps the operator to identify or locate the preferred network or component very easily. NCA does the required analyses and provides display of the feed point of various network loads. Based on the status of all the switching devices such as circuit breaker (CB), Ring Main Unit (RMU) and/or isolators that affect the topology of the network modeled, the prevailing network topology is determined. The NCA further assists the operator to know operating state of the distribution network indicating radial mode, loops and parallels in the network. === Switching Schedule & Safety Management === In territories such as the UK a core function of a DMS has always been to support safe switching and work on the networks. Control engineers prepare switching schedules to isolate and make safe a section of network before work is carried out, and the DMS validates these schedules using its network model. Switching schedules can combine telecontrolled and manual (on-site) switching operations. When the required section has been made safe, the DMS allows a Permit To Work (PTW) document to be issued. After its cancellation when the work has been finished, the switching schedule then facilitates restoration of the normal running arrangements. Switching components can also be tagged to reflect any Operational Restrictions that are in force. The network component/connectivity model, and associated diagrams, must always be kept absolutely up to date. The switching schedule facility therefore also allows 'patches' to the network model to be applied to the live version at the appropriate stage(s) of the jobs. The term 'patch' is derived from the method previously used to maintain the wallboard diagrams. === State Estimation (SE) === The state estimator is an integral part of the overall monitoring and control systems for transmission networks. It is mainly aimed at providing a reliable estimate of the system voltages. This information from the state estimator flows to control centers and database servers across the network. The variables of interest are indicative of parameters like margins to operating limits, health of equipment and required operator action. State estimators allow the calculation of these variables of interest with high confidence despite the facts that the measurements may be corrupted by noise, or could be missing or inaccurate. Even though we may not be able to directly observe the state, it can be inferred from a scan of measurements which are assumed to be synchronized. The algorithms need to allow for the fact that presence of noise might skew the measurements. In a typical power system, the State is quasi-static. The time constants are sufficiently fast so that system dynamics decay away quickly (with respect to measurement frequency). The system appears to be progressing through a sequence of static states that are driven by various parameters like changes in load profile. The inputs of the state estimator can be given to various applications like Load Flow Analysis, Contingency Analysis, and other applications. === Load Flow Applications (LFA) === Load flow study is an important tool involving numerical analysis applied to a power system. The load flow study usually uses simplified notations like a single-line diagram and focuses on various forms of AC power rather than voltage and current. It analyzes the power systems in normal steady-state operation. The goal of a power flow study is to obtain complete voltage angle and magnitude information for each bus in a power system for specified load and generator real power and voltage conditions. Once this

    Read more →
  • Nike+iPod

    Nike+iPod

    The Nike+iPod Sport Kit is an activity tracker device, developed by Nike, Inc., which measures and records the distance and pace of a walk or run. The Nike+iPod consists of a small transmitter device attached to or embedded in a shoe, which communicates with either the Nike+ Sportband, or a receiver plugged into an iPod Nano. It can also work directly with a 2nd Generation iPod Touch (or higher), iPhone 3GS, iPhone 4, iPhone 4S, iPhone 5, The Nike+iPod was announced on May 23, 2006. On September 7, 2010, Nike released the Nike+ Running App (originally called Nike+ GPS) on the App Store, which used a tracking engine powered by MotionX that does not require the separate shoe sensor or pedometer. This application works using the accelerometer and GPS of the iPhone and the accelerometer of the iPod Touch, which does not have a GPS chip. Nike+Running is compatible with the iPhone 6 and iPhone 6 Plus down to iPhone 3GS and iPod touch. On June 21, 2012, Nike released Nike+ Running App for Android. The current app is compatible with all Android phones running 4.0.3 and up. == Overview == The sensor and iPod kit were revealed on May 20, 2006. The kit stores information such as the elapsed time of the workout, the distance traveled, pace, and calories burned by the individual. Nike+ was a collaboration between Nike and Apple; the platform consisted of an iPod, a wireless chip, Nike shoes that accepted the wireless chip, an iTunes membership, and a Nike+ online community. iPods using Nike iPod require a sensor and remote. The next upgraded product was the Sportband kit, which was announced in April 2008. The kit allows users to store run information without the iPod Nano. The Sportband consists of two parts: a rubber holding strap which is worn around the wrist, and a receiver which resembles a USB key-disk. The receiver displays information comparable to that of the iPod kit on the built-in display. After a run, the receiver can be plugged straight into a USB port and the software will upload the run information automatically to the Nike+ website. As of August 2008 "Nike+iPod for the Gym" launched, allowing users to record their cardio workouts directly to their iPods. No Sport kit or shoe sensor is required; all that is needed is a compatible iPod (1st–6th generation iPod Nano or 2nd/3rd gen iPod Touch) and an enabled piece of cardio equipment. As of March 2009, the seven largest commercial equipment providers were shipping enabled equipment (Life Fitness, Technogym, Precor USA, Star Trac, Cybex International, Matrix Fitness and Free Motion). The models of compatible cardio equipment include treadmills, stationary bicycles, stair climbers, ellipticals, and others such as Precor's Adaptive Motion Trainer. Once the user syncs an iPod with iTunes, the cardio workouts are automatically stored at Nikeplus.com, where each workout is visualized and tracked based on the number of calories burned. The calories are converted to "CardioMiles", at a ratio of 100:1, allowing cardio users to take full advantage of all the tools and features of Nikeplus.com, and allow them to engage in challenges with other runners, walkers and cardio users, using a common currency. With the release of the second-generation iPod Touch in 2008, Apple Inc. included a built-in ability to receive Nike+ signals, which allowed the iPod to connect directly to the wireless sensor thus eliminating the need for an external receiver to be connected. Apple also added this capability to the iPhone 3GS (released 2009), iPhone 4 (2010), and third-generation iPod Touch (2009). Those devices use their Broadcom Bluetooth chipset to receive the signals. On June 7, 2010, Polar and Nike introduced the Polar WearLink+ that works with Nike+. This new product works with the Nike+ SportBand and the fifth generation iPod nano in conjunction with the Nike+ iPod Sport Kit. Polar WearLink+ that works with Nike+ communicates directly with the fifth generation iPod nano and Nike+ SportBand using a proprietary digital protocol but it is dual-mode so it is also compatible with most Polar training computers (all those using 5 kHz analog transmission technology). Nike+ had 18 million global users as of April 2013. One year later, Nike updated the number of global users to 28 million. In iOS 6.1.2 (and possibly higher), a hole in the compatibility for the app has allowed jailbroken iPad users to use the native Nike + iPod iPhone and iPod app by moving the app bundle and setting permissions for the app. On April 30, 2018, Nike retired services for legacy Nike wearable devices, such as the Nike+ FuelBand and the Nike+ SportWatch GPS, and previous versions of apps, including Nike Run Club and Nike Training Club version 4.X and lower. Likewise, Nike no longer supported the Nike+ Connect software that transferred data to a NikePlus Profile or the Nike+ Fuel/FuelBand and Nike+ Move apps. == Sports kit equipment == The kit consists of two pieces: a piezoelectric sensor with a Nordic Semiconductor nRF2402 transmitter that is mounted under the inner sole of the shoe and a receiver that connects to the iPod. They communicate using a 2.4 GHz wireless radio and use Nordic Semiconductor's "ShockBurst" network protocol. The wireless data is encrypted in transit, but some uniquely identifying data is sent in the plain. The wireless protocol was reverse engineered and documented by Dmitry Grinberg in 2011. Nike recommends that the shoe be a Nike+ model with a special pocket in which to place the device. Nike has released the sensor for individual sale meaning that consumers no longer have to purchase the whole set (the iPod receiver and sensor). As the sensor battery cannot be replaced, a new one must be purchased every time the battery runs out. Aftermarket solutions are available to users who do not want to use shoes with built-in or hand-made pockets for the foot sensor, such as shoe pouches and containment devices designed to affix the sensor against the shoe laces. No matter how the sensor is integrated with the user's shoes, care must be taken that it is firmly fixed in place and will not jerk around while in use, which would degrade the accuracy. == Sports kit usage == The Sports Kit can be used to track running, which it refers to as "workouts". New workouts are started by plugging the receiving unit into the iPod, then navigating through the iPod menu system. The user chooses a goal for the workout, which might be to cover a specific distance, or burn a number of calories, or work out for a specified time. A workout can also be started without a goal, which is called a "Basic Workout". When the workout goal has been set, the receiver seeks the sensor, possibly asking the user to "walk around to activate [the] sensor". The user then must press the center button on the iPod to begin the workout. Audio feedback is provided in the user's choice of generic male or female voice by the iPod over the course of the workout, depending on the type of workout chosen. For goal-oriented workouts, the feedback will correspond to significant milestones toward the goal. In a distance workout, for example, the audio feedback will inform the user as each mile or kilometer has been completed, as well as the half-way point of the workout, and a countdown of four 100-meter increments at the end of the workout. The iPod's control wheel functions change slightly during a workout. The Pause button now not only pauses the music but also the workout. Similarly, the Menu button is used to access the controls to end the workout. The Forward and Back buttons are unchanged, performing audio track skip and reverse functions. The Center button has two functions: audio feedback about the current distance, time, and pace are provided when the button is tapped once, while if the button is held down the iPod skips to the "PowerSong" - an audio track chosen by the user, generally intended for motivation. In addition to the in-workout audio feedback, there are pre-recorded congratulations provided by Lance Armstrong, Tiger Woods, Joan Benoit Samuelson, and Paula Radcliffe whenever a user achieves a personal best (such as fastest mile, fastest 5K, fastest 10K, longest run yet) or reaches certain long-term milestones (such as 250 miles, 500 kilometers). This "celebrity feedback" is heard after the usual end-of-run statistics. While the Sports Kit can be used immediately after purchase, it will report more accurate results if it is calibrated before the first usage and then regularly afterwards. For calibration, the user finds a fixed known distance of at least 0.25 mile or 400 meters and then sets the Nike+ to calibration mode for the walk or run over that distance. When the walk or run is complete, the device calibrates itself and future workout reporting will reflect statistics closer to that individual user's workout style. Consumer Reports magazine tested the device and found it accurate as long as you keep an even pace. In workouts with varied pa

    Read more →
  • Generalized distributive law

    Generalized distributive law

    The generalized distributive law (GDL) is a generalization of the distributive property which gives rise to a general message passing algorithm. It is a synthesis of the work of many authors in the information theory, digital communications, signal processing, statistics, and artificial intelligence communities. The law and algorithm were introduced in a semi-tutorial by Srinivas M. Aji and Robert J. McEliece with the same title. == Introduction == "The distributive law in mathematics is the law relating the operations of multiplication and addition, stated symbolically, a ∗ ( b + c ) = a ∗ b + a ∗ c {\displaystyle a(b+c)=ab+ac} ; that is, the monomial factor a {\displaystyle a} is distributed, or separately applied, to each term of the binomial factor b + c {\displaystyle b+c} , resulting in the product a ∗ b + a ∗ c {\displaystyle ab+ac} " – Britannica. As it can be observed from the definition, application of distributive law to an arithmetic expression reduces the number of operations in it. In the previous example the total number of operations reduced from three (two multiplications and an addition in a ∗ b + a ∗ c {\displaystyle ab+ac} ) to two (one multiplication and one addition in a ∗ ( b + c ) {\displaystyle a(b+c)} ). Generalization of distributive law leads to a large family of fast algorithms. This includes the FFT and Viterbi algorithm. This is explained in a more formal way in the example below: α ( a , b ) = d e f ∑ c , d , e ∈ A f ( a , c , b ) g ( a , d , e ) {\displaystyle \alpha (a,\,b){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{c,d,e\in A}f(a,\,c,\,b)\,g(a,\,d,\,e)} where f ( ⋅ ) {\displaystyle f(\cdot )} and g ( ⋅ ) {\displaystyle g(\cdot )} are real-valued functions, a , b , c , d , e ∈ A {\displaystyle a,b,c,d,e\in A} and | A | = q {\displaystyle |A|=q} (say) Here we are "marginalizing out" the independent variables ( c {\displaystyle c} , d {\displaystyle d} , and e {\displaystyle e} ) to obtain the result. When we are calculating the computational complexity, we can see that for each q 2 {\displaystyle q^{2}} pairs of ( a , b ) {\displaystyle (a,b)} , there are q 3 {\displaystyle q^{3}} terms due to the triplet ( c , d , e ) {\displaystyle (c,d,e)} which needs to take part in the evaluation of α ( a , b ) {\displaystyle \alpha (a,\,b)} with each step having one addition and one multiplication. Therefore, the total number of computations needed is 2 ⋅ q 2 ⋅ q 3 = 2 q 5 {\displaystyle 2\cdot q^{2}\cdot q^{3}=2q^{5}} . Hence the asymptotic complexity of the above function is O ( n 5 ) {\displaystyle O(n^{5})} . If we apply the distributive law to the RHS of the equation, we get the following: α ( a , b ) = d e f ∑ c ∈ A f ( a , c , b ) ⋅ ∑ d , e ∈ A g ( a , d , e ) {\displaystyle \alpha (a,\,b){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{c\in A}f(a,\,c,\,b)\cdot \sum _{d,\,e\in A}g(a,\,d,\,e)} This implies that α ( a , b ) {\displaystyle \alpha (a,\,b)} can be described as a product α 1 ( a , b ) ⋅ α 2 ( a ) {\displaystyle \alpha _{1}(a,\,b)\cdot \alpha _{2}(a)} where α 1 ( a , b ) = d e f ∑ c ∈ A f ( a , c , b ) {\displaystyle \alpha _{1}(a,b){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{c\in A}f(a,\,c,\,b)} and α 2 ( a ) = d e f ∑ d , e ∈ A g ( a , d , e ) {\displaystyle \alpha _{2}(a){\stackrel {\mathrm {def} }{=}}\displaystyle \sum \limits _{d,\,e\in A}g(a,\,d,\,e)} Now, when we are calculating the computational complexity, we can see that there are q 3 {\displaystyle q^{3}} additions in α 1 ( a , b ) {\displaystyle \alpha _{1}(a,\,b)} and α 2 ( a ) {\displaystyle \alpha _{2}(a)} each and there are q 2 {\displaystyle q^{2}} multiplications when we are using the product α 1 ( a , b ) ⋅ α 2 ( a ) {\displaystyle \alpha _{1}(a,\,b)\cdot \alpha _{2}(a)} to evaluate α ( a , b ) {\displaystyle \alpha (a,\,b)} . Therefore, the total number of computations needed is q 3 + q 3 + q 2 = 2 q 3 + q 2 {\displaystyle q^{3}+q^{3}+q^{2}=2q^{3}+q^{2}} . Hence the asymptotic complexity of calculating α ( a , b ) {\displaystyle \alpha (a,b)} reduces to O ( n 3 ) {\displaystyle O(n^{3})} from O ( n 5 ) {\displaystyle O(n^{5})} . This shows by an example that applying distributive law reduces the computational complexity which is one of the good features of a "fast algorithm". == History == Some of the problems that used distributive law to solve can be grouped as follows: Decoding algorithms: A GDL like algorithm was used by Gallager's for decoding low density parity-check codes. Based on Gallager's work Tanner introduced the Tanner graph and expressed Gallagers work in message passing form. The tanners graph also helped explain the Viterbi algorithm. It is observed by Forney that Viterbi's maximum likelihood decoding of convolutional codes also used algorithms of GDL-like generality. Forward–backward algorithm: The forward backward algorithm helped as an algorithm for tracking the states in the Markov chain. And this also was used the algorithm of GDL like generality Artificial intelligence: The notion of junction trees has been used to solve many problems in AI. Also the concept of bucket elimination used many of the concepts. == The MPF problem == MPF or marginalize a product function is a general computational problem which as special case includes many classical problems such as computation of discrete Hadamard transform, maximum likelihood decoding of a linear code over a memory-less channel, and matrix chain multiplication. The power of the GDL lies in the fact that it applies to situations in which additions and multiplications are generalized. A commutative semiring is a good framework for explaining this behavior. It is defined over a set K {\displaystyle K} with operators " + {\displaystyle +} " and " . {\displaystyle .} " where ( K , + ) {\displaystyle (K,\,+)} and ( K , . ) {\displaystyle (K,\,.)} are a commutative monoids and the distributive law holds. Let p 1 , … , p n {\displaystyle p_{1},\ldots ,p_{n}} be variables such that p 1 ∈ A 1 , … , p n ∈ A n {\displaystyle p_{1}\in A_{1},\ldots ,p_{n}\in A_{n}} where A {\displaystyle A} is a finite set and | A i | = q i {\displaystyle |A_{i}|=q_{i}} . Here i = 1 , … , n {\displaystyle i=1,\ldots ,n} . If S = { i 1 , … , i r } {\displaystyle S=\{i_{1},\ldots ,i_{r}\}} and S ⊂ { 1 , … , n } {\displaystyle S\,\subset \{1,\ldots ,n\}} , let A S = A i 1 × ⋯ × A i r {\displaystyle A_{S}=A_{i_{1}}\times \cdots \times A_{i_{r}}} , p S = ( p i 1 , … , p i r ) {\displaystyle p_{S}=(p_{i_{1}},\ldots ,p_{i_{r}})} , q S = | A S | {\displaystyle q_{S}=|A_{S}|} , A = A 1 × ⋯ × A n {\displaystyle \mathbf {A} =A_{1}\times \cdots \times A_{n}} , and p = { p 1 , … , p n } {\displaystyle \mathbf {p} =\{p_{1},\ldots ,p_{n}\}} Let S = { S j } j = 1 M {\displaystyle S=\{S_{j}\}_{j=1}^{M}} where S j ⊂ { 1 , . . . , n } {\displaystyle S_{j}\subset \{1,...\,,n\}} . Suppose a function is defined as α i : A S i → R {\displaystyle \alpha _{i}:A_{S_{i}}\rightarrow R} , where R {\displaystyle R} is a commutative semiring. Also, p S i {\displaystyle p_{S_{i}}} are named the local domains and α i {\displaystyle \alpha _{i}} as the local kernels. Now the global kernel β : A → R {\displaystyle \beta :\mathbf {A} \rightarrow R} is defined as: β ( p 1 , . . . , p n ) = ∏ i = 1 M α ( p S i ) {\displaystyle \beta (p_{1},...\,,p_{n})=\prod _{i=1}^{M}\alpha (p_{S_{i}})} Definition of MPF problem: For one or more indices i = 1 , . . . , M {\displaystyle i=1,...\,,M} , compute a table of the values of S i {\displaystyle S_{i}} -marginalization of the global kernel β {\displaystyle \beta } , which is the function β i : A S i → R {\displaystyle \beta _{i}:A_{S_{i}}\rightarrow R} defined as β i ( p S i ) = ∑ p S i c ∈ A S i c β ( p ) {\displaystyle \beta _{i}(p_{S_{i}})\,=\displaystyle \sum \limits _{p_{S_{i}^{c}}\in A_{S_{i}^{c}}}\beta (p)} Here S i c {\displaystyle S_{i}^{c}} is the complement of S i {\displaystyle S_{i}} with respect to { 1 , . . . , n } {\displaystyle \mathbf {\{} 1,...\,,n\}} and the β i ( p S i ) {\displaystyle \beta _{i}(p_{S_{i}})} is called the i t h {\displaystyle i^{th}} objective function, or the objective function at S i {\displaystyle S_{i}} . It can observed that the computation of the i t h {\displaystyle i^{th}} objective function in the obvious way needs M q 1 q 2 q 3 ⋯ q n {\displaystyle Mq_{1}q_{2}q_{3}\cdots q_{n}} operations. This is because there are q 1 q 2 ⋯ q n {\displaystyle q_{1}q_{2}\cdots q_{n}} additions and ( M − 1 ) q 1 q 2 . . . q n {\displaystyle (M-1)q_{1}q_{2}...q_{n}} multiplications needed in the computation of the i th {\displaystyle i^{\text{th}}} objective function. The GDL algorithm which is explained in the next section can reduce this computational complexity. The following is an example of the MPF problem. Let p 1 , p 2 , p 3 , p 4 , {\displaystyle p_{1},\,p_{2},\,p_{3},\,p_{4},} and p 5 {\displaystyle p_{5}} be variables such that p 1 ∈ A 1 , p 2 ∈ A 2 , p 3 ∈ A 3 , p 4 ∈ A 4 , {\displaystyle p_{1}\in

    Read more →
  • Maritime Informatics

    Maritime Informatics

    Maritime Informatics is a thematic topic within the broader discipline of informatics. It can be considered as both a field of study and domain of application. As an application domain, it is the outlet of innovations originating from data science and artificial intelligence; as a field of study, it is positioned between computer science and marine engineering. == Beginnings of maritime informatics == As a result of the increasing levels of digitalisation occurring in the maritime sector starting around 2010 and stimulated by the EU-endorsed MonaLisa project for sea traffic management (STM), a number of academics and shipping industry leaders recognised that the maritime transportation sector would benefit from a specific field of study and application to be known as Maritime Informatics - the use of information systems, data sharing and data analytics in the business and operations of maritime transportation. They considered that it would lead to improvements in efficiency, safety, resilience, and ecological sustainability - all of which are currently lacking for many aspects of sea transport. One of the first public airings of the concept of Maritime Informatics was a presentation delivered on 11 September 2014 in Gothenburg, Sweden. A proposal for an inaugural minitrack on Maritime Informatics was accepted for the 2015 Americas Conference on Information Systems in Puerto Rico where three papers were presented. Since then numerous publications has been brought forward captured at www.maritimeinformatics.org and in late 2020 the first reference book on Maritime Informatics was co-written by 81 expert contributors (47 practitioners and 34 researchers) from 20 countries. Most impactful authors and journals in the domain have been documented in a review paper. Dimitrios Zissis, Luca Cazzanti and Leonardo M. Millefiori are the top three authors; top journals and conferences include Ocean Engineering, Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems, Sensors, the international Conference On Engineering, Technology And Innovation, Expert Systems With Applications, IEEE Access, and Journal of Navigation. == Background == The shipping industry has several particular organisational aspects that are recognised and taken into account in maritime informatics: It is predominantly a self-organising ecosystem Many activities are undertaken as part of episodic tight coupling There is a so-called maritime stack There is increasing pressure to balance capital productivity and energy efficiency There is the potential virtuous interplay between different types of systems == Data sharing == Digital data sharing is key to the all-important, arguably fundamental, data analytics aspects of maritime informatics because it opens the way for better access to relevant and reliable data. As in land-based commerce, digital data sharing is a growing phenomenon in maritime operations - though there is a way to go. It is enabling greater transparency for all those involved in the transportation of goods and passengers, not least being the end-customer. This leads to better and more informed decision-making and planning by all those involved. The push for digitalisation and data sharing is being pursued both by governments and the commercial sector. For example, the Member States of the IMO agreed a mandatory requirement for their governments to introduce electronic information exchange between ships and ports as from 8 April 2019. Meanwhile, commercial operators, particularly in the container lines are putting systems in place for sharing data for mutual benefit in their operations. Data sharing is an important aspect of the Port Collaborative Decision Making (PortCDM) and Port Call Optimization initiatives, both of which seek to improve the coordination, synchronization and efficiency of the port call process by enabling a common and shared situational awareness among all those involved. == Standardisation == The availability and sharing of relevant digital data underpins maritime informatics and is key to more effective and efficient coordination and synchronisation in the predominantly self-organising ecosystem that is maritime transportation. For this to occur, a high priority underpinning maritime informatics is the encouragement of standardised digital data exchange and data sharing, leading, in turn, to improvements in shipping analytics. Improved availability of data will support better historical analysis, now-casting and forecasting. The International Maritime Organization (IMO) FAL Committee is taking the lead in ensuring that the common terms used in the various standards being developed or in use in the maritime sector are compatible and therefore interoperable as far as is practicable, by creating and maintaining The IMO Compendium on Facilitation and Electronic Business. The IMO Compendium consists of an IMO Data Set and IMO Reference Data Model agreed by the main organisations involved in the development of standards for the electronic exchange of information related to the FAL Convention: the World Customs Organization (WCO), the United Nations Economic Commission for Europe (UNECE) and the International Organization for Standardization (ISO). There are several other prominent international governmental and non-governmental organisations actively contributing to the ongoing standardisation and harmonisation process including the UN Electronic Data Interchange for Administration, Commerce and Transport (UN EDIFACT), the Digital Container Shipping Association (DCSA), the International Harbour Masters Association (IHMA) and BIMCO - the world's largest direct-membership organisation for shipowners, charterers, shipbrokers and agents.

    Read more →
  • System Service Descriptor Table

    System Service Descriptor Table

    The System Service Descriptor Table (SSDT) is an internal dispatch table within Microsoft Windows. == Function == The SSDT maps syscalls to kernel function addresses. When a syscall is issued by a user space application, it contains the service index as parameter to indicate which syscall is called. The SSDT is then used to resolve the address of the corresponding function within ntoskrnl.exe. In modern Windows kernels, two SSDTs are used: One for generic routines (KeServiceDescriptorTable) and a second (KeServiceDescriptorTableShadow) for graphical routines. A parameter passed by the calling userspace application determines which SSDT shall be used. == Hooking == Modification of the SSDT allows to redirect syscalls to routines outside the kernel. These routines can be either used to hide the presence of software or to act as a backdoor to allow attackers permanent code execution with kernel privileges. For both reasons, hooking SSDT calls is often used as a technique in both Windows kernel mode rootkits and antivirus software. In 2010, many computer security products which relied on hooking SSDT calls were shown to be vulnerable to exploits using race conditions to attack the products' security checks.

    Read more →
  • Traité de Documentation

    Traité de Documentation

    Traité de documentation: le livre sur le livre, théorie et pratique is a landmark book by Belgian author Paul Otlet, first published in 1934. == Legacy == The book is considered a landmark in the history of information science, with concepts predicting the rise of the World Wide Web and search engines. In [Otlet's] most famous publication of 1934, Traité de Documentation, he wrote of a desk in the form of a wheel from which different projects (workspaces) could be switched as they rotated — foreshadowing the multiple desktops and tabs of contemporary computer interfaces. Inspired by the arrival of radio, phonograph, cinema, and television, Otlet also posited that there were as yet many “inventions to be discovered,” including the reading and annotation of remote documents and computer speech.

    Read more →
  • Ecoinformatics

    Ecoinformatics

    Ecoinformatics, or ecological informatics, is the science of information in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes ecoinformatics. A few definitions have been circulating, mostly centered on the creation of tools to access and analyze natural system data. However, the scope and aims of ecoinformatics are certainly broader than the development of metadata standards to be used in documenting datasets. Ecoinformatics aims to facilitate environmental research and management by developing ways to access, integrate databases of environmental information, and develop new algorithms enabling different environmental datasets to be combined to test ecological hypotheses. Ecoinformatics is related to the concept of ecosystem services. Ecoinformatics characterize the semantics of natural system knowledge. For this reason, much of today's ecoinformatics research relates to the branch of computer science known as knowledge representation, and active ecoinformatics projects are developing links to activities such as the Semantic Web. Current initiatives to effectively manage, share, and reuse ecological data are indicative of the increasing importance of fields like ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE, Data Conservancy, and Artificial Intelligence for Environment & Sustainability. == Software Development Lifecycle == Central to the concept of ecoinformatics is the Software Development Lifecycle (SDLC), a systematic framework for writing, implementing, and maintaining software products. Typically in Ecoinformatics projects, the development pipeline includes data collection, usually from several different environmental data sources, then integrating these data sources together, and then analyzing the data. Here, each step of the SDLC is described in the context of ecoinformatics, per Michener et al. It is important to note that the plan, collect, assure, describes and preserve steps refer to the data collection entity, which can be individual researchers or large data-collection networks, while the discover, integrate, and analyze steps typically refer to the individual researcher. Plan: Ecoinformatics projects require data from several databases. Each database holds different data, and therefore researchers should identify what types of environmental or ecological data they will need to answer their research question. Collect: Data is collected in several different ways. In ecoinformatics, this is usually restricted to manually entering data into a spreadsheet, and parsing data from an existing database. The growth of relational databases has made it easier for ecologists to download relevant data and integrate datasets together Assure: Data entries should be checked thoroughly to validate their accuracy and usability, such as to check for outliers and erroneous points. The same principle applies to data downloaded from datasets. This responsibility falls on both the ecologist downloading the data, and the entity that sets up the data collection system. Describe: An accurate description of the metadata of a dataset that is used in a study should include enough information to deduce the data collection and processing methodology, when the data were collected, why the data were collected, and how the data were stored. This is important for reproducibility, especially for projects that build on each other and may recycle data Preserve: After data is collected by an institutional entity, it should be archived such that it is easily accessible. Ideally, this is in databases that are maintained and not at risk of deprecation Discover: While there are good practices for discovering data to start a research project, this process is often marred by a lack of usable, published data, as researchers may collect data specific to their study, but may not publish this data for wider use. On the data collection end, this can be addressed by better data-sharing practices, such as by linking datasets when publishing papers or studies. On the data procurement end, this can be addressed by more precise data searching, such as using key words to find relevant datasets. Integrate: Synthesizing datasets together can be difficult and labor-intensive, largely due to the methodological differences in data collection. There are several approaches to this, but the best practices typically involve computational approaches, namely using R or Python, to automate the processes and prevent errors Analyze: Data analysis can take several forms, and should be tailored to the specific ecological project. However, all data analysis methods should be well-documented, including the procedure for analysis, justification for analysis methods, and any shortcomings in a specific approach. == Applications of Ecoinformatics Across Ecology == === Ecosystem Ecology === Source: Ecosystem studies, by definition, encompass interactions across the entire life sciences spectrum, from microscopic biochemical reactions to large-scale geological phenomena. As a result, big databases may not be designed specifically for any particular research question, but should be inclusive enough to support most studies. Since ecosystem-level questions require a broad perspective, data-related ecosystem projects would likely incorporate data from several databases. A common framework for incorporating data into ecosystem-level studies is the network science model, in which data collection mechanisms and resources are treated like a large, interconnected network instead of individual entities. The network may include several data collection stations within one databases, or may span across multiple databases. Currently there are several large-scale networks, but they do not generate data on the scale to consider ecology as a big data science. A current challenge for ecoinformatics in ecosystem ecology is that most funding is prioritized for generating new data rather than maintaining existing data infrastructures. Integrating data across the different spatial scales can also be difficult, since each dataset may hold different types of data. === Urban Ecology === Source: The current push for smart cities, and sensor network integration into infrastructure, has positioned as a major source of data for ecological studies. Typical urban ecology questions address the effects of urbanization on the local ecosystem, and how to drive future development to promote urban biodiversity. While sensor networks in cities typically collect environmental data to optimize city processes, they may also be used for ecological initiatives, especially with respect to understanding the complex, multi-layered relationship between cities and their local ecosystem. It can also be used to better understand the current landscape of cities, and identify avenues for rewinding of cities. For example, analyzing mobility patterns can identify areas that may lend themselves well to building parks and green spaces. Bird watching data can also be used to identify the types of bird species in a local area. === Infectious Disease === Source: Like other disciplines of ecology, emerging infectious disease and epidemiology span multiple scales, from understanding the genetics that drive disease trends to large-scale spatiotemporal analyses. As a result, infectious disease studies can incorporate everything from bioinformatics, genetic sequences, amino acid sequences, and environmental observation data. On the micro-scale, these data can then be used to predict infectivity/transmissibility, drug resistance, drug candidates, and mutation sites. On the macro-scale, it can be used to identify societal trends or environmental factors that lend themselves to spillover, locations of infection, and practices that cause disease transmission. == Databases == Source: USGS National Streamflow sensor network GBIF Neotoma Paleobiology database European Vegetation Archive USDA Forest Inventory Analysis TRY BIEN AmeriFlux TEAM iNaturalist NEON GLEON LTER CZO TERN SAEON

    Read more →
  • Long division

    Long division

    In arithmetic, long division is a standard division algorithm suitable for dividing multi-digit numbers that is simple enough to perform by hand. It breaks down a division problem into a series of easier steps. As in all division problems, one number, called the dividend, is divided by another, called the divisor, producing a result called the quotient. It enables computations involving arbitrarily large numbers to be performed by following a series of simple steps. The abbreviated form of long division is called short division, which is almost always used instead of long division when the divisor has only one digit. == History == Related algorithms have existed since the 12th century. Al-Samawal al-Maghribi (1125–1174) performed calculations with decimal numbers that essentially require long division, leading to infinite decimal results, but without formalizing the algorithm. Caldrini (1491) is the earliest printed example of long division, known as the Danda method in medieval Italy, and it became more practical with the introduction of decimal notation for fractions by Pitiscus (1608). The specific algorithm in modern use was introduced by Henry Briggs c. 1600. == Education == Inexpensive calculators and computers have become the most common tools for performing division in educational and professional contexts worldwide, reducing reliance on traditional paper-and-pencil techniques. Internally, these devices implement various division algorithms, many of which rely on iterative approximations and multiplication to improve computational efficiency. Educational approaches to teaching division vary across countries and regions, reflecting differing curricular priorities. In North America, long division has been de-emphasized or, in some cases, removed from portions of the curriculum as part of reform mathematics, which emphasizes conceptual understanding and the use of technology. In contrast, many education systems in Europe and Asia continue to emphasize mastery of standard algorithms, including long division, as a foundational arithmetic skill. For example, curricula in countries such as Japan and Germany typically introduce and reinforce long division during primary education, often alongside mental arithmetic strategies and problem-solving techniques. International assessments such as the Trends in International Mathematics and Science Study (TIMSS) highlight these differences, showing variation in how procedural fluency and conceptual understanding are balanced across educational systems. These differing approaches reflect broader educational philosophies regarding the balance between procedural fluency, conceptual understanding, and the role of technology in mathematics education. == Method == In English-speaking countries, long division does not use the division slash ⟨∕⟩ or division sign ⟨÷⟩ symbols but instead constructs a tableau. The divisor is separated from the dividend by a right parenthesis ⟨)⟩ or vertical bar ⟨|⟩; the dividend is separated from the quotient by a vinculum (i.e., an overbar). The combination of these two symbols is sometimes known as a long division symbol, division bracket, or even a bus stop. It developed in the 18th century from an earlier single-line notation separating the dividend from the quotient by a left parenthesis. The process is begun by dividing the left-most digit of the dividend by the divisor. The quotient (rounded down to an integer) becomes the first digit of the result, and the remainder is calculated (this step is notated as a subtraction). This remainder carries forward when the process is repeated on the following digit of the dividend (notated as 'bringing down' the next digit to the remainder). When all digits have been processed and no remainder is left, the process is complete. An example is shown below, representing the division of 500 by 4 (with a result of 125). 125 (Explanations) 4)500 4 ( 4 × 1 = 4) 10 ( 5 - 4 = 1) 8 ( 4 × 2 = 8) 20 (10 - 8 = 2) 20 ( 4 × 5 = 20) 0 (20 - 20 = 0) A more detailed breakdown of the steps goes as follows: Find the shortest sequence of digits starting from the left end of the dividend, 500, that the divisor 4 goes into at least once. In this case, this is simply the first digit, 5. The largest number that the divisor 4 can be multiplied by without exceeding 5 is 1, so the digit 1 is put above the 5 to start constructing the quotient. Next, the 1 is multiplied by the divisor 4, to obtain the largest whole number that is a multiple of the divisor 4 without exceeding the 5 (4 in this case). This 4 is then placed under and subtracted from the 5 to get the remainder, 1, which is placed under the 4 under the 5. Afterwards, the first as-yet unused digit in the dividend, in this case the first digit 0 after the 5, is copied directly underneath itself and next to the remainder 1, to form the number 10. At this point the process is repeated enough times to reach a stopping point: The largest number by which the divisor 4 can be multiplied without exceeding 10 is 2, so 2 is written above as the second leftmost quotient digit. This 2 is then multiplied by the divisor 4 to get 8, which is the largest multiple of 4 that does not exceed 10; so 8 is written below 10, and the subtraction 10 minus 8 is performed to get the remainder 2, which is placed below the 8. The next digit of the dividend (the last 0 in 500) is copied directly below itself and next to the remainder 2 to form 20. Then the largest number by which the divisor 4 can be multiplied without exceeding 20, which is 5, is placed above as the third leftmost quotient digit. This 5 is multiplied by the divisor 4 to get 20, which is written below and subtracted from the existing 20 to yield the remainder 0, which is then written below the second 20. At this point, since there are no more digits to bring down from the dividend and the last subtraction result was 0, we can be assured that the process finished. If the last remainder when we ran out of dividend digits had been something other than 0, there would have been two possible courses of action: We could just stop there and say that the dividend divided by the divisor is the quotient written at the top with the remainder written at the bottom, and write the answer as the quotient followed by a fraction that is the remainder divided by the divisor. We could extend the dividend by writing it as, say, 500.000... and continue the process (using a decimal point in the quotient directly above the decimal point in the dividend), in order to get a decimal answer, as in the following example. 31.75 4)127.00 12 (12 ÷ 4 = 3) 07 (0 remainder, bring down next figure) 4 (7 ÷ 4 = 1 r 3) 3.0 (bring down 0 and the decimal point) 2.8 (7 × 4 = 28, 30 ÷ 4 = 7 r 2) 20 (an additional zero is brought down) 20 (5 × 4 = 20) 0 In this example, the decimal part of the result is calculated by continuing the process beyond the units digit, "bringing down" zeros as being the decimal part of the dividend. This example also illustrates that, at the beginning of the process, a step that produces a zero can be omitted. Since the first digit 1 is less than the divisor 4, the first step is instead performed on the first two digits 12. Similarly, if the divisor were 13, one would perform the first step on 127 rather than 12 or 1. === Basic procedure for long division of n ÷ m === Find the location of all decimal points in the dividend n and divisor m. If necessary, simplify the long division problem by moving the decimals of the divisor and dividend by the same number of decimal places, to the right (or to the left), so that the decimal of the divisor is to the right of the last digit. When doing long division, keep the numbers lined up straight from top to bottom under the tableau. After each step, be sure the remainder for that step is less than the divisor. If it is not, there are three possible problems: the multiplication is wrong, the subtraction is wrong, or a greater quotient is needed. In the end, the remainder, r, is added to the growing quotient as a fraction, r⁄m. === Invariant property and correctness === The basic presentation of the steps of the process (above) focuses on what steps are to be performed, rather than the properties of those steps that ensure the result will be correct (specifically, that q × m + r = n, where q is the final quotient and r the final remainder). A slight variation of presentation requires more writing, and requires that we change, rather than just update, digits of the quotient, but can shed more light on why these steps actually produce the right answer by allowing evaluation of q × m + r at intermediate points in the process. This illustrates the key property used in the derivation of the algorithm (below). Specifically, we amend the above basic procedure so that we fill the space after the digits of the quotient under construction with 0's, to at least the 1's place, and include those 0's in the numbers we write below the division bra

    Read more →
  • Uniform convergence in probability

    Uniform convergence in probability

    Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family uniformly converge to their theoretical probabilities. Uniform convergence in probability has applications to statistics as well as machine learning as part of statistical learning theory. Specifically, the Glivenko-Cantelli theorem and the homonymous classes of functions are fundamentally related to uniform convergence. The law of large numbers says that, for each single event A {\displaystyle A} , its empirical frequency in a sequence of independent trials converges (with high probability) to its theoretical probability. In many application however, the need arises to judge simultaneously the probabilities of events of an entire class S {\displaystyle S} from one and the same sample. Moreover, it, is required that the relative frequency of the events converge to the probability uniformly over the entire class of events S {\displaystyle S} . The Uniform Convergence Theorem gives a sufficient condition for this convergence to hold. Roughly, if the event-family is sufficiently simple (its VC dimension is sufficiently small) then uniform convergence holds. == Definitions == For a class of predicates H {\displaystyle H} defined on a set X {\displaystyle X} and a set of samples x = ( x 1 , x 2 , … , x m ) {\displaystyle x=(x_{1},x_{2},\dots ,x_{m})} , where x i ∈ X {\displaystyle x_{i}\in X} , the empirical frequency of h ∈ H {\displaystyle h\in H} on x {\displaystyle x} is Q ^ x ( h ) = 1 m | { i : 1 ≤ i ≤ m , h ( x i ) = 1 } | . {\displaystyle {\widehat {Q}}_{x}(h)={\frac {1}{m}}|\{i:1\leq i\leq m,h(x_{i})=1\}|.} The theoretical probability of h ∈ H {\displaystyle h\in H} is defined as Q P ( h ) = P { y ∈ X : h ( y ) = 1 } . {\displaystyle Q_{P}(h)=P\{y\in X:h(y)=1\}.} The Uniform Convergence Theorem states, roughly, that if H {\displaystyle H} is "simple" and we draw samples independently (with replacement) from X {\displaystyle X} according to any distribution P {\displaystyle P} , then with high probability, the empirical frequency will be close to its expected value, which is the theoretical probability. Here "simple" means that the Vapnik–Chervonenkis dimension of the class H {\displaystyle H} is small relative to the size of the sample. In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole. The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis using the concept of growth function. == Uniform Convergence Theorem == The statement of the Uniform Convergence Theorem is as follows: If H {\displaystyle H} is a set of { 0 , 1 } {\displaystyle \{0,1\}} -valued functions defined on a set X {\displaystyle X} and P {\displaystyle P} is a probability distribution on X {\displaystyle X} then for ε > 0 {\displaystyle \varepsilon >0} and m {\displaystyle m} a positive integer, we have: P m { | Q P ( h ) − Q x ^ ( h ) | ≥ ε for some h ∈ H } ≤ 4 Π H ( 2 m ) e − ε 2 m / 8 . {\displaystyle P^{m}\{|Q_{P}(h)-{\widehat {Q_{x}}}(h)|\geq \varepsilon {\text{ for some }}h\in H\}\leq 4\Pi _{H}(2m)e^{-\varepsilon ^{2}m/8}.} In the above, for any x ∈ X m , {\displaystyle x\in X^{m},} Q P ( h ) = P { ( y ∈ X : h ( y ) = 1 } , {\displaystyle Q_{P}(h)=P\{(y\in X:h(y)=1\},} Q ^ x ( h ) = 1 m | { i : 1 ≤ i ≤ m , h ( x i ) = 1 } | {\displaystyle {\widehat {Q}}_{x}(h)={\frac {1}{m}}|\{i:1\leq i\leq m,h(x_{i})=1\}|} and | x | = m . {\displaystyle |x|=m.} P m {\displaystyle P^{m}} indicates that the probability is taken over x {\displaystyle x} consisting of m {\displaystyle m} i.i.d. draws from the distribution P . {\displaystyle P.} Finally, the growth function Π H {\displaystyle \Pi _{H}} is defined in the following way, for any { 0 , 1 } {\displaystyle \{0,1\}} -valued functions H {\displaystyle H} over X {\displaystyle X} and for any natural number m {\displaystyle m} : Π H ( m ) = max | { h ∩ D : D ⊆ X , | D | = m , h ∈ H } | . {\displaystyle \Pi _{H}(m)=\max |\{h\cap D:D\subseteq X,|D|=m,h\in H\}|.} From the point of view of Learning Theory one can consider H {\displaystyle H} to be the Concept/Hypothesis class defined over the instance set X {\displaystyle X} . Crucially, the Sauer–Shelah lemma implies that Π H ( m ) ≤ m d {\displaystyle \Pi _{H}(m)\leq m^{d}} , where d {\displaystyle d} is the VC dimension of H {\displaystyle H} . == Proof of the Uniform Convergence Theorem == and are the sources of the proof below. Before we get into the details of the proof of the Uniform Convergence Theorem we will present a high level overview of the proof. Symmetrization: We transform the problem of analyzing | Q P ( h ) − Q ^ x ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{x}(h)|\geq \varepsilon } into the problem of analyzing | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} , where r {\displaystyle r} and s {\displaystyle s} are i.i.d samples of size m {\displaystyle m} drawn according to the distribution P {\displaystyle P} . One can view r {\displaystyle r} as the original randomly drawn sample of length m {\displaystyle m} , while s {\displaystyle s} may be thought as the testing sample which is used to estimate Q P ( h ) {\displaystyle Q_{P}(h)} . Permutation: Since r {\displaystyle r} and s {\displaystyle s} are picked identically and independently, so swapping elements between them will not change the probability distribution on r {\displaystyle r} and s {\displaystyle s} . So, we will try to bound the probability of | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} for some h ∈ H {\displaystyle h\in H} by considering the effect of a specific collection of permutations of the joint sample x = r | | s {\displaystyle x=r||s} . Specifically, we consider permutations σ ( x ) {\displaystyle \sigma (x)} which swap x i {\displaystyle x_{i}} and x m + i {\displaystyle x_{m+i}} in some subset of 1 , 2 , . . . , m {\displaystyle {1,2,...,m}} . The symbol r | | s {\displaystyle r||s} means the concatenation of r {\displaystyle r} and s {\displaystyle s} . Reduction to a finite class: We can now restrict the function class H {\displaystyle H} to a fixed joint sample and hence, if H {\displaystyle H} has finite VC Dimension, it reduces to the problem to one involving a finite function class. We present the technical details of the proof. It should be stressed that this proof glosses over details like the measurability of the events V {\displaystyle V} and R {\displaystyle R} ; measurability is granted in the case of H {\displaystyle H} being finite or countable, but this is not normally the case in standard applications of the theorem (e.g. for statistical learning theory or to prove the Glivenko-Cantelli theorem). To get measurability, one needs to use a notion of separability of the underlying space, possibly related to H {\displaystyle H} . === Symmetrization === Lemma: Let V = { x ∈ X m : | Q P ( h ) − Q ^ x ( h ) | ≥ ε for some h ∈ H } {\displaystyle V=\{x\in X^{m}:|Q_{P}(h)-{\widehat {Q}}_{x}(h)|\geq \varepsilon {\text{ for some }}h\in H\}} and R = { ( r , s ) ∈ X m × X m : | Q r ^ ( h ) − Q ^ s ( h ) | ≥ ε / 2 for some h ∈ H } . {\displaystyle R=\{(r,s)\in X^{m}\times X^{m}:|{\widehat {Q_{r}}}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2{\text{ for some }}h\in H\}.} Then for m ≥ 2 ε 2 {\displaystyle m\geq {\frac {2}{\varepsilon ^{2}}}} , P m ( V ) ≤ 2 P 2 m ( R ) {\displaystyle P^{m}(V)\leq 2P^{2m}(R)} . Proof: By the triangle inequality, if | Q P ( h ) − Q ^ r ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon } and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2} then | Q ^ r ( h ) − Q ^ s ( h ) | ≥ ε / 2 {\displaystyle |{\widehat {Q}}_{r}(h)-{\widehat {Q}}_{s}(h)|\geq \varepsilon /2} . Therefore, P 2 m ( R ) ≥ P 2 m { ∃ h ∈ H , | Q P ( h ) − Q ^ r ( h ) | ≥ ε and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 } = ∫ V P m { s : ∃ h ∈ H , | Q P ( h ) − Q ^ r ( h ) | ≥ ε and | Q P ( h ) − Q ^ s ( h ) | ≤ ε / 2 } d P m ( r ) = A {\displaystyle {\begin{aligned}&P^{2m}(R)\\[5pt]\geq {}&P^{2m}\{\exists h\in H,|Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon {\text{ and }}|Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2\}\\[5pt]={}&\int _{V}P^{m}\{s:\exists h\in H,|Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon {\text{ and }}|Q_{P}(h)-{\widehat {Q}}_{s}(h)|\leq \varepsilon /2\}\,dP^{m}(r)\\[5pt]={}&A\end{aligned}}} since r {\displaystyle r} and s {\displaystyle s} are independent. Now for r ∈ V {\displaystyle r\in V} fix an h ∈ H {\displaystyle h\in H} such that | Q P ( h ) − Q ^ r ( h ) | ≥ ε {\displaystyle |Q_{P}(h)-{\widehat {Q}}_{r}(h)|\geq \varepsilon } . For this h {\displaystyle h} , we shall

    Read more →
  • ARMA International

    ARMA International

    ARMA International (formerly the Association of Records Managers and Administrators) is an American not-for-profit professional association for information professionals – primarily information management (including records management) and information governance, and related industry practitioners and vendors. The association provides educational opportunities and publications covering aspects of information management broadly. == History == The Association was founded in 1955. In 1975, the Association of Records Executives and Administrators (AREA) and the American Records Management Association merged to form ARMA International. The headquarters for ARMA International is located in Overland Park, Kansas. == Operations == ARMA International services professionals in the United States, Canada, Japan, and the United Kingdom. Its members include records managers, attorneys, information technology professionals, consultants, and archivists involved in various aspects of managing records and information assets. ARMA hosts an annual conference with the goal of bringing together record and information management professionals from around the world – In 2023, ARMA hosted conferences in both the United States and Canada. Topics addressed in the 120+ educational sessions include advanced technology, creating information structure, ediscovery and information law, information management fundamentals, information project management, and reducing organizational information risk. The expo features exhibitors displaying records and information technologies, products, and services.

    Read more →
  • Uncertain database

    Uncertain database

    An uncertain database is a kind of database studied in database theory. The goal of uncertain databases is to manage information on which there is some uncertainty. Uncertain databases make it possible to explicitly represent and manage uncertainty on the data, usually in a succinct way. == Formal definition == At the basis of uncertain databases is the notion of possible world. Specifically, a possible world of an uncertain database is a (certain) database which is one of the possible realizations of the uncertain database. A given uncertain database typically has more than one, and potentially infinitely many, possible worlds. A formalism to represent uncertain databases then explains how to succinctly represent a set of possible worlds into one uncertain database. == Types of uncertain databases == Uncertain database models differ in how they represent and quantify these possible worlds: Incomplete databases are a compact representation of the set of possible worlds – the use of NULL in SQL, arguably the most commonplace instantiation of uncertain databases, is an example of incomplete database model. Probabilistic databases are a compact representation of a probability distribution over the set of possible worlds. Fuzzy databases are a compact representation of a fuzzy set of the possible worlds. Though mostly studied in the relational setting, uncertain database models can also be defined in other relational models such as graph databases or XML databases. === Incomplete database === The most common database model is the relational model. Multiple incomplete database models have been defined over the relational model, that form extensions to the relational algebra. These have been called Imieliński–Lipski algebras: Relations with NULL values, also called Codd tables c-tables v-tables === Example === The following table is a relation of an incomplete database, described in the formalism of NULL values: There are infinitely many possible worlds for this incomplete database, obtained by replacing the "NULL" values with concrete values. For instance, the following relation is a possible world:

    Read more →