Tanagra (machine learning)

Tanagra (machine learning)

Tanagra is a free suite of machine learning software for research and academic purposes developed by Ricco Rakotomalala at the Lumière University Lyon 2, France. Tanagra supports several standard data mining tasks such as: Visualization, Descriptive statistics, Instance selection, feature selection, feature construction, regression, factor analysis, clustering, classification and association rule learning. Tanagra is an academic project. It is widely used in French-speaking universities. Tanagra is frequently used in real studies and in software comparison papers. == History == The development of Tanagra was started in June 2003. The first version was distributed in December 2003. Tanagra is the successor of Sipina, another free data mining tool which is intended only for supervised learning tasks (classification), especially the interactive and visual construction of decision trees. Sipina is still available online and is maintained. Tanagra is an "open source project" as every researcher can access the source code and add their own algorithms, as long as they agree and conform to the software distribution license. The main purpose of the Tanagra project is to give researchers and students a user-friendly data mining software, conforming to the present norms of the software development in this domain (especially in the design of its GUI and the way to use it), and allowing the analyzation of either real or synthetic data. From 2006, Ricco Rakotomalala made an important documentation effort. A large number of tutorials are published on a dedicated website. They describe the statistical and machine learning methods and their implementation with Tanagra on real case studies. The use of other free data mining tools on the same problems is also widely described. The comparison of the tools enables readers to understand the possible differences in the presentation of results. == Description == Tanagra works similarly to current data mining tools. The user can design visually a data mining process in a diagram. Each node is a statistical or machine learning technique, the connection between two nodes represents the data transfer. But unlike the majority of tools which are based on the workflow paradigm, Tanagra is very simplified. The treatments are represented in a tree diagram. The results are displayed in an HTML format. This makes it is easy to export the outputs in order to visualize the results in a browser. It is also possible to copy the result tables to a spreadsheet. Tanagra makes a good compromise between statistical approaches (e.g. parametric and nonparametric statistical tests), multivariate analysis methods (e.g. factor analysis, correspondence analysis, cluster analysis, regression) and machine learning techniques (e.g. neural network, support vector machine, decision trees, random forest).

AppValley

AppValley is an independent American digital distribution service operated and trademarked by AppValley LLC. It serves as an alternative app store for the iOS mobile operating system, which allows users to download applications that are not available on the App Store, most commonly tweaked "++" apps, jailbreak apps, and apps including paid apps on the app store. == Legality == AppValley is among several services that violate enterprise developer certificates from Apple. The terms under which these are granted make clear that they are for companies who wish to distribute apps to their employees. AppValley uses these certificates to distribute software directly to non-employees, thereby bypassing the AppStore. AppValley's conduct had implications in U.S. sanctioned markets like Iran, Iraq, North Korea, Cuba, and Venezuela, which have all been subject to commercial sanctions. Among the software offered by AppValley and other services is pirated software, including paid apps on the app store and premium versions of Instagram, Spotify, Pokémon Go, and others. For instance, AppValley distributes an ad-free version of the music streaming app Spotify even on the free tier. == History == The website was founded in May 2017, releasing late that month with a very basic version of the app. There were less than 100 apps available for download at this time. On Jan 19, 2018, a new version dubbed AppValley 2.0 was released bringing dark mode, more categories, a search, and a much faster interface. On February 14, 2019, a Chinese partner "Jason Wu" allegedly took control of the main Twitter account and domain, causing the original AppValley developers to migrate to the domain app-valley.vip and the Twitter account handle @App_Valley_vip. As of September 2024, the app-valley.vip domain now redirects to appvalley.signulous.com. Today, AppValley continues to offer an alternative to Apple's App Store where app developers can publish their applications. == Features == AppValley is a mobile app installer which can also support iOS version that can be installed and downloaded on the mobile or the devices of the people who wish to get access to many different applications available. AppValley also contains apps that have been modified or tweaked for user preferences, and allows the user to by pass national restrictions on the use of apps, without having to resort to jailbreaking. As of June 2, 2020, there are over 1300 apps available for download.

Algorithm

In mathematics and computer science, an algorithm ( ) is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes (referred to as automated decision-making) and deduce valid inferences (referred to as automated reasoning). In contrast, a heuristic is an approach to solving problems without well-defined correct or optimal results. For example, although social media recommender systems are commonly called "algorithms", they actually rely on heuristics as there is no truly "correct" recommendation. As an effective method, an algorithm can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. Starting from an initial state and input, a computation occurs at each step, eventually producing output and terminating. The transition between states can be non-deterministic; randomized algorithms incorporate random input. == Etymology == Around 825 AD, Persian scientist and polymath Muḥammad ibn Mūsā al-Khwārizmī wrote kitāb al-ḥisāb al-hindī ("Book of Indian computation") and kitab al-jam' wa'l-tafriq al-ḥisāb al-hindī ("Addition and subtraction in Indian arithmetic"). In the early 12th century, Latin translations of these texts involving the Hindu–Arabic numeral system and arithmetic appeared, for example Liber Alghoarismi de practica arismetrice, attributed to John of Seville, and Liber Algoritmi de numero Indorum, attributed to Adelard of Bath. Here, alghoarismi or algoritmi is the Latinization of Al-Khwarizmi's name; the text starts with the phrase Dixit Algoritmi, or "Thus spoke Al-Khwarizmi". The word algorism in English came to mean the use of place-value notation in calculations; it occurs in the Ancrene Wisse from circa 1225. By the time Geoffrey Chaucer wrote The Canterbury Tales in the late 14th century, he used a variant of the same word in describing augrym stones, stones used for place-value calculation. In the 15th century, under the influence of the Greek word ἀριθμός (arithmos, "number"; cf. "arithmetic"), the Latin word was altered to algorithmus. By 1596, this form of the word was used in English, as algorithm, by Thomas Hood. == Definition == One informal definition is "a set of rules that precisely defines a sequence of operations", which would include all computer programs, and any bureaucratic procedure or cook-book recipe. In general, a program is an algorithm only if it stops eventually. Formally, algorithm is an explicit set of instructions to produce an output, that can be followed by a computer or a human performing specific operations on symbols.. == History == === Ancient algorithms === Step-by-step procedures for solving mathematical problems have been recorded since antiquity. This includes in Babylonian mathematics (around 2500 BC), Egyptian mathematics (around 1550 BC), Indian mathematics (around 800 BC and later), the Ifa Oracle (around 500 BC), Greek mathematics (around 240 BC), Chinese mathematics (around 200 BC and later), and Arabic mathematics (around 800 AD). The earliest evidence of algorithms is found in ancient Mesopotamian mathematics. A Sumerian clay tablet found in Shuruppak near Baghdad and dated to c. 2500 BC describes the earliest division algorithm. During the Hammurabi dynasty c. 1800 – c. 1600 BC, Babylonian clay tablets described algorithms for computing formulas. Algorithms were also used in Babylonian astronomy. Babylonian clay tablets describe and employ algorithmic procedures to compute the time and place of significant astronomical events. Algorithms for arithmetic are also found in ancient Egyptian mathematics, dating back to the Rhind Mathematical Papyrus c. 1550 BC. Algorithms were later used in ancient Hellenistic mathematics. Two examples are the Sieve of Eratosthenes, which was described in the Introduction to Arithmetic by Nicomachus, and the Euclidean algorithm, which was first described in Euclid's Elements (c. 300 BC).Examples of ancient Indian mathematics included the Shulba Sutras, the Kerala School, and the Brāhmasphuṭasiddhānta. In the 9th century, Muḥammad ibn Mūsā al-Khwārizmī revolutionized the field by establishing the algorithm as a systematic, finite sequence of logical steps to solve mathematical problems. In his influential work, The Compendious Book on Calculation by Completion and Balancing, he moved beyond specific numerical solutions to introduce general procedures for algebraic reduction and balancing. This transformed mathematics into a 'mechanical' process of well-defined rules—a fundamental shift that laid the groundwork for modern algorithmic theory. The Latin translation of his arithmetic treatise, titled Algoritmi de numero Indorum, led to the term algorithm being derived from the Latinization of his name, Algoritmi, specifically to describe this new rule-based approach to mathematics. The first cryptographic algorithm for deciphering encrypted code was developed by Al-Kindi, a 9th-century Arab mathematician, in A Manuscript On Deciphering Cryptographic Messages. He gave the first description of cryptanalysis by frequency analysis, the earliest codebreaking algorithm. === Computers === ==== Weight-driven clocks ==== Weight-driven clocks were a key European invention in Middle Ages, specifically the verge escapement mechanism producing the tick of mechanical clocks. Accurate automatic machines led to mechanical automata in the 13th century and computational machines—the difference and analytical engines of Charles Babbage and Ada Lovelace in the mid-19th century. Lovelace designed the first algorithm intended for a computer, Babbage's analytical engine, the first real Turing-complete computer, more than the mechanical calculators of the time. Although the full implementation of Babbage's second device was only built decades after her lifetime, Lovelace has been called "history's first programmer". ==== Electromechanical relay ==== The Jacquard loom, a precursor to punch cards, and telephone switching machines led to the development of the first computers. By the mid-19th century, the telegraph, was in use throughout the world. By the late 19th century, ticker tape (c. 1870s) and punch cards (c. 1890) were developed. Then came the teleprinter (c. 1910) with its punched-paper use of Baudot code on tape. Telephone-switching networks of electromechanical relays were invented in 1835. These led to the invention of the digital adding device by George Stibitz in 1937. While working in Bell Laboratories, he observed the "burdensome" use of mechanical calculators with gears, prompting him to experiment create an experimental digital adder at home. === Formalization === In 1928, a partial formalization of the modern concept of algorithms began with attempts to solve David Hilbert's Entscheidungsproblem (decision problem). Later formalizations were framed as attempts to define "effective calculability" or "effective method". Those formalizations included the Gödel–Herbrand–Kleene recursive functions of 1930, 1934 and 1935, Alonzo Church's lambda calculus of 1936, Emil Post's Formulation 1 of 1936, and Alan Turing's Turing machines of 1936–37 and 1939. === Modern Algorithms === For decades, it was assumed that algorithm evolution progresses from heuristics to formal algorithms. A Symbolic integration provides a classic illustration. In 1961, James Slagle’s program SAINT used heuristics to solve 52 of 54 freshman calculus exercises from an MIT textbook (≈96%). In 1967, Larry Moses’s SIN refined the heuristics and achieved 100% success, though it remained heuristic. Finally, in 1969, Robert Risch introduced the Risch Algorithm with formal guarantees. This trajectory defined the traditional path: heuristics evolving until a definitive, guaranteed algorithm emerged. However, the rise of transformer-based AI has inverted this sequence — classical algorithms are now being displaced by heuristics once again. Algorithms have evolved and improved in many ways as time goes on. Common uses of algorithms today include social media apps like Instagram and YouTube. Algorithms are used as a way to analyze what people like and push more of those things to the people who interact with them. Quantum computing uses quantum algorithm procedures to solve problems faster. More recently, in 2024, NIST updated their post-quantum encryption standards, which includes new encryption algorithms to enhance defenses against attacks using quantum computing. == Representations == Algorithms can be expressed in many kinds of notation, including natural languages, pseudocode, flowcharts, drakon-charts, programming languages or control tables. Natural language expressions of algorithms tend to be verbose and ambiguous and are rarely used for complex or technical algor

Storage area network

A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN). Although a SAN provides only block-level access, file systems built on top of SANs do provide file-level access and are known as shared-disk file systems. Newer SAN configurations enable hybrid SAN and allow traditional block storage that appears as local storage but also object storage for web services through APIs. == Storage architectures == Storage area networks (SANs) are sometimes referred to as network behind the servers and historically developed out of a centralized data storage model, but with its own data network. A SAN is, at its simplest, a dedicated network for data storage. In addition to storing data, SANs allow for the automatic backup of data, and the monitoring of the storage as well as the backup process. A SAN is a combination of hardware and software. It grew out of data-centric mainframe architectures, where clients in a network can connect to several servers that store different types of data. To scale storage capacities as the volumes of data grew, direct-attached storage (DAS) was developed, where disk arrays or just a bunch of disks (JBODs) were attached to servers. In this architecture, storage devices can be added to increase storage capacity. However, the server through which the storage devices are accessed is a single point of failure, and a large part of the LAN network bandwidth is used for accessing, storing and backing up data. To solve the single point of failure issue, a direct-attached shared storage architecture was implemented, where several servers could access the same storage device. DAS was the first network storage system and is still widely used where data storage requirements are not very high. Out of it developed the network-attached storage (NAS) architecture, where one or more dedicated file server or storage devices are made available in a LAN. Therefore, the transfer of data, particularly for backup, still takes place over the existing LAN. If more than a terabyte of data was stored at any one time, LAN bandwidth became a bottleneck. Therefore, SANs were developed, where a dedicated storage network was attached to the LAN, and terabytes of data are transferred over a dedicated high speed and bandwidth network. Within the SAN, storage devices are interconnected. Transfer of data between storage devices, such as for backup, happens behind the servers and is meant to be transparent. In a NAS architecture data is transferred using the TCP and IP protocols over Ethernet. Distinct protocols were developed for SANs, such as Fibre Channel, iSCSI, Infiniband. Therefore, SANs often have their own network and storage devices, which have to be bought, installed, and configured. This makes SANs inherently more expensive than NAS architectures. == Components == SANs have their own networking devices, such as SAN switches. To access the SAN, so-called SAN servers are used, which in turn connect to SAN host adapters. Within the SAN, a range of data storage devices may be interconnected, such as SAN-capable disk arrays, JBODs and tape libraries. === Host layer === Servers that allow access to the SAN and its storage devices are said to form the host layer of the SAN. Such servers have host adapters, which are cards that attach to slots on the server motherboard (usually PCI slots) and run with a corresponding firmware and device driver. Through the host adapters the operating system of the server can communicate with the storage devices in the SAN. In Fibre channel deployments, a cable connects to the host adapter through the gigabit interface converter (GBIC). GBICs are also used on switches and storage devices within the SAN, and they convert digital bits into light impulses that can then be transmitted over the Fibre Channel cables. Conversely, the GBIC converts incoming light impulses back into digital bits. The predecessor of the GBIC was called gigabit link module (GLM). === Fabric layer === The fabric layer consists of SAN networking devices that include SAN switches, routers, protocol bridges, gateway devices, and cables. SAN network devices move data within the SAN, or between an initiator, such as an HBA port of a server, and a target, such as the port of a storage device. When SANs were first built, hubs were the only devices that were Fibre Channel capable, but Fibre Channel switches were developed and hubs are now rarely found in SANs. Switches have the advantage over hubs that they allow all attached devices to communicate simultaneously, as a switch provides a dedicated link to connect all its ports with one another. When SANs were first built, Fibre Channel had to be implemented over copper cables, these days multimode optical fibre cables are used in SANs. SANs are usually built with redundancy, so SAN switches are connected with redundant links. SAN switches connect the servers with the storage devices and are typically non-blocking allowing transmission of data across all attached wires at the same time. SAN switches are for redundancy purposes set up in a meshed topology. A single SAN switch can have as few as 8 ports and up to 32 ports with modular extensions. So-called director-class switches can have as many as 128 ports. In switched SANs, the Fibre Channel switched fabric protocol FC-SW-6 is used under which every device in the SAN has a hardcoded World Wide Name (WWN) address in the host bus adapter (HBA). If a device is connected to the SAN its WWN is registered in the SAN switch name server. In place of a WWN, or worldwide port name (WWPN), SAN Fibre Channel storage device vendors may also hardcode a worldwide node name (WWNN). The ports of storage devices often have a WWN starting with 5, while the bus adapters of servers start with 10 or 21. === Storage layer === The serialized Small Computer Systems Interface (SCSI) protocol is often used on top of the Fibre Channel switched fabric protocol in servers and SAN storage devices. The Internet Small Computer Systems Interface (iSCSI) over Ethernet and the Infiniband protocols may also be found implemented in SANs, but are often bridged into the Fibre Channel SAN. However, Infiniband and iSCSI storage devices, in particular, disk arrays, are available. The various storage devices in a SAN are said to form the storage layer. It can include a variety of hard disk and magnetic tape devices that store data. In SANs, disk arrays are joined through a RAID which makes a lot of hard disks look and perform like one big storage device. Every storage device, or even partition on that storage device, has a logical unit number (LUN) assigned to it. This is a unique number within the SAN. Every node in the SAN, be it a server or another storage device, can access the storage by referencing the LUN. The LUNs allow for the storage capacity of a SAN to be segmented and for the implementation of access controls. A particular server, or a group of servers, may, for example, be only given access to a particular part of the SAN storage layer, in the form of LUNs. When a storage device receives a request to read or write data, it will check its access list to establish whether the node, identified by its LUN, is allowed to access the storage area, also identified by a LUN. LUN masking is a technique whereby the host bus adapter and the SAN software of a server restrict the LUNs for which commands are accepted. In doing so LUNs that should never be accessed by the server are masked. Another method to restrict server access to particular SAN storage devices is fabric-based access control, or zoning, which is enforced by the SAN networking devices and servers. Under zoning, server access is restricted to storage devices that are in a particular SAN zone. == Network protocols == A mapping layer to other protocols is used to form a network: ATA over Ethernet (AoE), mapping of AT Attachment (ATA) over Ethernet Fibre Channel Protocol (FCP), a mapping of SCSI over Fibre Channel Fibre Channel over Ethernet (FCoE) ESCON over Fibre Channel (FICON), used by mainframe computers HyperSCSI, mapping of SCSI over Ethernet iFCP or SANoIP mapping of FCP over IP iSCSI, mapping of SCSI over TCP/IP iSCSI Extensions for RDMA (iSER), mapping of iSCSI over InfiniBand Network block device, mapping device node requests on UNIX-like systems over stream sockets like TCP/IP SCSI RDMA Protocol (SRP), another SCSI implementation for remote direct memory access (RDMA) transports Storage networks may also be built using Serial Attached SCSI (SAS) and Serial ATA (SATA) technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from Para

Predictor–corrector method

In numerical analysis, predictor–corrector methods belong to a class of algorithms designed to integrate ordinary differential equations – to find an unknown function that satisfies a given differential equation. All such algorithms proceed in two steps: The initial, "prediction" step, starts from a function fitted to the function-values and derivative-values at a preceding set of points to extrapolate ("anticipate") this function's value at a subsequent, new point. The next, "corrector" step refines the initial approximation by using the predicted value of the function and another method to interpolate that unknown function's value at the same subsequent point. == Predictor–corrector methods for solving ODEs == When considering the numerical solution of ordinary differential equations (ODEs), a predictor–corrector method typically uses an explicit method for the predictor step and an implicit method for the corrector step. === Example: Euler method with the trapezoidal rule === A simple predictor–corrector method (known as Heun's method) can be constructed from the Euler method (an explicit method) and the trapezoidal rule (an implicit method). Consider the differential equation y ′ = f ( t , y ) , y ( t 0 ) = y 0 , {\displaystyle y'=f(t,y),\quad y(t_{0})=y_{0},} and denote the step size by h {\displaystyle h} . First, the predictor step: starting from the current value y i {\displaystyle y_{i}} , calculate an initial guess value y ~ i + 1 {\displaystyle {\tilde {y}}_{i+1}} via the Euler method, y ~ i + 1 = y i + h f ( t i , y i ) . {\displaystyle {\tilde {y}}_{i+1}=y_{i}+hf(t_{i},y_{i}).} Next, the corrector step: improve the initial guess using trapezoidal rule, y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle y_{i+1}=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.} That value is used as the next step. === PEC mode and PECE mode === There are different variants of a predictor–corrector method, depending on how often the corrector method is applied. The Predict–Evaluate–Correct–Evaluate (PECE) mode refers to the variant in the above example: y ~ i + 1 = y i + h f ( t i , y i ) , y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},y_{i}),\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.\end{aligned}}} It is also possible to evaluate the function f only once per step by using the method in Predict–Evaluate–Correct (PEC) mode: y ~ i + 1 = y i + h f ( t i , y ~ i ) , y i + 1 = y i + 1 2 h ( f ( t i , y ~ i ) + f ( t i + 1 , y ~ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},{\tilde {y}}_{i}),\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},{\tilde {y}}_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )}.\end{aligned}}} Additionally, the corrector step can be repeated in the hope that this achieves an even better approximation to the true solution. If the corrector method is run twice, this yields the PECECE mode: y ~ i + 1 = y i + h f ( t i , y i ) , y ^ i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ~ i + 1 ) ) , y i + 1 = y i + 1 2 h ( f ( t i , y i ) + f ( t i + 1 , y ^ i + 1 ) ) . {\displaystyle {\begin{aligned}{\tilde {y}}_{i+1}&=y_{i}+hf(t_{i},y_{i}),\\{\hat {y}}_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\tilde {y}}_{i+1}){\bigr )},\\y_{i+1}&=y_{i}+{\tfrac {1}{2}}h{\bigl (}f(t_{i},y_{i})+f(t_{i+1},{\hat {y}}_{i+1}){\bigr )}.\end{aligned}}} The PECEC mode has one fewer function evaluation than PECECE mode. More generally, if the corrector is run k times, the method is in P(EC)k or P(EC)kE mode. If the corrector method is iterated until it converges, this could be called PE(CE)∞.

Vector database

A vector database, vector store or vector search engine is a database that stores and retrieves embeddings of data in vector space. Vector databases typically implement approximate nearest neighbor algorithms so users can search for records semantically similar to a given input, unlike traditional databases which primarily look up records by exact match. Use-cases for vector databases include similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation (RAG). Vector embeddings are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. Each data item is represented by one vector in this space. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors close to each other. Vector retrieval can be combined with metadata filtering or lexical search to support filtered and hybrid retrieval workflows. == Techniques == Common techniques for similarity search on high-dimensional vectors include: Hierarchical Navigable Small World (HNSW) graphs Locality-sensitive hashing (LSH) and sketching Product quantization (PQ) Inverted files These techniques may also be combined in vector search systems. In recent benchmarks, HNSW-based implementations have been among the best performers. Conferences such as the International Conference on Similarity Search and Applications (SISAP) and the Conference on Neural Information Processing Systems (NeurIPS) have hosted competitions on vector search in large databases. == Applications == Vector databases are used in a wide range of machine learning applications including similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation. === Retrieval-augmented generation === An especially common use-case for vector databases is in retrieval-augmented generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected, and for each document or document section, a feature vector (known as an "embedding") is computed, typically using a deep learning network, and stored in a vector database along with a link to the document. Given a user prompt, the feature vector of the prompt is computed, and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model, and the large language model proceeds to create a response to the prompt given this context. == Implementations ==

Collaborative diffusion

Collaborative Diffusion is a type of pathfinding algorithm which uses the concept of antiobjects, objects within a computer program that function opposite to what would be conventionally expected. Collaborative Diffusion is typically used in video games, when multiple agents must path towards a single target agent. For example, the ghosts in Pac-Man. In this case, the background tiles serve as antiobjects, carrying out the necessary calculations for creating a path and having the foreground objects react accordingly, whereas having foreground objects be responsible for their own pathing would be conventionally expected. Collaborative Diffusion is favored for its efficiency over other pathfinding algorithms, such as A, when handling multiple agents. Also, this method allows elements of competition and teamwork to easily be incorporated between tracking agents. Notably, the time taken to calculate paths remains constant as the number of agents increases.