A social network hosting service is a web hosting service that specifically hosts the user creation of web-based social networking services, alongside related applications. Such services are also known as vertical social networks due to the creation of SNSes which cater to specific user interests and niches; like larger, interest-agnostic SNSes, such niche networking services may also possess the ability to create increasingly niche groups of users. == List of social network hosting services == Federated Media Publishing's BigTent BroadVision Clearvale Ning Wall.fm
Machine learning in video games
Artificial intelligence and machine learning techniques are used in video games for a wide variety of applications such as non-player character (NPC) control, procedural content generation (PCG) and deep learning-based content generation. Machine learning is a subset of artificial intelligence that uses historical data to build predictive and analytical models. This is in sharp contrast to traditional methods of artificial intelligence such as search trees and expert systems. Information on machine learning techniques in the field of games is mostly known to public through research projects as most gaming companies choose not to publish specific information about their intellectual property. The most publicly known application of machine learning in games is likely the use of deep learning agents that compete with professional human players in complex strategy games. There has been a significant application of machine learning on games such as Atari/ALE, Doom, Minecraft, StarCraft, and car racing. Other games that did not originally exists as video games, such as chess and Go have also been affected by the machine learning. == Overview of relevant machine learning techniques == === Deep learning === Deep learning is a subset of machine learning which focuses heavily on the use of artificial neural networks (ANN) that learn to solve complex tasks. Deep learning uses multiple layers of ANN and other techniques to progressively extract information from an input. Due to this complex layered approach, deep learning models often require powerful machines to train and run on. ==== Convolutional neural networks ==== Convolutional neural networks (CNN) are specialized ANNs that are often used to analyze image data. These types of networks are able to learn translation invariant patterns, which are patterns that are not dependent on location. CNNs are able to learn these patterns in a hierarchy, meaning that earlier convolutional layers will learn smaller local patterns while later layers will learn larger patterns based on the previous patterns. A CNN's ability to learn visual data has made it a commonly used tool for deep learning in games. === Recurrent neural network === Recurrent neural networks are a type of ANN that are designed to process sequences of data in order, one part at a time rather than all at once. An RNN runs over each part of a sequence, using the current part of the sequence along with memory of previous parts of the current sequence to produce an output. These types of ANN are highly effective at tasks such as speech recognition and other problems that depend heavily on temporal order. There are several types of RNNs with different internal configurations; the basic implementation suffers from a lack of long term memory due to the vanishing gradient problem, thus it is rarely used over newer implementations. ==== Long short-term memory ==== A long short-term memory (LSTM) network is a specific implementation of a RNN that is designed to deal with the vanishing gradient problem seen in simple RNNs, which would lead to them gradually "forgetting" about previous parts of an inputted sequence when calculating the output of a current part. LSTMs solve this problem with the addition of an elaborate system that uses an additional input/output to keep track of long term data. LSTMs have achieved very strong results across various fields, and were used by several monumental deep learning agents in games. === Reinforcement learning === Reinforcement learning is the process of training an agent using rewards and/or punishments. The way an agent is rewarded or punished depends heavily on the problem; such as giving an agent a positive reward for winning a game or a negative one for losing. Reinforcement learning is used heavily in the field of machine learning and can be seen in methods such as Q-learning, policy search, Deep Q-networks and others. It has seen strong performance in both the field of games and robotics. === Neuroevolution === Neuroevolution involves the use of both neural networks and evolutionary algorithms. Instead of using gradient descent like most neural networks, neuroevolution models make use of evolutionary algorithms to update neurons in the network. Researchers claim that this process is less likely to get stuck in a local minimum and is potentially faster than state of the art deep learning techniques. == Deep learning agents == Machine learning agents have been used to take the place of a human player rather than function as NPCs, which are deliberately added into video games as part of designed gameplay. Deep learning agents have achieved impressive results when used in competition with both humans and other artificial intelligence agents. === Chess === Chess is a turn-based strategy game that is considered a difficult AI problem due to the computational complexity of its board space. Similar strategy games are often solved with some form of a Minimax Tree Search. These types of AI agents have been known to beat professional human players, such as the historic 1997 Deep Blue versus Garry Kasparov match. Since then, machine learning agents have shown ever greater success than previous AI agents. === Go === Go is another turn-based strategy game which is considered an even more difficult AI problem than chess. The state space of is Go is around 10^170 possible board states compared to the 10^120 board states for Chess. Prior to recent deep learning models, AI Go agents were only able to play at the level of a human amateur. ==== AlphaGo ==== Google's 2015 AlphaGo was the first AI agent to beat a professional Go player. AlphaGo used a deep learning model to train the weights of a Monte Carlo tree search (MCTS). The deep learning model consisted of 2 ANN, a policy network to predict the probabilities of potential moves by opponents, and a value network to predict the win chance of a given state. The deep learning model allows the agent to explore potential game states more efficiently than a vanilla MCTS. The network were initially trained on games of humans players and then were further trained by games against itself. ==== AlphaGo Zero ==== AlphaGo Zero, another implementation of AlphaGo, was able to train entirely by playing against itself. It was able to quickly train up to the capabilities of the previous agent. === StarCraft series === StarCraft and its sequel StarCraft II are real-time strategy (RTS) video games that have become popular environments for AI research. Blizzard and DeepMind have worked together to release a public StarCraft 2 environment for AI research to be done on. Various deep learning methods have been tested on both games, though most agents usually have trouble outperforming the default AI with cheats enabled or skilled players of the game. ==== Alphastar ==== Alphastar was the first AI agent to beat professional StarCraft 2 players without any in-game advantages. The deep learning network of the agent initially received input from a simplified zoomed out version of the gamestate, but was later updated to play using a camera like other human players. The developers have not publicly released the code or architecture of their model, but have listed several state of the art machine learning techniques such as relational deep reinforcement learning, long short-term memory, auto-regressive policy heads, pointer networks, and centralized value baseline. Alphastar was initially trained with supervised learning, it watched replays of many human games in order to learn basic strategies. It then trained against different versions of itself and was improved through reinforcement learning. The final version was hugely successful, but only trained to play on a specific map in a protoss mirror matchup. === Dota 2 === Dota 2 is a multiplayer online battle arena (MOBA) game. Like other complex games, traditional AI agents have not been able to compete on the same level as professional human player. The only widely published information on AI agents attempted on Dota 2 is OpenAI's deep learning Five agent. ==== OpenAI Five ==== OpenAI Five utilized separate long short-term memory networks to learn each hero. It trained using a reinforcement learning technique known as Proximal Policy Learning running on a system containing 256 GPUs and 128,000 CPU cores. Five trained for months, accumulating 180 years of game experience each day, before facing off with professional players. It was eventually able to beat the 2018 Dota 2 esports champion team in a 2019 series of games. === Planetary Annihilation === Planetary Annihilation is a real-time strategy game which focuses on massive scale war. The developers use ANNs in their default AI agent. === Supreme Commander 2 === Supreme Commander 2 is a real-time strategy (RTS) video game. The game uses Multilayer Perceptrons (MLPs) to control a platoon’s reaction to encountered enemy units. Total of four MLPs are used, one for each platoon type: land, naval
Control-flow diagram
A control-flow diagram (CFD) is a diagram to describe the control flow of a business process, process or review. Control-flow diagrams were developed in the 1950s, and are widely used in multiple engineering disciplines. They are one of the classic business process modeling methodologies, along with flow charts, drakon-charts, data flow diagrams, functional flow block diagram, Gantt charts, PERT diagrams, and IDEF. == Overview == A control-flow diagram can consist of a subdivision to show sequential steps, with if-then-else conditions, repetition, and/or case conditions. Suitably annotated geometrical figures are used to represent operations, data, or equipment, and arrows are used to indicate the sequential flow from one to another. There are several types of control-flow diagrams, for example: Change-control-flow diagram, used in project management Configuration-decision control-flow diagram, used in configuration management Process-control-flow diagram, used in process management Quality-control-flow diagram, used in quality control. In software and systems development, control-flow diagrams can be used in control-flow analysis, data-flow analysis, algorithm analysis, and simulation. Control and data are most applicable for real time and data-driven systems. These flow analyses transform logic and data requirements text into graphic flows which are easier to analyze than the text. PERT, state transition, and transaction diagrams are examples of control-flow diagrams. == Types of control-flow diagrams == === Process-control-flow diagram === A flow diagram can be developed for the process [control system] for each critical activity. Process control is normally a closed cycle in which a sensor. The application determines if the sensor information is within the predetermined (or calculated) data parameters and constraints. The results of this comparison, which controls the critical component. This [feedback] may control the component electronically or may indicate the need for a manual action. This closed-cycle process has many checks and balances to ensure that it stays safe. It may be fully computer controlled and automated, or it may be a hybrid in which only the sensor is automated and the action requires manual intervention. Further, some process control systems may use prior generations of hardware and software, while others are state of the art. === Performance-seeking control-flow diagram === The figure presents an example of a performance-seeking control-flow diagram of the algorithm. The control law consists of estimation, modeling, and optimization processes. In the Kalman filter estimator, the inputs, outputs, and residuals were recorded. At the compact propulsion-system-modeling stage, all the estimated inlet and engine parameters were recorded. In addition to temperatures, pressures, and control positions, such estimated parameters as stall margins, thrust, and drag components were recorded. In the optimization phase, the operating-condition constraints, optimal solution, and linear-programming health-status condition codes were recorded. Finally, the actual commands that were sent to the engine through the DEEC were recorded.
Trusted Computing
Trusted Computing (TC) is a technology developed and promoted by the Trusted Computing Group. The term is taken from the field of trusted systems and has a specialized meaning that is distinct from the field of confidential computing. With Trusted Computing, the computer will consistently behave in expected ways, and those behaviors will be enforced by computer hardware and software. Enforcing this behavior is achieved by loading the hardware with a unique encryption key that is inaccessible to the rest of the system and the owner. TC is controversial as the hardware is not only secured for its owner, but also against its owner, leading opponents of the technology like free software activist Richard Stallman to deride it as "treacherous computing", and certain scholarly articles to use scare quotes when referring to the technology. Trusted Computing proponents such as International Data Corporation, the Enterprise Strategy Group and Endpoint Technologies Associates state that the technology will make computers safer, less prone to viruses and malware, and thus more reliable from an end-user perspective. They also state that Trusted Computing will allow computers and servers to offer improved computer security over that which is currently available. Opponents often state that this technology will be used primarily to enforce digital rights management policies (imposed restrictions to the owner) and not to increase computer security. Chip manufacturers Intel and AMD, hardware manufacturers such as HP and Dell, and operating system providers such as Microsoft include Trusted Computing in their products if enabled. The U.S. Army requires that every new PC it purchases comes with a Trusted Platform Module (TPM). As of July 3, 2007, so does virtually the entire United States Department of Defense. == Key concepts == Trusted Computing encompasses six key technology concepts, of which all are required for a fully Trusted system, that is, a system compliant to the TCG specifications: Endorsement key Secure input and output Memory curtaining / protected execution Sealed storage Remote attestation Trusted Third Party (TTP) === Endorsement key === The endorsement key is a 2048-bit RSA public and private key pair that is created randomly on the chip at manufacture time and cannot be changed. The private key never leaves the chip, while the public key is used for attestation and for encryption of sensitive data sent to the chip, as occurs during the TPM_TakeOwnership command. This key is used to allow the execution of secure transactions: every Trusted Platform Module (TPM) is required to be able to sign a random number (in order to allow the owner to show that he has a genuine trusted computer), using a particular protocol created by the Trusted Computing Group (the direct anonymous attestation protocol) in order to ensure its compliance of the TCG standard and to prove its identity; this makes it impossible for a software TPM emulator with an untrusted endorsement key (for example, a self-generated one) to start a secure transaction with a trusted entity. The TPM should be designed to make the extraction of this key by hardware analysis hard, but tamper resistance is not a strong requirement. === Memory curtaining === Memory curtaining extends common memory protection techniques to provide full isolation of sensitive areas of memory—for example, locations containing cryptographic keys. Even the operating system does not have full access to curtained memory. The exact implementation details are vendor specific. === Sealed storage === Sealed storage protects private information by binding it to platform configuration information including the software and hardware being used. This means the data can be released only to a particular combination of software and hardware. Sealed storage can be used for DRM enforcing. For example, users who keep a song on their computer that has not been licensed to be listened will not be able to play it. Currently, a user can locate the song, listen to it, and send it to someone else, play it in the software of their choice, or back it up (and in some cases, use circumvention software to decrypt it). Alternatively, the user may use software to modify the operating system's DRM routines to have it leak the song data once, say, a temporary license was acquired. Using sealed storage, the song is securely encrypted using a key bound to the trusted platform module so that only the unmodified and untampered music player on his or her computer can play it. In this DRM architecture, this might also prevent people from listening to the song after buying a new computer, or upgrading parts of their current one, except after explicit permission of the vendor of the song. === Remote attestation === Remote attestation allows changes to the user's computer to be detected by authorized parties. For example, software companies can identify unauthorized changes to software, including users modifying their software to circumvent commercial digital rights restrictions. It works by having the hardware generate a certificate stating what software is currently running. The computer can then present this certificate to a remote party to show that unaltered software is currently executing. Numerous remote attestation schemes have been proposed for various computer architectures, including Intel, RISC-V, and ARM. Remote attestation is usually combined with public-key encryption so that the information sent can only be read by the programs that requested the attestation, and not by an eavesdropper. To take the song example again, the user's music player software could send the song to other machines, but only if they could attest that they were running an authorized copy of the music player software. Combined with the other technologies, this provides a more restricted path for the music: encrypted I/O prevents the user from recording it as it is transmitted to the audio subsystem, memory locking prevents it from being dumped to regular disk files as it is being worked on, sealed storage curtails unauthorized access to it when saved to the hard drive, and remote attestation prevents unauthorized software from accessing the song even when it is used on other computers. To preserve the privacy of attestation responders, Direct Anonymous Attestation has been proposed as a solution, which uses a group signature scheme to prevent revealing the identity of individual signers. Proof of space (PoS) have been proposed to be used for malware detection, by determining whether the L1 cache of a processor is empty (e.g., has enough space to evaluate the PoSpace routine without cache misses) or contains a routine that resisted being evicted. === Trusted third party === == Known applications == The Microsoft products Windows Vista, Windows 7, Windows 8 and Windows RT make use of a Trusted Platform Module to facilitate BitLocker Drive Encryption. Other known applications with runtime encryption and the use of secure enclaves include the Signal messenger and the e-prescription service ("E-Rezept") by the German government. == Possible applications == === Digital rights management === Trusted Computing would allow companies to create a digital rights management (DRM) system which would be very hard to circumvent, though not impossible. An example is downloading a music file. Sealed storage could be used to prevent the user from opening the file with an unauthorized player or computer. Remote attestation could be used to authorize play only by music players that enforce the record company's rules. The music would be played from curtained memory, which would prevent the user from making an unrestricted copy of the file while it is playing, and secure I/O would prevent capturing what is being sent to the sound system. Circumventing such a system would require either manipulation of the computer's hardware, capturing the analogue (and thus degraded) signal using a recording device or a microphone, or breaking the security of the system. New business models for use of software (services) over Internet may be boosted by the technology. By strengthening the DRM system, one could base a business model on renting programs for a specific time periods or "pay as you go" models. For instance, one could download a music file which could only be played a certain number of times before it becomes unusable, or the music file could be used only within a certain time period. === Preventing cheating in online games === Trusted Computing could be used to combat cheating in online games. Some players modify their game copy in order to gain unfair advantages in the game; remote attestation, secure I/O and memory curtaining could be used to determine that all players connected to a server were running an unmodified copy of the software. === Verification of remote computation for grid computing === Trusted Computing could be used to guarantee participants in a grid computing sys
CANaerospace
CANaerospace is a higher layer protocol based on Controller Area Network (CAN) which has been developed by Stock Flight Systems in 1998 for aeronautical applications. == Background == CANaerospace supports airborne systems employing the Line-replaceable unit (LRU) concept to share data across CAN and ensures interoperability between CAN LRUs by defining CAN physical layer characteristics, network layers, communication mechanisms, data types and aeronautical axis systems. CANaerospace is an open source project, was initiated to standardize the interface between CAN LRUs on the system level. CANaerospace is continuously being developed further and has also been published by NASA as the Advanced General Aviation Transport Experiments Databus Standard in 2001. It found widespread use in aeronautical research worldwide. A major research aircraft that employs several CANaerospace networks for real-time computer interconnection is the Stratospheric Observatory for Infrared Astronomy (SOFIA), a Boeing 747SP with a 2.5m astronomic telescope. CANaerospace is also frequently used in flight simulation and connects entire aircraft cockpits (i.e. in Eurofighter Typhoon simulators) to the simulation host computers. In Italy CANaerospace is used as UAV data bus technology. Furthermore, CANaerospace serves as communication network in several general aviation avionics systems. The CANaerospace interface definition closes the gap between the ISO/OSI layer 1 and 2 CAN protocol (which is implemented in the CAN controller itself) and the specific requirements of distributed systems in aircraft. It may be used as a primary or ancillary avionics network and was designed to meet the following requirements: Democratic network: CANaerospace does not require any master/slave relationships between LRUs or a "bus controller", thereby avoiding a potential single source of failure. Every node in the network has the same rights for participation in the bus traffic. Self-identifying message format: Each CANaerospace message contains information about the type of the data and the transmitting node. This allows the data to be unambiguously recognized at each receiving node. Continuous Message Numbering: Each CANaerospace message contains a continuously incremented number which allows coherent processing of messages in the receiving stations. Message Status Code: Each CANaerospace message contains information about the integrity of the data is conveying. This allows receiving stations to evaluate the quality of the received data and to react accordingly. Emergency Event Signaling: CANaerospace defines a mechanism that allows each node to transmit information about exception or error situations. This information can be used by other stations to determine the network health. Node Service Interface: As an enhancement to CAN, CANaerospace provides a means for individual stations on the network to communicate with each other using connection-oriented and connectionless services. Predefined CAN Identifier Assignment: CANaerospace offers a predefined identifier assignment list for normal operation data. In addition to the predefined list, user-defined identifier assignment lists may be used. Ease of Implementation: The amount of code to implement CANaerospace is very little by design in order to minimize the effort for testing and certification of flight safety critical systems. Openness to Extensions: All CANaerospace definitions are extendable to provide flexibility for future enhancements and to allow adaptions to the requirements of specific applications. Free Availability: No cost whatsoever apply for the use of CANaerospace. The specification can be downloaded from the Internet == Physical interface == To ensure interoperability and reliable communication, CANaerospace specifies the electrical characteristics, bus transceiver requirements and data rates with the corresponding tolerances based on ISO 11898. The bit timing calculation (baud rate accuracy, sample point definition) and robustness to electromagnetic interference are given special emphasis. Also addressed are CAN connector, wiring considerations and design guidelines to maximize electromagnetic compatibility. == Communication layers == The Bosch CAN specification itself allows messages being transmitted both periodically and aperiodically but does not cover issues like data representation, node addressing or connection-oriented protocols. CAN is entirely based on Anyone-to-Many (ATM) communication which means that CAN messages are always received by all stations in the network. The advantage of the CAN concept is inherent data consistency between all stations, the drawback is that it does not allow node addressing which is the basis for Peer-to-Peer (PTP) communication. Using CAN networks in aeronautical applications, however, demands a standard targeted to the specific requirements of airborne systems which implies that communication between individual stations in the network must be possible to enable the required degree of system monitoring. Consequently, CANaerospace defines additional ISO/OSI layer 3, 4 and 6 functions to support node addressing and unified ATM/PTP communication mechanisms. PTP communication allows to set up client/server interactions between individual stations in the network either temporarily or permanently. More than one of these interactions may be in effect at any given time and each node may be client for one operation and server for another at the same time. This CANaerospace mechanism is called "Node Service Concept" and allows i.e. to distribute system functions over several stations in the network or to control dynamic system reconfiguration in case of failure. The Node Service concept supports both connection-oriented and connectionless interactions like with TCP/IP and UDP/IP for Ethernet. Enabling both ATM and PTP communication for CAN requires the introduction of independent network layers to isolate the different types of communication. This is realized for CANaerospace by forming CAN identifier groups as shown in Figure 1. The resulting structure creates Logical Communication Channels (LCCs) and assigns a specific communication type (ATM, PTP) to each of the LCCs. User-defined LCCs provide the necessary freedom for designers and allow the implementation of CANaerospace according to the needs of specific applications. Figure 1: Logical Communication Channels for CANaerospace As a side effect, the CAN identifier groups in Figure 1 affect the priority of the message transmission in case of bus arbitration. The communication channels are therefore arranged according to their relative importance: Emergency Event Data Channel (EED): This communication channel is used for messages which require immediate action (i.e. system degradation or reconfiguration) and have to be transmitted with very high priority. Emergency Event Data uses ATM communication exclusively. High/Low Priority Node Service Data Channel (NSH/NSL): These communication channels are used for client/server interactions using PTP communication. The corresponding services may be of the connection-oriented as well as the connectionless type. NSH/NSL may also be used to support test and maintenance functions. Normal Operation Data Channel (NOD): This communication channel is used for the transmission of the data which is generated during normal system operation and described in the CANaerospace identifier assignment list. These messages may be transmitted periodically or aperiodically as well as synchronously or asynchronously. All messages which cannot be assigned to other communication channels shall use this channel. High/Low Priority User-Defined Data Channel (UDH/UDL): This channel is dedicated to communication which cannot, due to their specific characteristics, be assigned other channels without violating the CANaerospace specification. As long as the defined identifier range is used, the message content and the communication type (ATM, PTP) for these channels may be specified by the system designer. To ensure interoperability it is highly recommended that the use of these channels is minimized. Debug Service Data Channel (DSD): This channel is dedicated to messages which are used temporarily for development and test purposes only and are not transmitted during normal operation. As long as the defined identifier range is used, the message content and the communication type (ATM, PTP) for these channels may be specified by the system designer. == Data representation == The majority of the real-time control systems used in aeronautics employ "big endian" processor architectures. This data representation was therefore specified for CANaerospace as well. With big endian data representation, the most significant bit of any datum is arranged leftmost and transmitted first on CANaerospace as shown in Figure 2. Figure 2: "Big Endian" Data Representation for CANaerospace CANaerospace uses a self-identifying message
H (company)
H Company, also known simply as H, is a French artificial intelligence startup which develops "action-oriented" artificial intelligence agents for enterprise automation and productivity. In May 2024, H Company closed a record-setting $220 million seed round, at the time the largest AI raise in Europe. In 2026, H Company released Holo 3, the latest generation of its computer-use AI models. The update marked a major advance in agentic AI, enabling agents to navigate any user interface, interpret screens, and complete complex, multi-step tasks across enterprise systems—much like a human user. This breakthrough positioned H Company at the frontier of computer-use autonomy, accelerating the integration of AI in enterprise workflows. == History == H Company was founded in 2023 in Paris by Laurent Sifre, Charles Kantor, and three DeepMind veterans: Daan Wiestra, Karl Tuyls, Julien Perollat. In May 2024, the firm secured what was then the largest European AI seed round, totaling $220 million led by US investors including Eric Schmidt (former Google CEO), Amazon, and backed by Accel, Bpifrance, UiPath, Eurazeo, Xavier Niel, Yuri Milner, Bernard Arnault, Samsung and others. In August 2024, three cofounders (Wiestra, Tuyls, Perollat) left the company over operational disagreements. In November 2024, H launched Runner H, its first agentic-API platform, which combined a large language model (LLM) and a reduced, 2-billion parameter vision-language model (VLM). In May 2025, H Company acquired Mithril Security, and in June 2025 the company widened its offering for agentic models. In June 2025, Gautier Cloix (formerly CEO Palantir France) replaced Charles Kantor as CEO of H Company, aiming to pivot the company towards a "forward deployed engineers" model. In July 2025, H Company introduced Surfer-H-CLI, an open-source, web-native Chrome agent designed for browser-based automation—able to search, scroll, click, and type on behalf of users and controllable via any visual language model (VLM). When paired with its June 2025 open-sourced 3B-parameter Holo-1 model, Surfer-H-CLI achieved 92.2% WebVoyager benchmark accuracy. == Activity == H Company creates enterprise AI models and agents (agentic AI) to automate and optimize complex workflows. H Company specifically designs AI agents called computer use capable of autonomously interfacing with any software (local or cloud-based) to detect and automate repetitive operations. H Company is based in Paris, France, with international offices in London and New York. H Company raised $220 million since its inception. Gautier Cloix is president and CEO of the company. H Company client include the French national lottery FDJ United. In March 2026, H Company released Holo3, a family of artificial intelligence models designed to operate digital systems by interacting directly with user interfaces. Holo3 enables agents ("virtual humanoids") to understand what is displayed in front-end environments—such as web pages, desktop applications, and other graphical user interfaces—and perform actions such as clicking, typing, and navigating across them to complete multi-step tasks. On the OSWorld-Verified benchmark, Holo3 reportedly achieved about 78.9%, surpassing the scores of OpenAI’s GPT‑5.4 and Anthropic’s Claude Opus 4.6 on this specific test, at roughly one-tenth of the inference cost of these proprietary systems. The release has been presented as a significant step toward automating routine digital workflows, allowing organizations to offload repetitive on-screen work, such as data entry and reconciliation across multiple tools, to AI-based agents.
Change data capture
In databases, change data capture (CDC) is a set of software design patterns used to determine and track the data that has changed (the "deltas") so that action can be taken using the changed data. The result is a delta-driven dataset. CDC is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources. For instance it can be used for incremental update of data loading. CDC occurs often in data warehouse environments since capturing and preserving the state of data across time is one of the core functions of a data warehouse, but CDC can be utilized in any database or data repository system. == Methodology == System developers can set up CDC mechanisms in a number of ways and in any one or a combination of system layers from application logic down to physical storage. In a simplified CDC context, one computer system has data believed to have changed from a previous point in time, and a second computer system needs to take action based on that changed data. The former is the source, the latter is the target. It is possible that the source and target are the same system physically, but that would not change the design pattern logically. Multiple CDC solutions can exist in a single system. === Timestamps on rows === Tables whose changes must be captured may have a column that represents the time of last change. Names such as LAST_UPDATE, LAST_MODIFIED, etc. are common. Any row in any table that has a timestamp in that column that is more recent than the last time data was captured is considered to have changed. Timestamps on rows are also frequently used for optimistic locking so this column is often available. === Version numbers on rows === Database designers give tables whose changes must be captured a column that contains a version number. Names such as VERSION_NUMBER, etc. are common. One technique is to mark each changed row with a version number. A current version is maintained for the table, or possibly a group of tables. This is stored in a supporting construct such as a reference table. When a change capture occurs, all data with the latest version number is considered to have changed. Once the change capture is complete, the reference table is updated with a new version number. (Do not confuse this technique with row-level versioning used for optimistic locking. For optimistic locking each row has an independent version number, typically a sequential counter. This allows a process to atomically update a row and increment its counter only if another process has not incremented the counter. But CDC cannot use row-level versions to find all changes unless it knows the original "starting" version of every row. This is impractical to maintain.) === Status indicators on rows === This technique can either supplement or complement timestamps and versioning. It can configure an alternative if, for example, a status column is set up on a table row indicating that the row has changed (e.g., a boolean column that, when set to true, indicates that the row has changed). Otherwise, it can act as a complement to the previous methods, indicating that a row, despite having a new version number or a later date, still shouldn't be updated on the target (for example, the data may require human validation). === Time/version/status on rows === This approach combines the three previously discussed methods. As noted, it is not uncommon to see multiple CDC solutions at work in a single system, however, the combination of time, version, and status provides a particularly powerful mechanism and programmers should utilize them as a trio where possible. The three elements are not redundant or superfluous. Using them together allows for such logic as, "Capture all data for version 2.1 that changed between 2005-06-01 00:00 and 2005-07-01 00:00 where the status code indicates it is ready for production." === Triggers on tables === May include a publish/subscribe pattern to communicate the changed data to multiple targets. In this approach, triggers log events that happen to the transactional table into another queue table that can later be "played back". For example, imagine an Accounts table, when transactions are taken against this table, triggers would fire that would then store a history of the event or even the deltas into a separate queue table. The queue table might have schema with the following fields: Id, TableName, RowId, Timestamp, Operation. The data inserted for our Account sample might be: 1, Accounts, 76, 2008-11-02 00:15, Update. More complicated designs might log the actual data that changed. This queue table could then be "played back" to replicate the data from the source system to a target. Data capture offers a challenge in that the structure, contents and use of a transaction log is specific to a database management system. Unlike data access, no standard exists for transaction logs. Most database management systems do not document the internal format of their transaction logs, although some provide programmatic interfaces to their transaction logs (for example: Oracle, DB2, SQL/MP, SQL/MX and SQL Server 2008). Other challenges in using transaction logs for change data capture include: Coordinating the reading of the transaction logs and the archiving of log files (database management software typically archives log files off-line on a regular basis). Translation between physical storage formats that are recorded in the transaction logs and the logical formats typically expected by database users (e.g., some transaction logs save only minimal buffer differences that are not directly useful for change consumers). Dealing with changes to the format of the transaction logs between versions of the database management system. Eliminating uncommitted changes that the database wrote to the transaction log and later rolled back. Dealing with changes to the metadata of tables in the database. CDC solutions based on transaction log files have distinct advantages that include: minimal impact on the database (even more so if one uses log shipping to process the logs on a dedicated host). no need for programmatic changes to the applications that use the database. low latency in acquiring changes. transactional integrity: log scanning can produce a change stream that replays the original transactions in the order they were committed. Such a change stream include changes made to all tables participating in the captured transaction. no need to change the database schema == Confounding factors == As often occurs in complex domains, the final solution to a CDC problem may have to balance many competing concerns. === Unsuitable source systems === Change data capture both increases in complexity and reduces in value if the source system saves metadata changes when the data itself is not modified. For example, some Data models track the user who last looked at but did not change the data in the same structure as the data. This results in noise in the Change Data Capture. === Tracking the capture === Actually tracking the changes depends on the data source. If the data is being persisted in a modern database then Change Data Capture is a simple matter of permissions. Two techniques are in common use: Tracking changes using database triggers Reading the transaction log as, or shortly after, it is written. If the data is not in a modern database, CDC becomes a programming challenge. === Push versus pull === Push: the source process creates a snapshot of changes within its own process and delivers rows downstream. The downstream process uses the snapshot, creates its own subset and delivers them to the next process. Pull: the target that is immediately downstream from the source, prepares a request for data from the source. The downstream target delivers the snapshot to the next target, as in the push model. === Alternatives === Sometimes the slowly changing dimension is used as an alternative method. CDC and SCD are similar in that both methods can detect changes in a data set. The most common forms of SCD are type 1 (overwrite), type 2 (maintain history) or 3 (only previous and current value). SCD 2 can be useful if history is needed in the target system. CDC overwrites in the target system (akin to SCD1), and is ideal when only the changed data needs to arrive at the target, i.e. a delta-driven dataset.