Data product

Data product

In data management and product management, a data product is a reusable, active, and standardized data asset designed to deliver measurable value to its users, whether internal or external, by applying the rigorous principles of product thinking and management. It comprises one or more data artifacts (e.g., datasets, models, pipelines) and is enriched with metadata, including governance policies, data quality rules, data contracts, and, where applicable, a software bill of materials (SBOM) to document its dependencies and components. Ownership of a data product is aligned to a specific domain or use case, ensuring accountability, stewardship, and its continuous evolution throughout its lifecycle. Adhering to the FAIR principles – findable, accessible, interoperable, and reusable – a data product is designed to be discoverable, scalable, reusable, and aligned with both business and regulatory standards, driving innovation and efficiency in modern data ecosystems. == History == In 2012, DJ Patil proposed the first documented definition: a data product is a product that facilitates an end goal through the use of data. In 2019, Zhamak Dehghani introduced Data Mesh, with a strong focus on domain-oriented data products. Later, in 2020, she solidifies Data Mesh around four principles, one being Data as a Product, in which she defines Data Product as the node on the mesh that encapsulates three structural components required for its function, providing access to the domain's analytical data as a product. In 2024, Andrea Gioia published one of the first books specifically on data products post Data Mesh announcement. In his book, Gioia defines the concept of pure data product. In 2025, during the Data Day Texas conference, Jean-Georges Perrin and a collective of product managers and data engineers got together to craft the current definition and make it available to the public domain. In July 2025, Bitol, a project of The Linux Foundation, released and early version of the Open Data Product Standard (ODPS) aiming at normalizing data products

Gapo

Gapo is a Vietnamese social networking service based in Hanoi, Vietnam. Users are able to create a personal profile and share text, photos and videos with others on the platform. Users can also use Gapo for live streaming, instant messaging, blogging, and online payments. Gapo was launched in July 2019 by Hà Trung Kiên and Duong Vi Khoa. == History == Gapo was founded in response to calls for Vietnam's Communist-led government to produce a domestic alternative to social media giants like Facebook and Google. Gapo officially launched on July 23, 2019 at an event in Hanoi. The company received 500 billion đồng (US$22 million) in funding from technology corporation G-Group to be utilized in the first phase of development. They also partnered with Sony Music Entertainment to provide music content to its services. == Features == Gapo features a news feed for posting content, livestreaming, instant messaging, and blogging. It also allows users to pay online and access public services. == Reception == Within two days of launch, Gapo received about 200,000 registrations. By September 2019, the user base increased to one million. Upon launch, Gapo experienced significant technical difficulties. Users complained about the inability to sign up for a new account and said that certain functions were not available for use at launch. This issue caused Gapo to temporarily suspend their services in order to perform upgrades and bug fixes. Gapo relaunched the next day, though many users reported that the access speed decreased. The mobile app also received mixed reviews from users in both the App Store and the Google Play Store, with an average rating of 3.1 and 3.5, respectively. Most users found the app to be a knockoff of Facebook, although some users praised the app for being locally developed. === Expert opinions on platform viability === Le Hong Hiep of the ISEAS - Yusof Ishak Institute was doubtful that a Vietnamese-owned social network service could be as powerful as a foreign-based service, stating that Vietnam might not be able to develop a viable social media network to compete with the likes of Facebook or Google. Others, like blogger Ann Chi, said that, due to local players complying with local censorship policy, there is a chance that locals might not trust Gapo and other local services in light of possible surveillance. Regarding the targeted user base figure for the end of 2019 and 2021, experts cautioned that the company might need an additional trillion đồng of funding to reach its planned user base targets. In response, the company stated that Gapo was never meant to compete with Facebook, but instead noted that the main difference between Gapo and Facebook is that Gapo provides a personalized user experience through customization. == Censorship == Gapo has the right to censor posts and news that are deemed offensive and inaccurate by users or not approved by the censorship curators.

Business rules engine

A business rules engine is a software system that executes one or more business rules in a runtime production environment. The rules might come from legal regulation ("An employee can be fired for any reason or no reason but not for an illegal reason"), company policy ("All customers that spend more than $100 at one time will receive a 10% discount"), or other sources. A business rule system enables these company policies and other operational decisions to be defined, tested, executed and maintained separately from application code. Rule engines typically support rules, facts, priority (score), mutual exclusion, preconditions, and other functions. Rule engine software is commonly provided as a component of a business rule management system which, among other functions, provides the ability to: register, define, classify, and manage all the rules, verify consistency of rules definitions (”Gold-level customers are eligible for free shipping when order quantity > 10” and “maximum order quantity for Silver-level customers = 15” ), define the relationships between different rules, and relate some of these rules to IT applications that are affected or need to enforce one or more of the rules. == IT use case == In any IT application, business rules can change more frequently than other parts of the application code. Rules engines or inference engines serve as pluggable software components which execute business rules that a business rules approach has externalized or separated from application code. This externalization or separation allows business users to modify the rules without the need for IT intervention. The system as a whole becomes more easily adaptable with such external business rules, but this does not preclude the usual requirements of QA and other testing. == History == An article in Computerworld traces rules engines to the early 1990s and to products from the likes of Pegasystems, Fair Isaac Corp, ILOG and eMerge from Sapiens. == Design strategies == Many organizations' rules efforts combine aspects of what is generally considered workflow design with traditional rule design. This failure to separate the two approaches can lead to problems with the ability to re-use and control both business rules and workflows. Design approaches that avoid this quandary separate the role of business rules and workflows as follows: Business rules produce knowledge; Workflows perform business work. Concretely, that means that a business rule may do things like detect that a business situation has occurred and raise a business event (typically carried via a messaging infrastructure) or create higher level business knowledge (e.g., evaluating the series of organizational, product, and regulatory-based rules concerning whether or not a loan meets underwriting criteria). On the other hand, a workflow would respond to an event that indicated something such as the overloading of a routing point by initiating a series of activities. This separation is important because the same business judgment (mortgage meets underwriting criteria) or business event (router is overloaded) can be reacted to by many different workflows. Embedding the work done in response to rule-driven knowledge creation into the rule itself greatly reduces the ability of business rules to be reused across an organization because it makes them work-flow specific. To create an architecture that employs a business rules engine it is essential to establish the integration between a BPM (Business Process Management) and a BRM (Business Rules Management) platform that is based upon processes responding to events or examining business judgments that are defined by business rules. There are some products in the marketplace that provide this integration natively. In other situations this type of abstraction and integration will have to be developed within a particular project or organization. Most Java-based rules engines provide a technical call-level interface, based on the JSR-94 application programming interface (API) standard, in order to allow for integration with different applications, and many rule engines allow for service-oriented integrations through Web-based standards such as WSDL and SOAP. Most rule engines provide the ability to develop a data abstraction that represents the business entities and relationships that rules should be written against. This business entity model can typically be populated from a variety of sources including XML, POJOs, flat files, etc. There is no standard language for writing the rules themselves. Many engines use a Java-like syntax, while some allow the definition of custom business-friendly languages. Most rules engines function as a callable library. However, it is becoming more popular for them to run as a generic process akin to the way that RDBMSs behave. Most engines treat rules as a configuration to be loaded into their process instance, although some are actually code generators for the whole rule execution instance and others allow the user to choose. == Types of rule engines == There are a number of different types of rule engines. These types (generally) differ in how Rules are scheduled for execution. Most rules engines used by businesses are forward chaining, which can be further divided into two classes: The first class processes so-called production/inference rules. These types of rules are used to represent behaviors of the type IF condition THEN action. For example, such a rule could answer the question: "Should this customer be allowed a mortgage?" by executing rules of the form "IF some-condition THEN allow-customer-a-mortgage". The other type of rule engine processes so-called reaction/Event condition action rules. The reactive rule engines detect and react to incoming events and process event patterns. For example, a reactive rule engine could be used to alert a manager when certain items are out of stock. The biggest difference between these types is that production rule engines execute when a user or application invokes them, usually in a stateless manner. A reactive rule engine reacts automatically when events occur, usually in a stateful manner. Many (and indeed most) popular commercial rule engines have both production and reaction rule capabilities, although they might emphasize one class over another. For example, most business rules engines are primarily production rules engines, whereas complex event processing rules engines emphasize reaction rules. In addition, some rules engines support backward chaining. In this case a rules engine seeks to resolve the facts to fit a particular goal. It is often referred to as being goal driven because it tries to determine if something exists based on existing information. Another kind of rule engine automatically switches between back- and forward-chaining several times during a reasoning run, e.g. the Internet Business Logic system, which can be found by searching the web. A fourth class of rules engine might be called a deterministic engine. These rules engines may forgo both forward chaining and backward chaining, and instead utilize domain-specific language approaches to better describe policy. This approach is often easier to implement and maintain, and provides performance advantages over forward or backward chaining systems. There are some circumstance where fuzzy logic based inference may be more appropriate, where heuristics are used in rule processing, rather than Boolean rules. Examples might include customer classification, missing data inference, customer value calculations, etc. The DARL language and the associated inference engine and editors is an example of this approach. == Rules engines for access control / authorization == One common use case for rules engines is standardized access control to applications. OASIS defines a rules engine architecture and standard dedicated to access control called XACML (eXtensible Access Control Markup Language). One key difference between a XACML rule engine and a business rule engine is the fact that a XACML rule engine is stateless and cannot change the state of any data. The XACML rule engine, called a Policy Decision Point (PDP), expects a binary Yes/No question e.g. "Can Alice view document D?" and returns a decision e.g. Permit / deny.

Spatial–temporal reasoning

Spatial–temporal reasoning is an area of artificial intelligence that draws from the fields of computer science, cognitive science, and cognitive psychology. The theoretic goal—on the cognitive side—involves representing and reasoning spatial-temporal knowledge in mind. The applied goal—on the computing side—involves developing high-level control systems of automata for navigating and understanding time and space. == Influence from cognitive psychology == A convergent result in cognitive psychology is that the connection relation is the first spatial relation that human babies acquire, followed by understanding orientation relations and distance relations. Internal relations among the three kinds of spatial relations can be computationally and systematically explained within the theory of cognitive prism as follows: the connection relation is primitive; an orientation relation is a distance comparison relation: you being in front of me can be interpreted as you are nearer to my front side than my other sides; a distance relation is a connection relation using a third object: you being one meter away from me can be interpreted as a one-meter-long object connected with you and me simultaneously. == Fragmentary representations of temporal calculi == Without addressing internal relations among spatial relations, AI researchers contributed many fragmentary representations. Examples of temporal calculi include Allen's interval algebra, and Vilain's & Kautz's point algebra. The most prominent spatial calculi are mereotopological calculi, Frank's cardinal direction calculus, Freksa's double cross calculus, Egenhofer and Franzosa's 4- and 9-intersection calculi, Ligozat's flip-flop calculus, various region connection calculi (RCC), and the Oriented Point Relation Algebra. Recently, spatio-temporal calculi have been designed that combine spatial and temporal information. For example, the spatiotemporal constraint calculus (STCC) by Gerevini and Nebel combines Allen's interval algebra with RCC-8. Moreover, the qualitative trajectory calculus (QTC) allows for reasoning about moving objects. == Quantitative abstraction == An emphasis in the literature has been on qualitative spatial-temporal reasoning which is based on qualitative abstractions of temporal and spatial aspects of the common-sense background knowledge on which our human perspective of physical reality is based. Methodologically, qualitative constraint calculi restrict the vocabulary of rich mathematical theories dealing with temporal or spatial entities such that specific aspects of these theories can be treated within decidable fragments with simple qualitative (non-metric) languages. Contrary to mathematical or physical theories about space and time, qualitative constraint calculi allow for rather inexpensive reasoning about entities located in space and time. For this reason, the limited expressiveness of qualitative representation formalism calculi is a benefit if such reasoning tasks need to be integrated in applications. For example, some of these calculi may be implemented for handling spatial GIS queries efficiently and some may be used for navigating, and communicating with, a mobile robot. == Relation algebra == Most of these calculi can be formalized as abstract relation algebras, such that reasoning can be carried out at a symbolic level. For computing solutions of a constraint network, the path-consistency algorithm is an important tool. == Software == GQR, constraint network solver for calculi like RCC-5, RCC-8, Allen's interval algebra, point algebra, cardinal direction calculus, etc. qualreas is a Python framework for qualitative reasoning over networks of relation algebras, such as RCC-8, Allen's interval algebra, and Allen's algebra integrated with Time Points and situated in either Left- or Right-Branching Time.

Dr.Fill

Dr.Fill is a computer program that solves American-style crossword puzzles. It was developed by Matt Ginsberg and described by Ginsberg in an article in the Journal of Artificial Intelligence Research. Ginsberg claims in that article that Dr.Fill is among the top fifty crossword solvers in the world. == History == Dr.Fill participated in the 2012 American Crossword Puzzle Tournament, finishing 141st of approximately 650 entrants with a total score of just over 10,000 points. The appearance led to a variety of descriptions of Dr.Fill in the popular press, including The Economist, the San Francisco Chronicle and Gizmodo. A description of Dr.Fill appeared on the front page of the March 17, 2012 New York Times. Dr.Fill's score in 2013 improved to 10,550, which would have earned it 92nd place. Videos of the program solving the problems from the tournament are available on YouTube. The score in 2014 improved further to 10,790, which would have tied for 67th place. A video of the program solving the first six puzzles from that tournament, together with a talk given by Ginsberg describing its performance, can be found on YouTube. Dr.Fill has largely continued to improve since the 2014 event. In 2015, it scored 10,920 points and finished in 55th place. In 2016, it scored 11,205 points and finished in 41st place. In 2017, it scored 11,795 and finished in 11th place. In 2018, it scored 10,740 points, dropping to 78th place. Dr.Fill returned to "form" in 2019, once again scoring 11,795 and finishing in 14th place. The 2020 ACPT was cancelled due to COVID-19, and Dr.Fill participated as a non-competitor in the Boswords tournament instead. The program outperformed the humans, scoring 11,218 points (fast solves with a total of one mistake) while the best scoring human scored 10,994 points (slower solves but no mistakes). The 2021 ACPT was virtual, again due to COVID-19. The Dr.Fill effort was joined by the Berkeley NLP Group, creating a hybrid system named Berkeley Crossword Solver, and Dr.Fill won the main event, scoring 12,825 points with Erik Agard, the highest scoring human, scoring 12,810 points. The tournament was won by Tyler Hinman (12,760 points), who completed the championship puzzle perfectly in three minutes. Dr.Fill also completed that puzzle perfectly, but in 49 seconds. After winning the tournament, Ginsberg announced on August 8, 2021, that both he and Dr.Fill would be retiring from crosswords. == Algorithm == As described by Ginsberg, Dr.Fill works by converting a crossword to a weighted constraint satisfaction problem and then attempting to maximize the probability that the fill is correct. Probabilities for individual words or phrases in the puzzle are computed using relatively simple statistical techniques based on features such as previous appearances of the clue, number of Google hits for the fill, and so on. In doing this, Dr.Fill is attempting to solve a problem similar to that tackled by the Jeopardy!-playing program Watson; Dr.Fill runs on a laptop instead of a supercomputer and Ginsberg remarks that Watson is far more effective than Dr.Fill at solving this portion of the problem. Instead of computational horsepower, Dr.Fill relies on the constraints provided by crossing words to refine its answers. A variety of techniques from artificial intelligence are applied to attempt to find the most likely fill. These include a small amount of lookahead, limited discrepancy search, and postprocessing. Ginsberg remarks that postprocessing was chosen over branch and bound because the two techniques are mutually incompatible and postprocessing was found to be more effective in this domain.

Materialized view

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function. The process of setting up a materialized view is sometimes called materialization. This is a form of caching the results of a query, similar to memoization of the value of a function in functional languages, and it is sometimes described as a form of precomputation. As with other forms of precomputation, database users typically use materialized views for performance reasons, i.e. as a form of optimization. Materialized views that store data based on remote tables were also known as snapshots (deprecated Oracle terminology). In any database management system following the relational model, a view is a virtual table representing the result of a database query. Whenever a query or an update addresses an ordinary view's virtual table, the DBMS converts these into queries or updates against the underlying base tables. A materialized view takes a different approach: the query result is cached as a concrete ("materialized") table (rather than a view as such) that may be updated from the original base tables from time to time. This enables much more efficient access, at the cost of extra storage and of some data being potentially out-of-date. Materialized views find use especially in data warehousing scenarios, where frequent queries of the actual base tables can be expensive. In a materialized view, indexes can be built on any column. In contrast, in a normal view, it's typically only possible to exploit indexes on columns that come directly from (or have a mapping to) indexed columns in the base tables; often this functionality is not offered at all. == Implementations == === Oracle === Materialized views were implemented first by the Oracle Database: the Query rewrite feature was added from version 8i. Example syntax to create a materialized view in Oracle: === PostgreSQL === In PostgreSQL, version 9.3 and newer natively support materialized views. In version 9.3, a materialized view is not auto-refreshed, and is populated only at time of creation (unless WITH NO DATA is used). It may be refreshed later manually using REFRESH MATERIALIZED VIEW. In version 9.4, the refresh may be concurrent with selects on the materialized view if CONCURRENTLY is used. Example syntax to create a materialized view in PostgreSQL: === SQL Server === Microsoft SQL Server differs from other RDBMS by the way of implementing materialized view via a concept known as "Indexed Views". The main difference is that such views do not require a refresh because they are in fact always synchronized to the original data of the tables that compound the view. To achieve this, it is necessary that the lines of origin and destination are "deterministic" in their mapping, which limits the types of possible queries to do this. This mechanism has been realised since the 2000 version of SQL Server. Example syntax to create a materialized view in SQL Server: === Stream processing frameworks === Apache Kafka (since v0.10.2), Apache Spark (since v2.0), Apache Flink, Kinetica DB, Materialize, RisingWave, and Epsio all support materialized views on streams of data. === Others === Materialized views are also supported in Sybase SQL Anywhere. In IBM Db2, they are called "materialized query tables". ClickHouse supports materialized views that automatically refresh on merges. MySQL doesn't support materialized views natively, but workarounds can be implemented by using triggers or stored procedures or by using the open-source application Flexviews. Materialized views can be implemented in Amazon DynamoDB using data modification events captured by DynamoDB Streams. Google announced in 8 April 2020 the availability of materialized views for BigQuery as a beta release.

Xinhua–Sogou AI news anchor

Xinhua News Agency and Sogou of China developed an artificial intelligence (AI) for news reporting purposes. The AI was unveiled in 2018. It is touted to be the "world's first AI news anchor". == History == The AI was unveiled at the 2018 World Internet Conference in Wuzhen, Zhejiang, China. The AI devises avatars patterned after real life Xinhua anchors. The AI patterned after Qiu Hao spoke in Chinese, while the one derived from the likeness of Zhang Zhao speaks in English. The unveiling of the AI raised concerns of its impact on employment. Xinhua and Sogou unveiled Xin Xiaomeng, an AI with a female avatar in 2019. People's Daily followed suit by unveiling its own AI newscaster in 2023.