AI Essay Reddit

AI Essay Reddit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Uncertain data

    In computer science, uncertain data is data that contains noise that makes it deviate from the correct, intended or original values. In the age of big data, uncertainty or data veracity is one of the defining characteristics of data: data is constantly growing in volume, variety, velocity and uncertainty (1/veracity). Uncertain data is found in abundance today on the web, in sensor networks, and within enterprises, in both their structured and unstructured sources. For example, there may be uncertainty regarding the address of a customer in an enterprise dataset, or regarding the temperature readings captured by a sensor as the sensor ages. In 2012, IBM called out managing uncertain data at scale in its Global Technology Outlook report, which presents a comprehensive analysis looking three to ten years into the future to identify significant, disruptive technologies that will change the world. To make confident business decisions based on real-world data, analyses must account for the many different kinds of uncertainty present in very large amounts of data. Because analyses based on uncertain data affect the quality of subsequent decisions, the degree and types of inaccuracy in the data cannot be ignored. Uncertain data arises in sensor networks; in text, where noisy text is abundant on social media, on the web and within enterprises, whose structured and unstructured data may be old, outdated, or plain incorrect; and in modeling, where the mathematical model may only be an approximation of the actual process. When representing such data in a database, an appropriate uncertain database model needs to be selected.

    == Example data model for uncertain data ==

    One way to represent uncertain data is through probability distributions. Taking the example of a relational database, there are three main ways to represent uncertainty as probability distributions in such a database model.
In attribute uncertainty, each uncertain attribute in a tuple is subject to its own independent probability distribution. For example, if readings are taken of temperature and wind speed, each would be described by its own probability distribution, as knowing the reading for one measurement would not provide any information about the other. In correlated uncertainty, multiple attributes may be described by a joint probability distribution. For example, if readings are taken of the position of an object and the x- and y-coordinates are stored, the probability of different values may depend on the distance from the recorded coordinates. As distance depends on both coordinates, it may be appropriate to use a joint distribution for these coordinates, as they are not independent. In tuple uncertainty, all the attributes of a tuple are subject to a joint probability distribution. This covers the case of correlated uncertainty, but also includes the case where there is a probability of a tuple not belonging in the relevant relation, which is indicated by all the probabilities not summing to one. For example, if the alternatives listed for a tuple in a probabilistic database have probabilities summing to 0.9, the tuple has a 10% chance of not existing in the database.
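A minimal Python sketch of the attribute-uncertainty and tuple-uncertainty models, assuming each probability distribution is stored as a plain dictionary from value to probability. The readings, the tuple alternatives, and the function names (`joint_from_independent_attributes`, `absence_probability`) are hypothetical illustrations, not the API of any particular probabilistic database.

```python
from itertools import product
from math import isclose, prod


def joint_from_independent_attributes(distributions):
    """Attribute uncertainty: each attribute has its own independent distribution,
    so the probability of a combination of values is the product of the marginals."""
    joint = {}
    for combo in product(*(d.items() for d in distributions)):
        values = tuple(value for value, _ in combo)
        joint[values] = prod(p for _, p in combo)
    return joint


def absence_probability(alternatives):
    """Tuple uncertainty: alternatives for a whole tuple carry probabilities; the mass
    missing from 1 is the chance that the tuple is not in the relation at all."""
    total = sum(alternatives.values())
    if total > 1 and not isclose(total, 1):
        raise ValueError("probabilities of alternatives must not exceed 1")
    return max(0.0, 1.0 - total)


# Hypothetical sensor readings: temperature and wind speed are independent.
temperature = {20: 0.7, 21: 0.3}
wind_speed = {5: 0.5, 6: 0.5}
joint = joint_from_independent_attributes([temperature, wind_speed])

# Hypothetical tuple whose two alternatives sum to 0.9 rather than 1.
tuple_alternatives = {("sensor-1", 20, 5): 0.4, ("sensor-1", 21, 6): 0.5}

print(round(joint[(20, 5)], 2))                            # 0.35
print(round(absence_probability(tuple_alternatives), 2))   # 0.1
```

Correlated uncertainty would instead store one dictionary keyed by value pairs, such as (x, y) coordinates, whose probabilities are not products of separate marginals.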

  • Pax Silica

    Pax Silica is a United States-led international initiative focused on strengthening and coordinating "trusted" supply chains for advanced technologies, especially semiconductors, artificial intelligence (AI) infrastructure, critical minerals, advanced manufacturing, logistics, and associated energy and data infrastructure. The initiative is coordinated by the US Department of State and was launched in December 2025 alongside the signing of the non-binding Pax Silica Declaration by an initial group of partner countries. It describes itself as a "positive-sum" partnership intended to reduce "coercive dependencies" and improve resilience across the full technology stack, from mineral extraction and processing through chip manufacturing and computing infrastructure. US officials described Pax Silica as a framework for coordinating flagship projects and policy alignment across partner countries, including supply-chain mapping, investment and co-investment initiatives, and protection of critical infrastructure and sensitive technologies. Reuters reported discussions of projects linked to trade and logistics routes and of an industrial park initiative in Israel. Gulf countries such as the UAE and Qatar are betting on attracting AI companies with cheap energy, and the UAE's potential to invest in Pax Silica's activities has been noted as a fundamental asset for the initiative. In early 2026, the US announced plans to contribute $250 million toward an investment consortium intended to strengthen energy and critical mineral supply chains.

    == Launch and background ==

    During the 2020s, governments increasingly treated supply-chain resilience in semiconductors, critical minerals, and AI-related computing infrastructure as a national-security priority, amid export controls, industrial policy measures, and geopolitical competition over the technologies underpinning advanced manufacturing and AI.
Pax Silica was presented by US officials as an economic-security framework aimed at aligning policies and investment among "trusted partners" that host major technology firms and key industrial capacity. Writing for The National Interest, Pacific Forum analyst Akhil Ramesh described the initiative as reflecting the understanding that "economic security today is inseparable from control over energy, critical minerals, high-end manufacturing, and advanced models." On December 11, 2025, the US Department of State announced the inaugural Pax Silica Summit and a planned signing of the Pax Silica Declaration, describing Pax Silica as the Department's flagship effort on AI and supply-chain security. The initial summit was held in Washington, D.C., on December 12, 2025. The State Department fact sheet described cooperation areas including connectivity and data infrastructure, compute and semiconductors, advanced manufacturing, logistics, mineral refining and processing, and energy.

== Membership ==

Pax Silica participation has been discussed in terms of (1) countries that have signed the declaration and (2) countries invited to summit discussions or publicly reported as prospective signatories that had not signed the declaration as of mid-January 2026.

=== Countries that signed the Pax Silica Declaration ===

Seven countries signed the declaration at the December 12, 2025, summit in Washington, D.C.: Australia, Israel, Japan, South Korea, Singapore, the United Kingdom, and the United States. Some countries that attended the initial conversations did not immediately sign, while additional countries were invited to join after the discussions concluded.
The following countries joined the declaration later: Greece; the Netherlands (joined December 17, 2025, as a "non-signing partner"); Qatar (joined January 13, 2026); the United Arab Emirates (joined January 14, 2026); India (joined February 20, 2026); Sweden (signed March 17, 2026); Finland (signed April 16, 2026); the Philippines (signed April 17, 2026); and Norway (signed May 6, 2026).

=== Countries invited or participating, but not yet signed ===

At launch, US materials and contemporaneous reporting described additional invited participants and observers, including:

    • Canada: observer or participant in related discussions, per US briefing materials; not listed among signatories.
    • Taiwan: participated in summit sessions according to a State Department briefing; not listed among signatories.

The Organisation for Economic Co-operation and Development (OECD) and the European Union were also noted by US officials as present in an observer capacity, but they are not countries.

  • Tree (abstract data type)

    In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be connected to exactly one parent, except for the root node, which has no parent (i.e., the root node is the top-most node in the tree hierarchy). These constraints mean there are no cycles or "loops" (no node can be its own ancestor), and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes (the parent and children of a node under consideration, if they exist) arranged in a single straight line; the connection between two adjacent nodes is called an edge or link. Binary trees are a commonly used type, which constrain the number of children for each parent to at most two. When the order of the children is specified, this data structure corresponds to an ordered tree in graph theory. A value or pointer to other data may be associated with every node in the tree, or sometimes only with the leaf nodes, which have no children. The abstract data type (ADT) can be represented in a number of ways, including a list of parents with pointers to children, a list of children with pointers to parents, or a list of nodes and a separate list of parent-child relations (a specific type of adjacency list). Representations might also be more complicated, for example using indexes or ancestor lists for performance. Trees as used in computing are similar to, but can be different from, the mathematical constructs of trees in graph theory, trees in set theory, and trees in descriptive set theory.

    == Terminology ==

    A node is a structure which may contain data and connections to other nodes, sometimes called edges or links.
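The representations listed above carry the same information and can be converted into one another. A small Python sketch, assuming nodes are identified by hashable labels and the root's parent is recorded as `None`; the function names and the example tree are hypothetical:

```python
def children_from_parents(parent):
    """Turn a parent-pointer table {node: parent} into child lists {node: [children]}.

    Returns (root, children). Raises ValueError unless exactly one node has no parent.
    """
    children = {node: [] for node in parent}
    roots = []
    for node, p in parent.items():
        if p is None:
            roots.append(node)
        else:
            children[p].append(node)  # KeyError if p is not a known node
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return roots[0], children


def edge_list(parent):
    """A separate list of (parent, child) relations, i.e. an adjacency list of edges."""
    return [(p, node) for node, p in parent.items() if p is not None]


# Hypothetical tree: A is the root with children B and C; D is a child of B.
parent = {"A": None, "B": "A", "C": "A", "D": "B"}
root, children = children_from_parents(parent)
print(root)               # A
print(children)           # {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}
print(edge_list(parent))  # [('A', 'B'), ('A', 'C'), ('B', 'D')]
```

The sketch checks for a single root but does not detect cycles; child order follows the insertion order of the parent table.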
Each node in a tree has zero or more child nodes, which are below it in the tree (by convention, trees are drawn with descendants going downwards). A node that has a child is called the child's parent node (or superior). All nodes have exactly one parent, except the topmost root node, which has none. A node might have many ancestor nodes, such as the parent's parent. Child nodes with the same parent are sibling nodes. Typically siblings have an order, with the first one conventionally drawn on the left. Some definitions allow a tree to have no nodes at all, in which case it is called empty.

An internal node (also known as an inner node, inode for short, or branch node) is any node of a tree that has child nodes. Similarly, an external node (also known as an outer node, leaf node, or terminal node) is any node that does not have child nodes.

The height of a node is the length of the longest downward path from that node to a leaf. The height of the root is the height of the tree. The depth of a node is the length of the path to its root (i.e., its root path). Thus the root node has depth zero, leaf nodes have height zero, and a tree with only a single node (hence both a root and a leaf) has depth and height zero. Conventionally, an empty tree (a tree with no nodes, if such are allowed) has height −1. Each non-root node can be treated as the root node of its own subtree, which includes that node and all its descendants.

Other terms used with trees:

    • Neighbor: a parent or child.
    • Ancestor: a node reachable by repeatedly proceeding from child to parent.
    • Descendant: a node reachable by repeatedly proceeding from parent to child. Also known as subchild.
    • Degree: for a given node, its number of children. A leaf, by definition, has degree zero.
    • Degree of tree: the maximum degree of a node in the tree.
    • Distance: the number of edges along the shortest path between two nodes.
    • Level: the number of edges along the unique path between a node and the root node. This is the same as depth.
    • Width: the number of nodes in a level.
    • Breadth: the number of leaves.
    • Complete tree: a tree with every level filled, except the last.
    • Forest: a set of one or more disjoint trees.
    • Ordered tree: a rooted tree in which an ordering is specified for the children of each vertex.
    • Size of a tree: the number of nodes in the tree.

== Common operations ==

    • Enumerating all the items
    • Enumerating a section of a tree
    • Searching for an item
    • Adding a new item at a certain position on the tree
    • Deleting an item
    • Pruning: removing a whole section of a tree
    • Grafting: adding a whole section to a tree
    • Finding the root for any node
    • Finding the lowest common ancestor of two nodes

=== Traversal and search methods ===

Stepping through the items of a tree, by means of the connections between parents and children, is called walking the tree, and the action is a walk of the tree. Often, an operation might be performed when a pointer arrives at a particular node. A walk in which each parent node is traversed before its children is called a pre-order walk; a walk in which the children are traversed before their respective parents is called a post-order walk; a walk in which a node's left subtree, then the node itself, and finally its right subtree are traversed is called an in-order traversal. (This last scenario, referring to exactly two subtrees, a left subtree and a right subtree, assumes specifically a binary tree.) A level-order walk effectively performs a breadth-first search over the entirety of a tree; nodes are traversed level by level, where the root node is visited first, followed by its direct child nodes and their siblings, followed by its grandchild nodes and their siblings, and so on, until all nodes in the tree have been traversed.

== Representations ==

There are many different ways to represent trees.
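The walks described under Traversal and search methods can be sketched in Python over nodes stored as records holding a value and pointers to their children. This is a minimal illustration, not a canonical implementation: the `Node` and `BinaryNode` classes and the function names are hypothetical, in-order traversal gets its own binary node type because it assumes exactly a left and a right subtree, and `height` follows the terminology above (a leaf has height zero).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A general ordered tree node: a value plus an ordered list of children."""
    value: object
    children: List["Node"] = field(default_factory=list)


@dataclass
class BinaryNode:
    """A binary tree node; a missing child is represented by None."""
    value: object
    left: Optional["BinaryNode"] = None
    right: Optional["BinaryNode"] = None


def pre_order(node):
    yield node.value                  # parent before its children
    for child in node.children:
        yield from pre_order(child)


def post_order(node):
    for child in node.children:       # children before their parent
        yield from post_order(child)
    yield node.value


def in_order(node):
    if node is None:                  # empty subtree
        return
    yield from in_order(node.left)    # left subtree, node, right subtree
    yield node.value
    yield from in_order(node.right)


def level_order(root):
    queue = deque([root])             # breadth-first: one level at a time
    while queue:
        node = queue.popleft()
        yield node.value
        queue.extend(node.children)


def height(node):
    """Length of the longest downward path to a leaf; a leaf has height 0."""
    return 1 + max((height(c) for c in node.children), default=-1)


tree = Node("A", [Node("B", [Node("D"), Node("E")]), Node("C")])
print(list(pre_order(tree)))    # ['A', 'B', 'D', 'E', 'C']
print(list(post_order(tree)))   # ['D', 'E', 'B', 'C', 'A']
print(list(level_order(tree)))  # ['A', 'B', 'C', 'D', 'E']
print(height(tree))             # 2
print(list(in_order(BinaryNode(2, BinaryNode(1), BinaryNode(3)))))  # [1, 2, 3]
```

Recursion mirrors the observation that each child is the root of its own subtree; very deep trees would need an explicit stack instead, because of Python's recursion limit.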
In working memory, nodes are typically dynamically allocated records with pointers to their children, their parents, or both, as well as any associated data. If the nodes are of a fixed size, they might be stored in a list. Nodes and relationships between nodes might be stored in a separate special type of adjacency list. In relational databases, nodes are typically represented as table rows, with indexed row IDs facilitating pointers between parents and children. Nodes can also be stored as items in an array, with relationships between them determined by their positions in the array (as in a binary heap). A binary tree can be implemented as a list of lists: the head of a list (the value of the first term) is the left child (subtree), while the tail (the list of second and subsequent terms) is the right child (subtree). This can be modified to allow values as well, as in Lisp S-expressions, where the head (value of the first term) is the value of the node, the head of the tail (value of the second term) is the left child, and the tail of the tail (list of third and subsequent terms) is the right child. Ordered trees can be naturally encoded by finite sequences, for example with natural numbers.

== Type theory ==

As an abstract data type, the abstract tree type T with values of some type E is defined, using the abstract forest type F (list of trees), by the functions:

    value: T → E
    children: T → F
    nil: () → F
    node: E × F → T

with the axioms:

    value(node(e, f)) = e
    children(node(e, f)) = f

In terms of type theory, a tree is an inductive type defined by the constructors nil (empty forest) and node (tree with root node with given value and children).

== Mathematical terminology ==

Viewed as a whole, a tree data structure is an ordered tree, generally with values attached to each node.
Concretely, it is (if required to be non-empty) a rooted tree with the "away from root" direction (a narrower term is an "arborescence"), meaning: a directed graph whose underlying undirected graph is a tree (any two vertices are connected by exactly one simple path), with a distinguished root (one vertex is designated as the root), which determines the direction on the edges (arrows point away from the root; given an edge, the node that the edge points from is called the parent and the node that the edge points to is called the child), together with an ordering on the child nodes of a given node and a value (of some data type) at each node. Often trees have a fixed (more properly, bounded) branching factor (outdegree), particularly always having two child nodes (possibly empty, hence at most two non-empty child nodes), hence a "binary tree". Allowing empty trees makes some definitions simpler and others more complicated: a rooted tree must be non-empty, hence if empty trees are allowed the above definition instead becomes "an empty tree or a rooted tree such that ...". On the other hand, empty trees simplify defining a fixed branching factor: with empty trees allowed, a binary tree is a tree such that every node has exactly two children, each of which is a tree (possibly empty).

== Applications ==

Trees are commonly used to represent or manipulate hierarchical data in ap

  • Darkforest

    Darkforest is a computer Go program developed by Meta Platforms (then Facebook), based on deep learning techniques using a convolutional neural network. Its updated version, Darkfores2, combines the techniques of its predecessor with Monte Carlo tree search (MCTS). MCTS effectively takes the tree search methods commonly seen in computer chess programs and randomizes them. With this update, the system is known as Darkfmcts3. Darkforest is of similar strength to programs like Crazy Stone and Zen. It was tested against a professional human player at the 2016 UEC Cup. Google's AlphaGo program won against a professional player in October 2015 using a similar combination of techniques. Darkforest is named after Liu Cixin's science fiction novel The Dark Forest.

    == Background ==

    Competing with top human players in the ancient game of Go has been a long-term goal of artificial intelligence. Go's high branching factor makes traditional search techniques ineffective, even on cutting-edge hardware, and Go's evaluation function can change drastically with a single stone. However, by using a deep convolutional neural network designed for long-term predictions, Darkforest has been able to substantially improve the win rate for bots over more traditional approaches based on Monte Carlo tree search.

    === Matches ===

    Against human players, Darkfores2 achieves a stable 3d ranking on the KGS Go Server, which roughly corresponds to an advanced amateur human player. After Monte Carlo tree search was added to Darkfores2 to create a much stronger player named Darkfmcts3, it achieved a 5d ranking on the KGS Go Server.

    ==== Against other AI ====

    Darkfmcts3 is on par with state-of-the-art Go AIs such as Zen, DolBaram and Crazy Stone, but lags behind AlphaGo. It took third place in the January 2016 KGS Bot Tournament against other Go AIs.
=== News coverage ===

After Google's AlphaGo won against Fan Hui in 2015, Facebook made its AI's hardware designs public and released the code behind Darkforest as open source, in addition to heavy recruiting to strengthen its team of AI engineers.

== Style of play ==

Darkforest uses a neural network to sort through the 10^100 board positions and find the most powerful next move. However, neural networks alone cannot match the level of good amateur players or the best search-based Go engines, so Darkfores2 combines the neural network approach with a search-based engine. A database of 250,000 real Go games was used in the development of Darkforest, with 220,000 used as a training set and the rest used to test the neural network's ability to predict the next moves played in the real games. This allows Darkforest to accurately evaluate the global state of the board, but its local tactics were still poor. Search-based engines have poor global evaluation but are good at local tactics. Combining these two approaches is difficult because search-based engines work much faster than neural networks, a problem which was solved in Darkfores2 by running the processes in parallel with frequent communication between the two.

=== Conventional strategies ===

Go is generally played by analyzing the position of the stones on the board, and various advanced players have described playing it as partly subconscious. In chess and checkers, AI players can simply look further ahead than human players, but with each turn of Go offering on average 250 possible moves, that approach is ineffective. Instead, neural networks copy human play: by training the AI systems on images of successful moves, the AI can effectively learn how to interpret how the board looks, as many grandmasters do. In November 2015, Facebook demonstrated the combination of MCTS with neural networks, which played with a style that "felt human".
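A generic sketch of how a neural network's move predictions can steer a tree search, in the spirit of the combination described above: a node is expanded only with the moves the network ranks highest, and selection balances a child's observed results against its network prior. The selection rule is a PUCT-style formula, one common choice swapped in here for illustration; it is not taken from Darkforest's published code, and `SearchNode`, its methods, the policy dictionary and the move labels are all hypothetical.

```python
import math


class SearchNode:
    """One position in the search tree; moves are hypothetical string labels."""

    def __init__(self, prior=1.0):
        self.prior = prior      # probability the (hypothetical) policy network gave this move
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}      # move -> SearchNode

    def expand(self, policy, k):
        """Add children only for the k moves the policy ranks highest."""
        ranked = sorted(policy.items(), key=lambda item: item[1], reverse=True)
        for move, p in ranked[:k]:
            self.children[move] = SearchNode(prior=p)

    def select(self, c_puct=1.0):
        """Return the move maximizing mean value plus a prior-weighted exploration bonus."""
        sqrt_total = math.sqrt(self.visits + 1)

        def score(move):
            child = self.children[move]
            mean = child.value_sum / child.visits if child.visits else 0.0
            return mean + c_puct * child.prior * sqrt_total / (1 + child.visits)

        return max(self.children, key=score)

    def backup(self, move, value):
        """Record one rollout result (from this node's point of view) for a child."""
        child = self.children[move]
        child.visits += 1
        child.value_sum += value
        self.visits += 1


root = SearchNode()
root.expand({"D4": 0.5, "Q16": 0.3, "C3": 0.15, "K10": 0.05}, k=2)
print(sorted(root.children))    # ['D4', 'Q16']
first = root.select()           # 'D4': highest prior, nothing visited yet
root.backup(first, value=-1.0)  # suppose the rollout after D4 was lost
print(root.select())            # 'Q16'
```

Restricting expansion to the top-k predictions is what keeps the expensive network from being consulted for every legal move; the size of k trades breadth of search against cost.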
=== Flaws ===

It has been noted that Darkforest still has flaws in its playstyle. The bot sometimes plays tenuki ("move elsewhere") pointlessly when strong local moves are required. When the bot is losing, it shows the typical behavior of MCTS: it plays bad moves and loses by more. The Facebook AI team has acknowledged these as areas for future improvement.

== Program architecture ==

The family of Darkforest computer Go programs is based on convolutional neural networks. The most recent advance, Darkfmcts3, combined convolutional neural networks with more traditional Monte Carlo tree search. Darkfmcts3 is the most advanced version of Darkforest, combining Facebook's most advanced convolutional neural network architecture from Darkfores2 with a Monte Carlo tree search. Darkfmcts3 relies on a convolutional neural network that predicts the next k moves based on the current state of play. It treats the board as a 19×19 image with multiple channels. Each channel represents a different aspect of board information based upon the specific style of play. For standard and extended play, there are 21 and 25 different channels, respectively. In standard play, each player's liberties are represented as six binary channels or planes (three per player). The respective plane is true where a player's stones have one, two, or three or more liberties available. Ko (i.e. illegal moves) is represented as one binary plane. Stone placement for each player and empty board positions are represented as three binary planes, and the duration since a stone has been placed is represented as real numbers on two planes, one for each player. Lastly, the opponent's rank is represented by nine binary planes: if all are true, the player is at 9d level; if eight are true, at 8d level; and so forth. Extended play additionally considers the border (a binary plane that is true at the border), a position mask (represented as distance from the board center, i.e.
$x^{-0.5 \cdot \mathrm{distance}^2}$, where $x$ is a real number at a position), and each player's territory (binary, based on which player a location is closer to). Darkfmcts3 uses a 12-layer fully convolutional network with a width of 384 nodes, without weight sharing or pooling. Each convolutional layer is followed by a rectified linear unit, a popular activation function for deep neural networks. A key innovation of Darkfmcts3 compared to previous approaches is that it uses only one softmax function to predict the next move, which enables the approach to reduce the overall number of parameters. Darkfmcts3 was trained on 300 randomly selected games from an empirical dataset representing different game stages, using vanilla stochastic gradient descent. Darkfmcts3 synchronously couples a convolutional neural network with a Monte Carlo tree search. Since the convolutional neural network is computationally taxing, the Monte Carlo tree search focuses computation on the more likely game-play trajectories. By running the neural network synchronously with the Monte Carlo tree search, it is possible to guarantee that each node is expanded by the moves predicted by the neural network.

== Comparison with other systems ==

Darkfores2 beats Darkforest, its neural-network-only predecessor, around 90% of the time, and Pachi, one of the best search-based engines, around 95% of the time. On the kyu/dan rating system, Darkforest holds a 1–2d level. Darkfores2 achieves a stable 3d level on the KGS Go Server as a ranked bot. With the added Monte Carlo tree search, Darkfmcts3 with 5,000 rollouts beats Pachi with 10k rollouts in all 250 games; with 75k rollouts it achieves a stable 5d level on the KGS server, on par with state-of-the-art Go AIs (e.g., Zen, DolBaram, Crazy Stone); with 110k rollouts, it took third place in the January 2016 KGS Bot Tournament.
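As a rough illustration of the standard-play features described under Program architecture, the sketch below builds the six liberty planes (three per player) and the nine thermometer-style rank planes from a plain board array. It assumes a hypothetical cell encoding (0 empty, 1 black, 2 white), hypothetical function names, and a plane ordering chosen for readability; it does not reproduce Darkforest's actual feature extractor. Counting the standard planes listed there gives 6 + 1 + 3 + 2 + 9 = 21, matching the stated channel count.

```python
from collections import deque

EMPTY, BLACK, WHITE = 0, 1, 2  # assumed cell encoding (hypothetical)


def group_and_liberties(board, r, c):
    """Flood-fill the group containing (r, c); return its stones and liberty count."""
    n = len(board)
    color = board[r][c]
    stones, liberties = {(r, c)}, set()
    queue = deque([(r, c)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < n and 0 <= nx < n):
                continue  # off the board
            if board[ny][nx] == EMPTY:
                liberties.add((ny, nx))
            elif board[ny][nx] == color and (ny, nx) not in stones:
                stones.add((ny, nx))
                queue.append((ny, nx))
    return stones, len(liberties)


def liberty_planes(board, player):
    """Six binary planes: 0-2 mark `player`'s stones whose group has 1, 2, or 3+
    liberties; 3-5 do the same for the opponent."""
    n = len(board)
    planes = [[[0] * n for _ in range(n)] for _ in range(6)]
    seen = set()
    for r in range(n):
        for c in range(n):
            if board[r][c] == EMPTY or (r, c) in seen:
                continue
            stones, libs = group_and_liberties(board, r, c)
            seen |= stones
            if libs == 0:
                continue  # cannot occur in a legal position
            offset = 0 if board[r][c] == player else 3
            bucket = min(libs, 3) - 1  # 1 -> 0, 2 -> 1, 3 or more -> 2
            for y, x in stones:
                planes[offset + bucket][y][x] = 1
    return planes


def rank_planes(dan, n=19):
    """Nine constant binary planes; the first `dan` are all ones (9d -> all nine)."""
    return [[[1 if k < dan else 0] * n for _ in range(n)] for k in range(9)]


board = [[EMPTY] * 5 for _ in range(5)]  # small 5x5 board for illustration
board[0][0] = BLACK  # corner stone: 2 liberties
board[2][2] = WHITE  # open stone: 4 liberties
board[4][4] = BLACK  # corner stone touching white: 1 liberty
board[3][4] = WHITE  # edge stone touching black: 2 liberties
planes = liberty_planes(board, BLACK)
```

A real 19×19 encoder would stack these with the ko, stone, history and (for extended play) border, position-mask and territory planes into one multi-channel input.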

  • Teleradiology

    Teleradiology is the transmission of radiological patient images, from procedures such as X-rays, computed tomography (CT), and magnetic resonance imaging (MRI), from one location to another for the purpose of sharing studies with other radiologists and physicians. Teleradiology allows radiologists to provide services without having to be at the location of the patient. This is particularly important when a subspecialist such as an MRI radiologist, neuroradiologist, pediatric radiologist, or musculoskeletal radiologist is needed, since these professionals are generally located only in large metropolitan areas and work during daytime hours. Teleradiology allows specialists to be available at all times. Teleradiology uses standard network technologies such as the Internet, telephone lines, wide area networks (WANs) and local area networks (LANs), as well as newer technologies such as medical cloud computing. Specialized software is used to transmit the images and enable the radiologist to effectively analyze the potentially hundreds of images in a given study. Technologies such as advanced graphics processing, voice recognition, artificial intelligence, and image compression are often used in teleradiology. Through teleradiology and mobile DICOM viewers, images can be sent to another part of the hospital or to other locations around the world with equal effort. Teleradiology is a growth technology, given that imaging procedures are growing approximately 15% annually against an increase of only 2% in the radiologist population.

    == Reports ==

    Teleradiology services commonly provide either preliminary or final interpretations of medical imaging studies. Preliminary reads are frequently used in emergency settings to support immediate clinical decisions and may include direct communication of critical findings to the referring physician.
Some providers report turnaround times of approximately 30 minutes for emergency cases, with faster processing for time-sensitive conditions such as stroke. Final reads are definitive and are used in official patient records and for billing. These reports typically include all relevant findings and may require access to prior imaging and clinical data. Teleradiology is also employed to provide off-hours or overflow coverage for healthcare institutions lacking continuous on-site radiology staffing.

== Subspecialties ==

Some teleradiologists are fellowship-trained and have a wide variety of subspecialty expertise, including such difficult-to-find areas as neuroradiology, pediatric neuroradiology, thoracic imaging, musculoskeletal radiology, mammography, and nuclear cardiology. There are also medical practitioners who are not radiologists but who study radiology to become subspecialists in their respective fields. An example is dentistry, where oral and maxillofacial radiology allows dental practitioners to specialize in the acquisition and interpretation of radiographic imaging studies performed for the diagnosis or treatment guidance of conditions affecting the maxillofacial region.

== Teleultrasound ==

Teleradiology infrastructure has also been adapted to support point-of-care ultrasound (POCUS) in remote and austere environments. In teleultrasound, also known as telementored ultrasound, a remote expert guides a non-specialist in real time during image acquisition. This technique has been successfully demonstrated in extreme settings, including aboard the International Space Station, on Mount Everest, and during helicopter flight.

== Regulations ==

In the United States, Medicare and Medicaid laws require the teleradiologist to be on U.S. soil in order to qualify for reimbursement of the final read. In addition, advanced teleradiology systems must also be HIPAA compliant, which helps to ensure patients' privacy.
HIPAA (the Health Insurance Portability and Accountability Act of 1996) establishes a uniform federal floor of privacy protections for consumers. It limits the ways that entities can use patients' personal information and protects the privacy of all medical information, no matter what form it is in. Quality teleradiology must abide by important HIPAA rules to ensure patients' privacy is protected. In addition, state laws governing licensing requirements and the medical malpractice insurance coverage required for physicians vary from state to state. Ensuring compliance with these laws is a significant overhead expense for larger multi-state teleradiology groups. Medicare (Australia) has requirements identical to those of the United States; the guidelines are provided by the Department of Health and Ageing, and government-based payments fall under the Health Insurance Act. Regulation in Australia is also conducted at both federal and state levels, ensuring that strict guidelines are adhered to at all times, with regular yearly updates and amendments (usually around March and November of every year) that keep the legislation up to date with changes in the industry. One of the most recent changes to Medicare and radiology/teleradiology in Australia was the introduction of the Diagnostic Imaging Accreditation Scheme (DIAS) on 1 July 2008. DIAS was introduced to further improve the quality of diagnostic imaging and to amend the Health Insurance Act.

== Industry growth ==

Until the late 1990s, teleradiology was primarily used by individual radiologists to interpret occasional emergency studies from offsite locations, often the radiologist's home. The connections were made through standard analog phone lines. Teleradiology expanded rapidly as the growth of the internet and broadband combined with new CT scanner technology to make it an essential tool in trauma cases in emergency rooms throughout the country.
The occasional two or three X-ray studies a week soon became 3–10 CT scans, or more, a night. Because ER physicians are not trained to read CT scans or MRIs, radiologists went from working 8–10 hours a day, five and a half days a week, to providing coverage 24 hours a day, 7 days a week. This became a particularly acute challenge in smaller rural facilities that had only one solo radiologist with no one to share call. These circumstances spawned a post-dot-com boom of firms and groups that provided medical outsourcing and off-site teleradiology on-call services to hospitals and radiology groups around the country. For example, a teleradiology firm might cover trauma at a hospital in Indiana with doctors based in Texas. Some firms even used overseas doctors in locations like Australia and India. Nighthawk, founded by Paul Berger, was the first to station U.S.-licensed radiologists overseas (initially in Australia and later in Switzerland) to exploit the time-zone difference to provide night call for U.S. hospitals. Currently, teleradiology firms are facing pricing pressures. Industry consolidation is likely, as there are more than 500 of these firms, large and small, throughout the United States.

  • Daisy Intelligence

    Daisy Intelligence is a Canadian artificial intelligence (AI) company that provides data analysis services to help retailers, mainly grocers and supermarkets, determine their optimal pricing and promotional mix. The company also helps insurance companies detect fraudulent claims. It uses a subset of AI known as reinforcement learning. In October 2019, the company moved from suburban Vaughan, Ontario, to downtown Toronto, joining other AI and technology startups concentrated in the King Street East area. In 2019, the company was ranked No. 39 on The Globe and Mail's annual list of Canada's "top growing companies by three-year revenue growth."

  • Polythematic Structured Subject Heading System

    Polythematic Structured Subject Heading System (abbreviated as PSH from the Czech Polytematický Strukturovaný Heslář) is a bilingual Czech–English controlled vocabulary of subject headings developed and maintained by the National Technical Library (formerly the State Technical Library) in Prague. It was designed for describing and searching information resources according to their subject. PSH contains more than 13,900 terms, which cover the main fields of human knowledge. Because it is released in SKOS, PSH can be used not only for describing documents in a library but also for indexing web pages. PSH is free for anyone to use. PSH is part of the Linked Open Data cloud diagram (LOD cloud diagram), which shows datasets that have been published in Linked Data format by contributors to the Linked Open Data community project and by other individuals and organisations. == History and development == The PSH preparation project started in 1993, supported by several grants from the Czech Ministry of Culture and the Czech Ministry of Education, Youth and Sport. Since 1995, PSH has been used for indexing the State Technical Library's documents. Starting in 1997, PSH was distributed to other libraries and companies, originally as a commercial, paid product; it has been free since 2009. In 2000, the State Technical Library received a grant from the Ministry of Culture to translate PSH into English. The next milestone in its development was its release in SKOS format in 2009. The vast majority of new subject headings are suggested and approved by indexing experts from the National Technical Library. However, users and the public can also make suggestions using an online form, and these are then assessed by the experts. The main decisions about the development and future of PSH are made by the Committee for Coordination of Polythematic Structured Subject Heading System.
The Committee consists of specialists from the National Technical Library and cooperating institutions, and representatives from the libraries and companies that use PSH. The Committee meets once a year in the National Technical Library; in the meantime, the members communicate via an electronic mailing list. == Browsing PSH == PSH Browser was released in June 2009. It is used for browsing PSH and for distributing it in SKOS format. The tool guides users through PSH from general to specific terms; users can also use the Search field. The PSH manager tool was released in 2012. It serves as an indexing tool, especially for catalogers, who can easily orient themselves in its clear structure. All terms in PSH manager contain a link to the NTK catalogue, and each term's record can also be viewed in MARC 21 format. == Autoindexing == A beta version of an autoindexing application was released in 2012. It is accessible on the Autoindexing page. Users enter the chosen text into the indexing field and start indexing; within a few seconds, terms describing the content are displayed. == PSH structure == PSH is a tree structure with 44 thematic sections. Subject headings are arranged in a hierarchy of six (or seven) levels according to their semantic content and specificity. PSH contains hierarchical, associative ("see also") and equivalence ("see") relations. Hierarchical relations are represented by broader and narrower terms (e.g. physical diagnostic methods is the broader term for electrocardiography, and conversely electrocardiography is a narrower term of physical diagnostic methods). Equivalence relations link subject headings with their nonpreferred versions (e.g. electrocardiography and ECG). Moreover, associative relations link related subject headings from different parts of PSH, regardless of the section they belong to (e.g. electrocardiography: see also cardiology).
Every subject heading belongs to exactly one section, which has its own two-character abbreviation assigned to every subject heading in that section. This enables users to recognize which thematic section a lower-level subject heading belongs to. The 44 thematic sections have the following root nodes: == PSH formats == The main format for storing, maintaining and sharing PSH is the MARC 21 Format for Authority Data, which is implemented in automated library systems. PSH is also available in SKOS, using RDF/XML syntax, a version suitable for web distribution. Single headings can be accessed on the PSH website through URI links; alternatively, the whole vocabulary can be downloaded as one file. Tags from PSH can be displayed as metadata snippets (Dublin Core and CommonTag), which can be embedded in an HTML document to describe its semantics in a machine-readable way. == New subject headings == New subject headings are primarily obtained through log analysis of the National Technical Library's online catalogue, which records the terms end users enter when searching for documents. The Google Analytics service is now used to collect users' search queries. In the data analysis, users' queries are divided into seven categories: document title, person, subject, action, institution, geographical terms, and others. Candidates for new preferred and non-preferred terms are then identified in the subject category. Users can suggest preferred or non-preferred terms through the web form or via e-mail at psh(@)techlib.cz. == PSH and Creative Commons == PSH/SKOS has been available under the Creative Commons licence CC BY-SA 3.0 CZ (Attribution-ShareAlike 3.0 Czech Republic) since 2011.
Users are free to copy, distribute, display and perform the work and make derivative works, but they must give the original author credit and if they alter, transform, or build upon this work, they have to distribute the resulting work only under a licence identical to this one. Users can download all data in one zip file, which is continuously updated.
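
The PSH structure described above (a tree of headings, one section per heading, plus "see" and "see also" links) can be modelled as a small data structure. The sketch below is a hypothetical illustration, not PSH's actual MARC 21 or SKOS schema: the class and field names, the section code "MD" and the root node "medicine" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Heading:
    """A preferred subject heading; field names are illustrative, not PSH's schema."""
    label: str
    section: str                                  # two-character section code
    broader: Optional[str] = None                 # hierarchical: the single parent
    narrower: List[str] = field(default_factory=list)
    see_also: List[str] = field(default_factory=list)      # associative
    nonpreferred: List[str] = field(default_factory=list)  # equivalence ("see")


class Thesaurus:
    """A tree of headings plus equivalence and associative links."""

    def __init__(self) -> None:
        self.headings: Dict[str, Heading] = {}
        self.see: Dict[str, str] = {}  # nonpreferred label -> preferred label

    def add(self, label: str, section: str, broader: Optional[str] = None) -> None:
        if broader is not None:
            parent = self.headings[broader]
            # Every heading belongs to exactly one section, shared with its parent.
            if parent.section != section:
                raise ValueError(f"{label!r} must be in section {parent.section!r}")
            parent.narrower.append(label)
        self.headings[label] = Heading(label, section, broader)

    def add_nonpreferred(self, alt: str, preferred: str) -> None:
        self.headings[preferred].nonpreferred.append(alt)
        self.see[alt] = preferred

    def add_see_also(self, a: str, b: str) -> None:
        # Associative links are symmetric and may cross section boundaries.
        self.headings[a].see_also.append(b)
        self.headings[b].see_also.append(a)

    def resolve(self, term: str) -> str:
        """Follow a "see" reference from a nonpreferred term to its heading."""
        return self.see.get(term, term)

    def path_to_root(self, label: str) -> List[str]:
        """List broader terms from `label` up to its section's root node."""
        path = [label]
        while self.headings[path[-1]].broader is not None:
            path.append(self.headings[path[-1]].broader)
        return path


if __name__ == "__main__":
    t = Thesaurus()
    # Hypothetical root node and section code, for illustration only.
    t.add("medicine", "MD")
    t.add("physical diagnostic methods", "MD", broader="medicine")
    t.add("electrocardiography", "MD", broader="physical diagnostic methods")
    t.add("cardiology", "MD", broader="medicine")
    t.add_nonpreferred("ECG", "electrocardiography")
    t.add_see_also("electrocardiography", "cardiology")
    print(t.path_to_root(t.resolve("ECG")))
```

Storing only the single `broader` link per heading keeps the hierarchy a tree, which matches the one-section-per-heading rule; the `narrower` lists are kept alongside so the tree can also be browsed from general to specific terms.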

  • Hyper basis function network

    In machine learning, a hyper basis function network, or HyperBF network, is a generalization of the radial basis function (RBF) network concept in which a Mahalanobis-like distance is used instead of the Euclidean distance measure. Hyper basis function networks were first introduced by Poggio and Girosi in the 1990 paper “Networks for Approximation and Learning”. == Network Architecture == The typical HyperBF network structure consists of a real input vector $x \in \mathbb{R}^n$, a hidden layer of activation functions and a linear output layer. The output of the network is a scalar function of the input vector, $\phi : \mathbb{R}^n \to \mathbb{R}$, given by

$$\phi(x) = \sum_{j=1}^{N} a_j \, \rho_j(\|x - \mu_j\|)$$

where $N$ is the number of neurons in the hidden layer, and $\mu_j$ and $a_j$ are the center and weight of neuron $j$. The activation function $\rho_j(\|x - \mu_j\|)$ of the HyperBF network takes the form

$$\rho_j(\|x - \mu_j\|) = e^{-(x - \mu_j)^T R_j (x - \mu_j)}$$

where $R_j$ is a positive definite $n \times n$ matrix. Depending on the application, the following types of matrices $R_j$ are usually considered:
$R_j = \frac{1}{2\sigma^2}\mathbb{I}_{n \times n}$, where $\sigma > 0$. This case corresponds to the regular RBF network.
$R_j = \frac{1}{2\sigma_j^2}\mathbb{I}_{n \times n}$, where $\sigma_j > 0$. In this case, the basis functions are radially symmetric but are scaled with different widths.
$R_j = \operatorname{diag}\left(\frac{1}{2\sigma_{j1}^2}, \dots, \frac{1}{2\sigma_{jn}^2}\right)$, where $\sigma_{ji} > 0$. Every neuron has an elliptic shape with a varying size.
$R_j$ is a positive definite matrix that is not diagonal.
== Training == Training HyperBF networks involves estimating the weights $a_j$ and the shapes $R_j$ and centers $\mu_j$ of the neurons. Poggio and Girosi (1990) describe a training method with moving centers and adaptable neuron shapes; an outline of the method follows. Consider the quadratic loss of the network over $m$ training pairs $(x_i, y_i)$:

$$H[\phi^*] = \sum_{i=1}^{m} \left(y_i - \phi^*(x_i)\right)^2$$

The following conditions must be satisfied at the optimum:

$$\frac{\partial H[\phi^*]}{\partial a_j} = 0, \qquad \frac{\partial H[\phi^*]}{\partial \mu_j} = 0, \qquad \frac{\partial H[\phi^*]}{\partial W} = 0,$$

where $R_j = W^T W$. Then, in the gradient descent method, the values of $a_j$, $\mu_j$ and $W$ that minimize $H[\phi^*]$ can be found as a stable fixed point of the following dynamical system:

$$\dot{a}_j = -\omega \frac{\partial H[\phi^*]}{\partial a_j}, \qquad \dot{\mu}_j = -\omega \frac{\partial H[\phi^*]}{\partial \mu_j}, \qquad \dot{W} = -\omega \frac{\partial H[\phi^*]}{\partial W},$$

where $\omega$ determines the rate of convergence. Overall, training HyperBF networks can be computationally challenging. Moreover, the many degrees of freedom of HyperBF networks can lead to overfitting and poor generalization. However, HyperBF networks have the important advantage that a small number of neurons is enough to learn complex functions.
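
A minimal pure-Python sketch of a HyperBF network follows. It computes the output $\phi(x) = \sum_j a_j \exp(-(x-\mu_j)^T R_j (x-\mu_j))$ and takes one explicit Euler step of the gradient-descent dynamics $\dot a_j = -\omega\,\partial H/\partial a_j$ for the quadratic loss $H$. For brevity, only the weights $a_j$ are trained while centers and shapes stay fixed. The function names, the discrete step and the example data are assumptions for illustration; this is not Poggio and Girosi's implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def quad_form(v: Sequence[float], R: Matrix) -> float:
    """Return v^T R v for a square matrix R stored as a list of rows."""
    n = len(v)
    return sum(v[i] * R[i][k] * v[k] for i in range(n) for k in range(n))


def isotropic_shape(n: int, sigma: float) -> Matrix:
    """R = I / (2 sigma^2): the case that reduces to an ordinary Gaussian RBF."""
    c = 1.0 / (2.0 * sigma ** 2)
    return [[c if i == k else 0.0 for k in range(n)] for i in range(n)]


def activations(x, centers, shapes) -> List[float]:
    """rho_j(x) = exp(-(x - mu_j)^T R_j (x - mu_j)) for every hidden neuron j."""
    out = []
    for mu, R in zip(centers, shapes):
        diff = [xi - mi for xi, mi in zip(x, mu)]
        out.append(math.exp(-quad_form(diff, R)))
    return out


def hyperbf_output(x, centers, weights, shapes) -> float:
    """phi(x) = sum_j a_j * rho_j(x): a linear output layer over the hidden units."""
    return sum(a * r for a, r in zip(weights, activations(x, centers, shapes)))


def quadratic_loss(xs, ys, centers, weights, shapes) -> float:
    """H = sum_i (y_i - phi(x_i))^2 over the training pairs."""
    return sum((y - hyperbf_output(x, centers, weights, shapes)) ** 2
               for x, y in zip(xs, ys))


def weight_step(xs, ys, centers, weights, shapes, omega=0.1) -> List[float]:
    """One Euler step of da_j/dt = -omega * dH/da_j, centers and shapes frozen.

    dH/da_j = -2 * sum_i (y_i - phi(x_i)) * rho_j(x_i)
    """
    grads = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        rho = activations(x, centers, shapes)
        residual = y - sum(a * r for a, r in zip(weights, rho))
        for j, r in enumerate(rho):
            grads[j] -= 2.0 * residual * r
    return [a - omega * g for a, g in zip(weights, grads)]


if __name__ == "__main__":
    # Hypothetical 1-D data and hand-placed centers; only the weights are trained.
    xs = [[0.0], [0.5], [1.0], [1.5], [2.0]]
    ys = [0.0, 0.8, 1.0, 0.8, 0.0]
    centers = [[0.0], [1.0], [2.0]]
    shapes = [isotropic_shape(1, 0.5) for _ in centers]
    weights = [0.0] * len(centers)
    for _ in range(200):
        weights = weight_step(xs, ys, centers, weights, shapes, omega=0.05)
    print(weights, quadratic_loss(xs, ys, centers, weights, shapes))
```

Passing a non-diagonal positive definite matrix in `shapes` gives the general HyperBF case. Extending `weight_step` to the centers and to $W$ (with $R_j = W^T W$) follows the same pattern but needs the corresponding partial derivatives.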

  • Anomaly detection

    In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior. Such examples may arouse suspicions of being generated by a different mechanism, or appear inconsistent with the remainder of that set of data. Anomaly detection finds application in many domains, including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud, to name only a few. Anomalies were initially sought out so that they could be rejected or omitted from the data to aid statistical analysis, for example when computing the mean or standard deviation. They were also removed to improve predictions from models such as linear regression, and more recently their removal has been used to improve the performance of machine learning algorithms. However, in many applications the anomalies themselves are of interest and are the most desirable observations in the entire data set; they need to be identified and separated from noise or irrelevant outliers. Three broad categories of anomaly detection techniques exist. Supervised anomaly detection techniques require a data set that has been labeled as "normal" and "abnormal" and involve training a classifier. However, this approach is rarely used in anomaly detection because labeled data are generally unavailable and the classes are inherently unbalanced. Semi-supervised anomaly detection techniques assume that some portion of the data is labeled. This may be any combination of normal or anomalous data, but more often than not, these techniques construct a model representing normal behavior from a given normal training data set and then test the likelihood that a test instance was generated by the model.
Unsupervised anomaly detection techniques assume the data is unlabelled and are by far the most commonly used, because of their wider applicability. == Definition == Many attempts have been made in the statistical and computer science communities to define an anomaly. The most prevalent ones, listed below, can be categorised into three groups: those that are ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically, and those that are formally defined: === Ill defined === An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data. An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data. An anomaly is a point or collection of points that is relatively distant from other points in a multi-dimensional feature space. Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour. === Specific === Let T be observations from a univariate Gaussian distribution and O a point from T. Then the absolute z-score of O is greater than a pre-selected threshold if and only if O is an outlier. == History == === Intrusion detection === The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process where system administrators would monitor for unusual activities, such as a vacationing user's account being accessed or unexpected printer activity. This approach was not scalable and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior.
By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to investigate incidents, as the volume of data made it impractical for real-time monitoring. The affordability of digital storage eventually led to audit logs being analyzed online, with specialized programs being developed to sift through the data. These programs, however, were typically run during off-peak hours due to their computational intensity. The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing for immediate detection of and response to attacks. This marked a significant shift towards proactive intrusion detection. As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently implemented across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructures. == Applications == Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning. As such it has applications in cyber-security, intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement. === Intrusion detection === Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning. Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of anomaly detection in intrusion detection is misuse detection.
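
The z-score definition and the threshold-and-statistics approach to intrusion detection can both be illustrated with Python's standard `statistics` module. This is a minimal sketch under stated assumptions: the data are taken to be roughly univariate Gaussian, as in the definition, and the example profile (daily login counts for one user) and the threshold values are hypothetical.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return the indices of points whose |z-score| exceeds `threshold`.

    The mean and population standard deviation are estimated from `values`
    itself, so a large anomaly also inflates the spread it is measured against.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:  # all points identical: nothing stands out
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


if __name__ == "__main__":
    # Hypothetical daily login counts for one user profile.
    logins = [10, 12, 11, 9, 10, 11, 12, 10, 95]
    print(zscore_outliers(logins, threshold=2.5))
    print(zscore_outliers(logins, threshold=3.0))
```

With these numbers, the spike at index 8 is flagged at a threshold of 2.5 but not at 3.0. The spike inflates the standard deviation it is judged against (a masking effect), which is one reason such thresholds are usually chosen empirically.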
=== Fintech fraud detection === Anomaly detection is vital in fintech for fraud prevention. === Preprocessing === Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy. === Video surveillance === Anomaly detection has become increasingly vital in video surveillance to enhance security and safety. With the advent of deep learning technologies, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data. These models can process and analyze extensive video feeds in real time, recognizing patterns that deviate from the norm, which may indicate potential security threats or safety violations. An important aspect of video surveillance is the development of scalable real-time frameworks. Such pipelines are required for processing multiple video streams with low computational resources. === IT infrastructure === In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services. These are complex systems with many interacting elements and large quantities of data, requiring methods to process and reduce this data into a format that humans and machines can interpret. Practices such as the IT Infrastructure Library (ITIL), together with monitoring frameworks, are employed to track and manage system performance and user experience. Detected anomalies can help identify and pre-empt potential performance degradations or system failures, thus maintaining productivity and business process effectiveness.
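
The preprocessing use case (removing anomalies so that the mean and standard deviation become more accurate) can be sketched as follows. As an assumption, the sketch uses a median/MAD rule (the "modified z-score") in place of a plain z-score, because the median and the median absolute deviation are barely moved by the outliers being removed. The cutoff of 3.5 and the sample readings are illustrative.

```python
import statistics
from typing import List, Sequence


def remove_outliers_mad(values: Sequence[float], cutoff: float = 3.5) -> List[float]:
    """Keep only points whose modified z-score is at most `cutoff`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation; the 0.6745 factor makes MAD comparable to the standard
    deviation for Gaussian data. Median and MAD are robust, so the anomalies
    being removed barely influence the rule that removes them.
    """
    data = list(values)
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    if mad == 0:  # most points equal the median; leave the data unchanged
        return data
    return [v for v in data if 0.6745 * abs(v - med) / mad <= cutoff]


if __name__ == "__main__":
    # Hypothetical sensor readings with one spike.
    readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 55.0]
    cleaned = remove_outliers_mad(readings)
    print(statistics.fmean(readings), statistics.pstdev(readings))
    print(statistics.fmean(cleaned), statistics.pstdev(cleaned))
```

Here the single spike pulls the raw mean above 16 and inflates the standard deviation. After removal, both statistics describe the remaining readings, which cluster around 10.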
=== IoT systems === Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems. It helps in identifying system failures and security breaches in complex networks of IoT devices. The methods must manage real-time data, diverse device types, and scale effectively. Garg et al. have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems. === Petroleum industry === Anomaly detection is crucial in the petroleum industry for monitoring critical machinery. A 2015 paper proposed a novel segmentation algorithm using support vector machines to analyze sensor data for real-time anomaly detection. === Oil and gas pipeline monitoring === In the oil and gas sector, anomaly detection is not just crucial for maintenance and safety, but also for environmental protection. Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, a task traditional methods may miss.
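
The semi-supervised setting described at the start of this article (build a model of normal behavior from normal training data, then test how likely a new instance is under that model) can be sketched with the standard library. The diagonal Gaussian model, the rule that sets the threshold at the least likely training instance, and the example feature values are all assumptions for illustration, not a method taken from the text.

```python
import math
import statistics
from typing import Sequence


class DiagonalGaussianModel:
    """Model of normal behaviour: an independent Gaussian for each feature.

    Feature independence and the threshold rule below are simplifying
    assumptions of this sketch.
    """

    def __init__(self, normal_data: Sequence[Sequence[float]]) -> None:
        columns = list(zip(*normal_data))
        self.means = [statistics.fmean(c) for c in columns]
        # Floor the spread so a constant feature cannot give zero variance.
        self.stdevs = [max(statistics.pstdev(c), 1e-9) for c in columns]
        # Hypothetical rule: anything less likely than every training
        # instance is treated as anomalous.
        self.threshold = min(self.log_likelihood(x) for x in normal_data)

    def log_likelihood(self, x: Sequence[float]) -> float:
        """Sum of per-feature Gaussian log-densities of instance `x`."""
        total = 0.0
        for v, mu, sd in zip(x, self.means, self.stdevs):
            total += -0.5 * math.log(2 * math.pi * sd * sd) - (v - mu) ** 2 / (2 * sd * sd)
        return total

    def is_anomaly(self, x: Sequence[float]) -> bool:
        return self.log_likelihood(x) < self.threshold


if __name__ == "__main__":
    # Hypothetical (CPU %, requests per second) samples from normal operation.
    normal = [[20, 100], [22, 110], [19, 95], [21, 105], [23, 108], [20, 102]]
    model = DiagonalGaussianModel(normal)
    print(model.is_anomaly([21, 104]), model.is_anomaly([40, 300]))
```

Choosing the minimum training log-likelihood as the threshold means no training instance is ever flagged. In practice the threshold would more likely be tuned on held-out data, and correlated features would call for a full covariance model.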

  • Danilo McGarry

    Danilo McGarry (born 1985) is a British tech executive, writer, and speaker who has led AI initiatives in finance and healthcare. == Early life and education == McGarry was born in 1985. He received a Bachelor of Science (BSc) with honors in Business Management from the University of Bath. == Career == McGarry began his career in technology and financial services, with positions at companies including Motorola, JPMorgan Chase, and BNP Paribas. He later joined the Royal Bank of Canada (RBC) as an analyst and rose to director, leading transformation initiatives involving robotic process automation (RPA) in the bank's capital markets operations. McGarry subsequently moved into leadership roles focused on AI. At Citigroup, he served as Head of Artificial Intelligence and Machine Learning and launched an AI-driven robotics and automation initiative. At UnitedHealth Group (UHG), he held a senior role in the company's automation program, which used a large fleet of software robots in its healthcare operations. In December 2019, McGarry was appointed Global Head of AI & Automation at Alter Domus, a multinational financial services firm, where he established a new AI and automation department. He left the firm in late 2023 to establish his own businesses. In 2025, the Chartered Institute of Personnel and Development (CIPD) appointed him as its strategic adviser on artificial intelligence.

  • Versata

    Versata is a privately held software company, one of several business units under the ESW Capital umbrella. Versata acquires underperforming or financially struggling enterprise software companies, integrates them into its portfolio, and makes operational changes to improve their viability and performance. == History == === Early years (1991–2000) === The company was founded in 1991 as Image Innovations; Naren Bakshi was co-founder and president, and Kevin Fletcher Tweedy was vice president of technology. It sold a development tool set named Image Application WorkBench that worked with Plexus Software's imaging platform. In 1997, the company changed its name to Vision Software and sold a small suite of software: Vision Builder for accelerated coding, and Vision StoryBoard Pro for creating software documentation. In 1998, its flagship product was a Java development tool named Vision JADE. In January 2000, the company changed its name again, this time to Versata, and its e-business automation system, Versata Logic Suite, had three components: Versata Logic Server to host business rules written in Java, Versata Studio for developing the business rules, and Versata Connectors for connecting the logic server to IBM database servers. === Public company (2000–2006) === The company went public in March 2000 during the dot-com bubble, raising about $94 million and reaching a market capitalization of over $2.5 billion despite reporting just $13 million in revenue and a $21 million loss in the prior year. In November 2000, Versata expanded into business workflow with the acquisition of Verve, Inc. and its workflow management system of the same name. From early 2001 through mid-2003, Versata's revenues declined quarter over quarter until Alan Baratz took over as CEO. Five consecutive quarters of growth followed until early 2005, when revenues once again took a downward plunge.
In mid-2005, NASDAQ notified the company that it no longer met NASDAQ's requirements for continued listing, which relate to maintaining a minimum level of shareholders' equity, market value, or net income. In July 2005, Versata was delisted from NASDAQ, and its shares were then traded over the counter on the Pink Sheets. == Versata, a business unit of ESW Capital == In January 2006, Austin-based Trilogy, Inc. acquired the company and took it private. Trilogy then merged portions of its own business, specifically Trilogy Technology Group, into Versata and began acquiring further companies, reorganizing dramatically and offshoring most technical positions to its office in Bangalore, India. From 2006 to 2008, Versata continued to make acquisitions, mostly in the US. Most of the employees of the acquired companies were laid off, with the majority of the work offshored to its office in Bangalore. In early 2009, Versata overhauled its business model again when it asked all of its employees in India to work as contractors through oDesk for gDev, an entity incorporated by Trilogy to manage its outsourcing activities. The only employees left at Versata were those in the US. == Acquisitions == (a) Corizon was acquired by Metatomix while Metatomix was part of Versata. (b) Infopia was acquired by Everest Software while Everest Software was part of Versata. (c) Symphony Commerce was acquired by Quantum Retail while Quantum Retail was part of Versata. == Legal disputes == === Patent infringement and "poison pill" lawsuits with Selectica === The legal disputes with Selectica began in 2004 (before Trilogy acquired Versata in January 2006) and lasted until 2010.
While there were many suits and counter-suits, they largely centered on three issues: patent infringement in configure, price, and quote (CPQ) software (2004–2006); patent infringement in contract lifecycle management (CLM) software (2005–2007); and the "poison pill" lawsuit (2008–2010). In 2004, Selectica and Trilogy had competing CPQ software: Selectica sold Solutions Advisor and Deal Optimization, while Trilogy sold Selling Chain. In April of that year, Trilogy Software sued Selectica for patent infringement. In 2005, before the court ruling, Trilogy made several offers to buy Selectica, but the board rejected them. In January 2006, the court ordered Selectica to pay Trilogy $7.5 million in damages. Four days after that judgment, Trilogy announced its acquisition of Versata for an undisclosed amount. In 2005, Selectica had acquired the Determine CLM software platform, which included features that overlapped with some offered by Versata. In October 2006, Versata filed a second patent infringement lawsuit. The case was settled in 2007, with Selectica agreeing to pay Trilogy and Versata $10 million, plus up to $7.5 million in additional contingent payments. In 2008, Versata began acquiring Selectica stock. By December, Selectica's board had amended its shareholder rights plan to adopt a "poison pill" with an unusually low trigger threshold: if any shareholder acquired more than 4.99% of the company's stock, that shareholder's ownership would be diluted. The board explained that the move was meant to protect Selectica's net operating losses (NOLs), which could be deducted for tax purposes if the company returned to profitability. Under Section 382 of the Internal Revenue Code, a significant change in stock ownership could severely limit the use of those NOLs.
Versata intentionally triggered the poison pill and also offered to sell the shares back at a profit (a form of greenmail), which prompted a legal dispute over whether Selectica's board had the authority to set such a low threshold and whether protecting NOLs justified diluting a shareholder. The case ultimately reached the Delaware Supreme Court, which upheld the poison pill in October 2010, ruling in favor of Selectica. === Intellectual property lawsuit over joint development with Sun Microsystems === In 1998, Sun Microsystems hired Trilogy to help Sun's developers in California create a software configurator (later named the WC5 Configurator) that Sun's customers could use to customize the products they wanted to buy with the features they needed. Trilogy worked on the WC5 Configurator for several years, after which Sun transferred the work to Oracle to finish. Trilogy believed that it owned the copyright to the work it had done for Sun, and in 2006, after the merger with Versata, it sued Sun for more than $100 million in damages. In April 2009, a jury ruled in favor of Sun and rejected Versata's claims. === Patent lawsuit and ruling on patents of abstract ideas with SAP === SAP developed Pricing Engine, a component of its enterprise resource planning (ERP) system. It competed with an older Trilogy product called Pricer, which was part of Trilogy's Selling Chain platform in the mid-1990s, before Trilogy merged with Versata. In April 2007, the year after Trilogy acquired Versata, Versata filed a lawsuit against SAP for patent infringement. In August 2009, the jury agreed with Versata and awarded it $139 million. The court granted a new trial on damages, and in September 2011, in the retrial, the jury awarded Versata $345 million. The case then went to the US Court of Appeals, which in May 2013 affirmed the $345 million damages award, plus accumulated interest.
In October 2014, Versata and SAP settled their litigation for an undisclosed amount. Separately, in June 2013, the Patent Trial and Appeal Board (PTAB) reviewed the validity of the patent itself and issued a decision in a Covered Business Method (CBM) review, stating that the disputed claims were abstract ideas and thus not patentable under US patent law. In July 2015, the Federal Circuit agreed with the PTAB's decision that the challenged claims were not patentable. === Trade secrets and damages dispute with Internet Brands === Internet Brands was formerly known as CarsDirect and AutoData Solutions. Like Trilogy, it made software for automakers that helped customers compare vehicles online. In the late 1990s, Trilogy and Internet Brands tried to combine their products but failed, and after a December 1999 lawsuit they reached a settlement agreement in May 2001. In 2008, Versata sued Internet Brands, claiming that it had violated the settlement agreement by stating in presentations to potential clients that it had a license from Versata to use and sell Versata technical solutions, and that doing so had cost Versata business with Chrysler. Internet Brands' countersuit argued that Versata had misappropriated trade secrets and asked the jury to use Versata's business relationship with Toyota, including revenue from Toyota contracts, as a benchmark to calculate damages. The jury agreed and used that data to determine a $2 million damages award in favor of Internet Brands’ subsidiary, AutoData Solutions. Versata appealed the decision, and in January 2014 the court upheld the $2 million award to Internet Brands. === Patent challenges a

  • Lisp machine

    Lisp machines are general-purpose computers designed to run Lisp efficiently as their main software and programming language, usually via hardware support. They are an example of a high-level language computer architecture. In a sense, they were the first commercial single-user workstations. Despite being modest in number (perhaps 7,000 units in total as of 1988), Lisp machines commercially pioneered some now-commonplace technologies, including networking innovations such as Chaosnet, and effective garbage collection. Several firms built and sold Lisp machines in the 1980s: Symbolics (3600, 3640, XL1200, MacIvory, and other models), Lisp Machines Incorporated (LMI Lambda), Texas Instruments (Explorer, MicroExplorer), and Xerox (Interlisp-D workstations). The operating systems were written in Lisp Machine Lisp, Interlisp (Xerox), and later partly in Common Lisp. == History == === Historical context === Artificial intelligence (AI) computer programs of the 1960s and 1970s intrinsically required what was then considered a huge amount of computer power, as measured in processor time and memory space. The power requirements of AI research were exacerbated by the Lisp symbolic programming language, because commercial hardware was designed and optimized for assembly- and Fortran-like programming languages. At first, the cost of such computer hardware meant that it had to be shared among many users. As integrated circuit technology shrank the size and cost of computers in the 1960s and early 1970s, and the memory needs of AI programs began to exceed the address space of the most common research computer, the Digital Equipment Corporation (DEC) PDP-10, researchers considered a new approach: a computer designed specifically to develop and run large artificial intelligence programs, and tailored to the semantics of the Lisp language.
To provide consistent performance for interactive programs, these machines would often not be shared but dedicated to a single user at a time. === Initial development === In 1973, Richard Greenblatt and Thomas Knight, programmers at the Massachusetts Institute of Technology (MIT) Artificial Intelligence Laboratory (AI Lab), began what would become the MIT Lisp Machine Project when they started building a computer hardwired to run certain basic Lisp operations, rather than run them in software, in a 24-bit tagged architecture. The machine also did incremental (or Arena) garbage collection. More specifically, since Lisp variables are typed at runtime rather than at compile time, a simple addition of two variables could take five times as long on conventional hardware, due to test and branch instructions. Lisp machines ran the tests in parallel with the more conventional single-instruction additions. If the simultaneous tests failed, the result was discarded and recomputed; in many cases this meant a speedup of several times. This simultaneous checking approach was also used to test array bounds when arrays were referenced, and for other memory management necessities (not merely garbage collection or arrays). Type checking was further improved and automated when the conventional 32-bit word was lengthened to 36 bits for Symbolics 3600-model Lisp machines, and eventually to 40 bits or more (usually, any extra bits not used for the purposes below were used for error-correcting codes). The first group of extra bits was used to hold type data, making the machine a tagged architecture, and the remaining bits were used to implement CDR coding (wherein the usual linked list elements are compressed to occupy roughly half the space), reportedly aiding garbage collection by an order of magnitude.
A further improvement was two microcode instructions that specifically supported Lisp functions, reducing the cost of calling a function to as little as 20 clock cycles in some Symbolics implementations. The first machine was called the CONS machine (named after the list construction operator cons in Lisp). It was often affectionately called the Knight machine, perhaps because Knight wrote his master's thesis on the subject; it was extremely well received. It was subsequently improved into a version called CADR (a pun; in Lisp, the cadr function, which returns the second item of a list, is pronounced /ˈkeɪ.dəɹ/ or /ˈkɑ.dəɹ/, as some pronounce the word "cadre"), which was based on essentially the same architecture. About 25 of what were essentially prototype CADRs were sold inside and outside MIT for ~$50,000. The CADR quickly became the favorite machine for hacking, and many of the most favored software tools were quickly ported to it (e.g. Emacs was ported from ITS in 1975). It was so well received at an AI conference held at MIT in 1978 that the Defense Advanced Research Projects Agency (DARPA) began funding its development. === Commercializing MIT Lisp machine technology === In 1979, Russell Noftsker, convinced that Lisp machines had a bright commercial future due to the strength of the Lisp language and the enabling factor of hardware acceleration, proposed to Greenblatt that they commercialize the technology. In a counter-intuitive move for an AI Lab hacker, Greenblatt acquiesced, hoping perhaps that he could recreate the informal and productive atmosphere of the Lab in a real business. These ideas and goals were considerably different from those of Noftsker. The two negotiated at length, but neither would compromise. As the proposed firm could succeed only with the full and undivided assistance of the AI Lab hackers as a group, Noftsker and Greenblatt decided that the fate of the enterprise rested with the hackers, and so the choice should be left to them. 
The ensuing discussions of the choice divided the lab into two factions. In February 1979, matters came to a head. The hackers sided with Noftsker, believing that a commercial venture-fund-backed firm had a better chance of surviving and commercializing Lisp machines than Greenblatt's proposed self-sustaining start-up. Greenblatt lost the battle. It was at this juncture that Symbolics, Noftsker's enterprise, slowly came together. Although Noftsker was paying his staff a salary, he had no building or equipment for the hackers to work on. He bargained with Patrick Winston that, in exchange for allowing Symbolics' staff to keep working out of MIT, Symbolics would let MIT use internally and freely all the software Symbolics developed. About eight months after the disastrous conference with Noftsker, a consultant from CDC came to Greenblatt seeking a Lisp machine for his group to work with. The consultant was trying to put together a natural language computer application with a group of West Coast programmers. Greenblatt had decided to start his own rival Lisp machine firm, but he had done nothing. The consultant, Alexander Jacobson, decided that the only way Greenblatt was going to start the firm and build the Lisp machines that Jacobson desperately needed was if Jacobson pushed and otherwise helped Greenblatt launch the firm. Jacobson pulled together business plans, a board, and a partner for Greenblatt (one F. Stephen Wyle). The new firm was named LISP Machine, Inc. (LMI) and was funded by CDC orders, via Jacobson. Around this time Symbolics (Noftsker's firm) began operating. It had been hindered by Noftsker's promise to give Greenblatt a year's head start, and by severe delays in procuring venture capital. Symbolics still had a major advantage: while 3 or 4 of the AI Lab hackers had gone to work for Greenblatt, 14 other hackers had signed on with Symbolics. Two AI Lab people were not hired by either: Richard Stallman and Marvin Minsky. 
Stallman, however, blamed Symbolics for the decline of the hacker community that had centered around the AI Lab. For two years, from 1982 to the end of 1983, Stallman worked by himself to clone the output of the Symbolics programmers, with the aim of preventing them from gaining a monopoly on the lab's computers. Regardless, after a series of internal battles, Symbolics did get off the ground in 1980/1981, selling the CADR as the LM-2, while Lisp Machines, Inc. sold it as the LMI-CADR. Symbolics did not intend to produce many LM-2s, since the 3600 family of Lisp machines was supposed to ship quickly. However, the 3600s were repeatedly delayed, and Symbolics ended up producing ~100 LM-2s, each of which sold for $70,000. Both firms developed second-generation products based on the CADR: the Symbolics 3600 and the LMI-LAMBDA (of which LMI managed to sell ~200). The 3600 shipped a year late. It expanded on the CADR by widening the machine word to 36 bits, expanding the address space to 28 bits, and adding hardware to accelerate certain common functions that were implemented in microcode on the CADR. The LMI-LAMBDA, which came out a year after the 3600, in 1983, was compatible with the CADR (it could run CADR microcode), but hardware differences existed. Texas Instruments (TI) joined the fray whe

  • Augmented Analytics

    Augmented Analytics

    Augmented Analytics is an approach to data analytics that uses machine learning and natural language processing to automate analysis processes normally done by a specialist or data scientist. The term was introduced in 2017 by Rita Sallam, Cindi Howson, and Carlie Idoine in a Gartner research paper. Augmented analytics is based on business intelligence and analytics. In the graph extraction step, data from different sources are investigated. == Defining Augmented Analytics == Machine Learning – a systematic computing method that uses algorithms to sift through data to identify relationships, trends, and patterns. It is a process that allows algorithms to learn dynamically from data instead of following a fixed set of programmed rules. Natural language generation (NLG) – a software capability that takes data and translates it into readable, plain-English text. Automating Insights – using machine learning algorithms to automate data analysis processes. Natural Language Query – enabling users to query data using business terms that are either typed into a search box or spoken. == Data Democratization == Data democratization means opening up data access in order to relieve data congestion and remove data "gatekeepers". It must be implemented alongside a method for users to make sense of the data. It is intended to speed up company decision-making and uncover opportunities hidden in data. Data democratization has three aspects. The first is data parameterization and characterization. The second is data decentralization, which uses an OS of blockchain and distributed ledger technologies (DLT) together with an independently governed, secure data exchange to enable trust. The third is consent and market-driven data monetization. For connecting assets, two features will accelerate the adoption and usage of data democratization: decentralized identity management and business data object monetization of data ownership. 
Decentralized identity management enables multiple individuals and organizations to identify, authenticate, and authorize participants and organizations. This gives them access to services, data, or systems across multiple networks, organizations, environments, and use cases. It empowers users and enables a personalized, self-service digital onboarding system, so that users can self-authenticate without relying on a central administration function to process their information. At the same time, decentralized identity management ensures that the user is authorized to perform actions subject to the system’s policies, based on their attributes (role, department, organization, etc.) and/or physical location. == Use cases == Agriculture – Farmers collect data on water use, soil temperature, moisture content, and crop growth; augmented analytics can be used to make sense of this data and possibly identify insights that the user can then use to make business decisions. Smart Cities – Many cities across the United States, known as smart cities, collect large amounts of data on a daily basis. Augmented analytics can be used to simplify this data in order to increase effectiveness in city management (transportation, natural disasters, etc.). Analytic Dashboards – Augmented analytics can take large data sets and create highly interactive and informative analytical dashboards that assist in many organizational decisions. Augmented Data Discovery – An augmented analytics process can help organizations automatically find, visualize, and narrate potentially important data correlations and trends. Data Preparation – Augmented analytics platforms can take large amounts of data and organize and "clean" it so that it is usable for future analyses. Business – Businesses collect large amounts of data daily. Examples of data collected in business operations include sales data, consumer behavior data, and distribution data. 
An augmented analytics platform provides access to analysis of this data, which could be used in making business decisions.
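A deliberately tiny pipeline can illustrate the "automating insights" and natural language generation ideas described above. This Python sketch rests on an assumption and does not reflect how any particular augmented analytics product works. It fits an ordinary least-squares trend, which stands in for the machine learning step, to a series such as the soil moisture readings in the agriculture use case. It then uses a fixed sentence template, a minimal form of NLG, to narrate the result. The functions `trend_slope` and `narrate` are hypothetical names, and the readings in the usage example are made up.

```python
from statistics import mean


def trend_slope(values: list[float]) -> float:
    """Ordinary least-squares slope of the values against their positions 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to estimate a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def narrate(metric: str, unit: str, values: list[float], flat_below: float = 0.0) -> str:
    """Template-based NLG: turn the fitted slope into one plain-English sentence."""
    slope = trend_slope(values)
    if abs(slope) <= flat_below:
        return f"{metric} was roughly flat over the last {len(values)} readings."
    direction = "rose" if slope > 0 else "fell"
    return (f"{metric} {direction} by about {abs(slope):.2f} {unit} per reading "
            f"over the last {len(values)} readings.")


if __name__ == "__main__":
    # Made-up soil moisture readings, for illustration only.
    readings = [30.0, 28.0, 26.0, 24.0]
    print(narrate("Soil moisture", "percentage points", readings))
    # prints: Soil moisture fell by about 2.00 percentage points per reading over the last 4 readings.
```

A production system would replace the straight-line fit with richer models and the single template with a more flexible generator. The division of labor between detecting a pattern and phrasing it would stay the same.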

  • Production Rule Representation

    Production Rule Representation

    The Production Rule Representation (PRR) is a proposed standard of the Object Management Group (OMG) that aims to define a vendor-neutral model for representing production rules within the Unified Modeling Language (UML), specifically for use in forward-chaining rule engines. == History == The OMG set up a Business Rules Working Group in 2002 as the first standards body to recognize the importance of the "Business Rules Approach". It issued two main RFPs in 2003: a standard for modeling production rules (PRR), and a standard for modeling business rules as business documentation (BSBR, now SBVR). PRR was mostly defined by and for vendors of Business Rule Engines (BREs, sometimes termed Business Rules Engines). Contributors have included all the major BRE vendors, members of RuleML, and leading UML vendors. == Evolution == The PRR RFP originally suggested that PRR use a combination of UML OCL and Action Semantics for rule conditions and actions. However, expecting modellers to learn two relatively obscure UML languages in order to define a production rule proved unpalatable. Therefore, PRR OCL was defined, including OCL extensions for simple rule actions (as well as external functions). PRR OCL is currently considered "non-normative", i.e. it is not part of the PRR standard per se. PRR beta applies only to a PRR Core that excludes an explicit expression language. The PRR RFP envisaged covering both forward and backward chaining rule engines. However, the lack of vendor support for, and interest in, backward chaining caused this to be revised to forward chaining and "sequential" semantics. The latter is simply the scripting mode provided by many BPM tools, where rules are listed and executed sequentially as if programmed. This gives PRR better compatibility with typical BPM scripting engines. It also acknowledges that most BREs today support a "sequential" mode of operation, which improves performance in some circumstances. 
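PRR standardizes how production rules are modeled, not how an engine executes them. Even so, a short example shows how the two execution semantics it retains differ. The Python below is a hypothetical sketch and is neither PRR syntax nor any vendor's API. `Rule`, `run_sequential`, and `run_forward_chaining` are illustrative names. The forward-chaining loop is a naive recognize-act cycle that uses listed order for conflict resolution, not an optimized matcher such as Rete.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Facts = Dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """Hypothetical production rule: IF condition(facts) THEN assert updates(facts)."""
    name: str
    condition: Callable[[Facts], bool]
    updates: Callable[[Facts], Facts]


def _fire(rule: Rule, facts: Facts) -> bool:
    """Apply the rule's updates if its condition holds; report whether memory changed."""
    if not rule.condition(facts):
        return False
    changes = {k: v for k, v in rule.updates(facts).items() if facts.get(k) != v}
    facts.update(changes)
    return bool(changes)


def run_sequential(rules: List[Rule], facts: Facts) -> Facts:
    """'Sequential' mode: test each rule once, in listed order, like a script."""
    memory = dict(facts)
    for rule in rules:
        _fire(rule, memory)
    return memory


def run_forward_chaining(rules: List[Rule], facts: Facts, max_cycles: int = 100) -> Facts:
    """Naive forward chaining: each cycle fires the first rule (in listed order)
    that changes working memory, then re-matches; stops when no rule changes anything."""
    memory = dict(facts)
    for _ in range(max_cycles):
        # any() short-circuits, so at most one rule fires per cycle.
        if not any(_fire(rule, memory) for rule in rules):
            return memory
    raise RuntimeError("no quiescence after max_cycles; the rules may loop")


if __name__ == "__main__":
    rules = [
        Rule("gold-discount", lambda f: f.get("status") == "gold", lambda f: {"discount": 10}),
        Rule("promote-to-gold", lambda f: f.get("total", 0) >= 1000, lambda f: {"status": "gold"}),
    ]
    order = {"total": 1500, "status": "silver"}
    print(run_sequential(rules, order))        # no discount: promotion came too late
    print(run_forward_chaining(rules, order))  # discount granted after promotion
```

The promotion rule is listed after the discount rule, so sequential mode never grants the discount. Forward chaining re-evaluates the rules after each change to working memory and does grant it. That extra re-matching is also why sequential mode can be faster when rule order already reflects the dependencies.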
== Status == PRR is currently at version 1.0.

  • Diffbot

    Diffbot

    Diffbot is a developer of machine learning and computer vision algorithms and public APIs for extracting data from web pages (web scraping) to create a knowledge base. == Overview == The company has attracted interest for its application of computer vision technology to web pages, wherein it visually parses a web page for important elements and returns them in a structured format. In 2015 Diffbot announced it was working on its own automated "knowledge graph", crawling the web and using its automatic web page extraction to build a large database of structured web data. In 2019 Diffbot released its Knowledge Graph, which has since grown to include over two billion entities (corporations, people, articles, products, discussions, and more) and ten trillion "facts." == Features == The company's products allow software developers to analyze web home pages and article pages and extract the "important information" while ignoring elements deemed not core to the primary content. In August 2012 the company released its Page Classifier API, which automatically categorizes web pages into specific "page types". As part of this, Diffbot analyzed 750,000 web pages shared on the social media service Twitter and found that photos, followed by articles and videos, are the predominant web media shared on the social network. In September 2020 the company released a Natural Language Processing API for automatically building Knowledge Graphs from text. The company raised $2 million in funding in May 2012 from investors including Andy Bechtolsheim and Sky Dayton. Diffbot's customers include Adobe, AOL, Cisco, DuckDuckGo, eBay, Instapaper, Microsoft, Onswipe, and Springpad.
