AI Chatbot Options

AI Chatbot Options — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • National Cyber Security Policy 2013

    National Cyber Security Policy 2013

    National Cyber Security Policy is a policy framework by Department of Electronics and Information Technology (DeitY) It aims at protecting the public and private infrastructure from cyber attacks. The policy also intends to safeguard "information, such as personal information (of web users), financial and banking information and sovereign data". This was particularly relevant in the wake of US National Security Agency (NSA) leaks that suggested the US government agencies are spying on Indian users, who have no legal or technical safeguards against it. Ministry of Communications and Information Technology (India) defines Cyberspace as a complex environment consisting of interactions between people, software services supported by worldwide distribution of information and communication technology. == Reason for Cyber Security policies == India had no Cyber security policy before 2013. In 2013, The Hindu newspaper, citing documents leaked by NSA whistle-blower Edward Snowden, has alleged that much of the NSA surveillance was focused on India's domestic politics and its strategic and commercial interests. This sparked a furore among people. Under pressure, the government unveiled a National Cyber Security Policy 2013 on 2 July 2013. == Vision == To build a secure and resilient cyberspace for citizens, business, and government and also to protect anyone from intervening in user's privacy.It mentioned a five year target of training five lakh cyber security personnel by 2018. == Mission == To protect information and information infrastructure in cyberspace, build capabilities to prevent and respond to cyber threat, reduce vulnerabilities and minimize damage from cyber incidents through a combination of institutional structures, people, processes, technology, and cooperation. == Objective == Ministry of Communications and Information Technology (India) define objectives as follows: To create a secure cyber ecosystem in the country, generate adequate trust and confidence in IT system and transactions in cyberspace and thereby enhance adoption of IT in all sectors of the economy. To create an assurance framework for the design of security policies and promotion and enabling actions for compliance to global security standards and best practices by way of conformity assessment (Product, process, technology & people). To strengthen the Regulatory Framework for ensuring a SECURE CYBERSPACE ECOSYSTEM. To enhance and create National and Sectoral level 24x7 mechanism for obtaining strategic information regarding threats to ICT infrastructure, creating scenarios for response, resolution and crisis management through effective predictive, preventive, protective response and recovery actions. -To improve visibility of integrity of ICT products and services by establishing infrastructure for testing & validation of security of such product. To create workforce for 500,000 professionals skilled in next 5 years through capacity building skill development and training. To provide fiscal benefit to businesses for adoption of standard security practices and processes. To enable Protection of information while in process, handling, storage & transit so as to safeguard privacy of citizen's data and reducing economic losses due to cyber crime or data theft. To enable effective prevention, investigation and prosecution of cybercrime and enhancement of law enforcement capabilities through appropriate legislative intervention. == Strategies == Creating a secured Ecosystem. Creating an assurance framework. Encouraging Open Standards. Strengthening The regulatory Framework. Creating a mechanism for Security Threats Early Warning, Vulnerability management, and response to security threats. Securing E-Governance services. Protection and resilience of Critical Information Infrastructure. Promotion of Research and Development in cyber security. Reducing supply chain risks Human Resource Development (fostering education and training programs both in formal and informal sectors to Support the Nation's cyber security needs and build capacity. Creating cyber security awareness. Developing effective Public-Private partnerships. To develop bilateral and multilateral relationships in the area of cyber security with another country. (Information sharing and cooperation) a Prioritized approach for implementation.

    Read more →
  • Ontology for Biomedical Investigations

    Ontology for Biomedical Investigations

    The Ontology for Biomedical Investigations (OBI) is an open-access, integrated ontology for the description of biological and clinical investigations. OBI provides a model for the design of an investigation, the protocols and instrumentation used, the materials used, the data generated and the type of analysis performed on it. The project is being developed as part of the OBO Foundry and as such adheres to all the principles therein such as orthogonal coverage (i.e. clear delineation from other foundry member ontologies) and the use of a common formal language. In OBI the common formal language used is the Web Ontology Language (OWL). As of March 2008, a pre-release version of the ontology was made available at the project's SVN repository. == Scope == The Ontology for Biomedical Investigations (OBI) addresses the need for controlled vocabularies to support integration and joint ("cross-omics") analysis of experimental data, a need originally identified in the transcriptomics domain by the FGED Society, which developed the MGED Ontology as an annotation resource for microarray data.Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, et al. (November 2007). "The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration". Nature Biotechnology. 25 (11): 1251–5. doi:10.1038/nbt1346. PMC 2814061. PMID 17989687. OBI uses the basic formal ontology upper-level ontology as a means of describing general entities that do not belong to a specific problem domain. As such, all OBI classes are a subclass of some BFO class. The ontology has the scope of modeling all biomedical investigations and as such contains ontology terms for aspects such as: biological material – for example blood plasma instrument (and parts of an instrument therein) – for example DNA microarray, centrifuge information content – such as an image or a digital information entity such as an electronic medical record design and execution of an investigation (and individual experiments therein) – for example study design, electrophoresis material separation data transformation (incorporating aspects such as data normalization and data analysis) – for example principal components analysis dimensionality reduction, mean calculation Less 'concrete' aspects such as the role a given entity may play in a particular scenario (for example the role of a chemical compound in an experiment) and the function of an entity (for example the digestive function of the stomach to nutriate the body) are also covered in the ontology. == OBI consortium == The MGED Ontology was originally identified in the transcriptomics domain by the FGED Society and was developed to address the needs of data integration. Following a mutual decision to collaborate, this effort later became a wider collaboration between groups such as FGED, PSI and MSI in response to the needs of areas such as transcriptomics, proteomics and metabolomics and the FuGO (Functional Genomics Investigation Ontology) was created. This later became the OBI covering the wider scope of all biomedical investigations. As an international, cross-domain initiative, the OBI consortium draws upon a pool of experts from a variety of fields, not limited to biology. The current list of OBI consortium members is available at the OBI consortium website. The consortium is made up of a coordinating committee which is a combination of two subgroups, the Community Representative (those representing a particular biomedical community) and the Core Developers (ontology developers who may or may not be members of any single community). Separate to the coordinating committee is the Developers Working Group which consists of developers within the communities collaborating in the development of OBI at the discretion of current OBI Consortium members. == Papers on OBI ==

    Read more →
  • External memory algorithm

    External memory algorithm

    In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that are too large to fit into a computer's main memory at once. Such algorithms must be optimized to efficiently fetch and access data stored in slow bulk memory (auxiliary memory) such as hard drives or tape drives, or when memory is on a computer network. External memory algorithms are analyzed in the external memory model. == Model == External memory algorithms are analyzed in an idealized model of computation called the external memory model (or I/O model, or disk access model). The external memory model is an abstract machine similar to the RAM machine model, but with a cache in addition to main memory. The model captures the fact that read and write operations are much faster in a cache than in main memory, and that reading long contiguous blocks is faster than reading randomly using a disk read-and-write head. The running time of an algorithm in the external memory model is defined by the number of reads and writes to memory required. The model was introduced by Alok Aggarwal and Jeffrey Vitter in 1988. The external memory model is related to the cache-oblivious model, but algorithms in the external memory model may know both the block size and the cache size. For this reason, the model is sometimes referred to as the cache-aware model. The model consists of a processor with an internal memory or cache of size M, connected to an unbounded external memory. Both the internal and external memory are divided into blocks of size B. One input/output or memory transfer operation consists of moving a block of B contiguous elements from external to internal memory, and the running time of an algorithm is determined by the number of these input/output operations. == Algorithms == Algorithms in the external memory model take advantage of the fact that retrieving one object from external memory retrieves an entire block of size B. This property is sometimes referred to as locality. Searching for an element among N objects is possible in the external memory model using a B-tree with branching factor B. Using a B-tree, searching, insertion, and deletion can be achieved in O ( log B ⁡ N ) {\displaystyle O(\log _{B}N)} time (in Big O notation). Information theoretically, this is the minimum running time possible for these operations, so using a B-tree is asymptotically optimal. External sorting is sorting in an external memory setting. External sorting can be done via distribution sort, which is similar to quicksort, or via a M B {\displaystyle {\tfrac {M}{B}}} -way merge sort. Both variants achieve the asymptotically optimal runtime of O ( N B log M B ⁡ N B ) {\displaystyle O\left({\frac {N}{B}}\log _{\frac {M}{B}}{\frac {N}{B}}\right)} to sort N objects. This bound also applies to the fast Fourier transform in the external memory model. The permutation problem is to rearrange N elements into a specific permutation. This can either be done either by sorting, which requires the above sorting runtime, or inserting each element in order and ignoring the benefit of locality. Thus, permutation can be done in O ( min ( N , N B log M B ⁡ N B ) ) {\displaystyle O\left(\min \left(N,{\frac {N}{B}}\log _{\frac {M}{B}}{\frac {N}{B}}\right)\right)} time. == Applications == The external memory model captures the memory hierarchy, which is not modeled in other common models used in analyzing data structures, such as the random-access machine, and is useful for proving lower bounds for data structures. The model is also useful for analyzing algorithms that work on datasets too big to fit in internal memory. A typical example is geographic information systems, especially digital elevation models, where the full data set easily exceeds several gigabytes or even terabytes of data. This methodology extends beyond general purpose CPUs and also includes GPU computing as well as classical digital signal processing. In general-purpose computing on graphics processing units (GPGPU), powerful graphics cards (GPUs) with little memory (compared with the more familiar system memory, which is most often referred to simply as RAM) are utilized with relatively slow CPU-to-GPU memory transfer (when compared with computation bandwidth). == History == An early use of the term "out-of-core" as an adjective is in 1962 in reference to devices that are other than the core memory of an IBM 360. An early use of the term "out-of-core" with respect to algorithms appears in 1971.

    Read more →
  • Ubiquitous computing

    Ubiquitous computing

    Ubiquitous computing (or "ubicomp") is a concept in software engineering, hardware engineering and computer science where computing is made to appear seamlessly anytime and everywhere. In contrast to desktop computing, ubiquitous computing implies use on any device, in any location, and in any format. A user interacts with the computer, which can exist in many different forms, including laptop computers, tablets, smart phones and terminals in everyday objects such as a refrigerator or a pair of glasses. The underlying technologies to support ubiquitous computing include the Internet, advanced middleware, kernels, operating systems, mobile codes, sensors, microprocessors, new I/Os and user interfaces, computer networks, mobile protocols, global navigational systems, and new materials. This paradigm is also described as pervasive computing, ambient intelligence, or "everyware". Each term emphasizes slightly different aspects. When primarily concerning the objects involved, it is also known as physical computing, the Internet of Things, haptic computing, and "things that think". Rather than propose a single definition for ubiquitous computing and for these related terms, a taxonomy of properties for ubiquitous computing has been proposed, from which different kinds or flavors of ubiquitous systems and applications can be described. Ubiquitous computing themes include: distributed computing, mobile computing, location computing, mobile networking, sensor networks, human–computer interaction, context-aware smart home technologies, and artificial intelligence. == Core concepts == Ubiquitous computing is the concept of using small internet connected and inexpensive computers to help with everyday functions in an automated fashion. Mark Weiser proposed three basic forms for ubiquitous computing devices: Tabs: a wearable device that is approximately a centimeter in size Pads: a hand-held device that is approximately a decimeter in size Boards: an interactive larger display device that is approximately a meter in size Ubiquitous computing devices proposed by Mark Weiser are all based around flat devices of different sizes with a visual display. These conceptual device categories were later implemented at Xerox PARC in experimental systems including the PARCTab, PARCPad, and LiveBoard, which served as early prototypes of handheld, tablet-style, and large interactive display computing environments. Expanding beyond those concepts there is a large array of other ubiquitous computing devices that could exist. == History == Mark Weiser coined the phrase "ubiquitous computing" around 1988, during his tenure as Chief Technologist of the Xerox Palo Alto Research Center (PARC). Both alone and with PARC Director and Chief Scientist John Seely Brown, Weiser wrote some of the earliest papers on the subject, largely defining it and sketching out its major concerns. == Recognizing the effects of extending processing power == Recognizing that the extension of processing power into everyday scenarios would necessitate understandings of social, cultural and psychological phenomena beyond its proper ambit, Weiser was influenced by many fields outside computer science, including "philosophy, phenomenology, anthropology, psychology, post-Modernism, sociology of science and feminist criticism". He was explicit about "the humanistic origins of the 'invisible ideal in post-modernist thought'", referencing as well the ironically dystopian Philip K. Dick novel Ubik. Andy Hopper from Cambridge University UK proposed and demonstrated the concept of "Teleporting" – where applications follow the user wherever he/she moves. Roy Want (now at Google), while at Olivetti Research Ltd, designed the first "Active Badge System", which is an advanced location computing system where personal mobility is merged with computing. Later at Xerox PARC, he designed and built the "PARCTab" or simply "Tab", widely recognized as the world's first Context-Aware computer, which has great similarity to the modern smartphone. Bill Schilit (now at Google) also did some earlier work in this topic, and participated in the early Mobile Computing workshop held in Santa Cruz in 1996. Ken Sakamura of the University of Tokyo, Japan leads the Ubiquitous Networking Laboratory (UNL), Tokyo as well as the T-Engine Forum. The joint goal of Sakamura's Ubiquitous Networking specification and the T-Engine forum, is to enable any everyday device to broadcast and receive information. MIT has also contributed significant research in this field, notably Things That Think consortium (directed by Hiroshi Ishii, Joseph A. Paradiso and Rosalind Picard) at the Media Lab and the CSAIL effort known as Project Oxygen. Other major contributors include University of Washington (Shwetak Patel, Anind Dey and James Landay), Dartmouth College's HealthX Lab (directed by Andrew Campbell), Georgia Tech's College of Computing (Gregory Abowd and Thad Starner), Cornell Tech's People Aware Computing Lab (directed by Tanzeem Choudhury), NYU's Interactive Telecommunications Program, UC Irvine's Department of Informatics, Microsoft Research, Intel Research and Equator, Ajou University UCRi & CUS. == Examples == One of the earliest ubiquitous systems was artist Natalie Jeremijenko's "Live Wire", also known as "Dangling String", installed at Xerox PARC during Mark Weiser's time there. This was a piece of string attached to a stepper motor and controlled by a LAN connection; network activity caused the string to twitch, yielding a peripherally noticeable indication of traffic. Weiser called this an example of calm technology. A present manifestation of this trend is the widespread diffusion of mobile phones. Many mobile phones support high speed data transmission, video services, and other services with powerful computational ability. Although these mobile devices are not necessarily manifestations of ubiquitous computing, there are examples, such as Japan's Yaoyorozu ("Eight Million Gods") Project in which mobile devices, coupled with radio frequency identification tags demonstrate that ubiquitous computing is already present in some form. Ambient Devices has produced an "orb", a "dashboard", and a "weather beacon": these decorative devices receive data from a wireless network and report current events, such as stock prices and the weather, like the Nabaztag, which was invented by Rafi Haladjian and Olivier Mével, and manufactured by the company Violet. The Australian futurist Mark Pesce has produced a highly configurable 52-LED LAMP enabled lamp which uses Wi-Fi named MooresCloud after Gordon Moore. The Unified Computer Intelligence Corporation launched a device called Ubi – The Ubiquitous Computer designed to allow voice interaction with the home and provide constant access to information. Ubiquitous computing research has focused on building an environment in which computers allow humans to focus attention on select aspects of the environment and operate in supervisory and policy-making roles. Ubiquitous computing emphasizes the creation of a human computer interface that can interpret and support a user's intentions. For example, MIT's Project Oxygen seeks to create a system in which computation is as pervasive as air: In the future, computation will be human centered. It will be freely available everywhere, like batteries and power sockets, or oxygen in the air we breathe...We will not need to carry our own devices around with us. Instead, configurable generic devices, either handheld or embedded in the environment, will bring computation to us, whenever we need it and wherever we might be. As we interact with these "anonymous" devices, they will adopt our information personalities. They will respect our desires for privacy and security. We won't have to type, click, or learn new computer jargon. Instead, we'll communicate naturally, using speech and gestures that describe our intent... This is a fundamental transition that does not seek to escape the physical world and "enter some metallic, gigabyte-infested cyberspace" but rather brings computers and communications to us, making them "synonymous with the useful tasks they perform". Network robots link ubiquitous networks with robots, contributing to the creation of new lifestyles and solutions to address a variety of social problems including the aging of population and nursing care. The "Continuity" set of features, introduced by Apple in OS X Yosemite, can be seen as an example of ubiquitous computing. == Issues == Privacy is easily the most often-cited criticism of ubiquitous computing (ubicomp), and may be the greatest barrier to its long-term success. == Research centres == This is a list of notable institutions who claim to have a focus on Ubiquitous computing sorted by country: Canada Topological Media Lab, Concordia University, Canada Finland Community Imaging Group, University of Oulu, Finland Germany Telecooperation Office (TECO), Karlsruhe Institute of Technology, Ger

    Read more →
  • Human–AI interaction

    Human–AI interaction

    Human–AI interaction is a developing field of research and a sub-field of human–computer interaction (HCI). HCI is a field of research that explores the interactions between humans and computer-based technology, focusing on design implementation, user experience, and psychological factors. With the proliferation of artificial intelligence (AI), there has developed a sub-section of HCI research dedicated specifically to artificial intelligence and how people interact with and are impacted by it. This is human–AI interaction, abbreviated either as HAX or HAII. == Introduction == Artificial intelligence (AI), in general, has fluid definitions and varied research applications, but in brief can be applied to mechanizing tasks that would require human intelligence to complete. AI are tools designed to replicate the human abilities of navigating uncertainty, active learning, and processing information in different contexts. Within the context of HCI and HAX research, artificial intelligence can be broken into two sub-fields, natural language processing (NLP) and computer vision (CV). AI technologies notably include machine-learning, deep-learning and neural networks, and large-language models (LLMs). As a new and rapidly developing technology, AI is changing how computers work and therefore changing how humans interact with computers. Unlike the traditional human-computer interaction, where a human directs a machine, human-AI interaction is characterized by a more collaborative relationship between the computer program (the AI) and the human user, as AI is perceived as an active agent rather than a tool. This changing dynamic creates new questions and necessitates new research methods that are not present in traditional HCI research. According to a scoping review on the state of the discipline, the HAX field comprises research on the "design, development, and evaluation of AI systems" and encompasses the themes of human-AI collaboration, human-AI competition, human-AI conflict, and human-AI symbiosis. == Design == Machine learning and artificial intelligence have been used for decades in targeted advertising and to recommend content in social media. Ethical Guidelines (Framework for ethical AI development) == User Experience (UX) == This section should handle research on how users interact with tools. What techniques do they use, do they develop habits, what types of programs and devices are they using to access these tools, what do they use these tools to do exactly. === Cognitive Frameworks in AI Tool Users === AI has been viewed with various expectations, attributions, and often misconceptions. Many people exclusively understand AI as the LLM chatbots they interact with, like ChatGPT or Claude, or other generative AI programs. [Insert section: discuss how people interact with these specific AI tools as a connection to the following paragraphs] Most fundamentally, humans have a mental model of understanding AI's reasoning and motivation for its decision recommendations, and building a holistic and precise mental model of AI helps people create prompts to receive more valuable responses from AI. However, these mental models are not whole because people can only gain more information about AI through their limited interaction with it; more interaction with AI builds a better mental model that a person may build to produce better prompt outcomes. Research on human-AI interaction has emphasized that users develop mental models of AI systems and revise those models through repeated use, feedback, and explanation, while design research has stressed the importance of communicating capabilities and limitations early and supporting trust calibration through explanation and correction. In a 2025 SSRN working paper, John DeVadoss proposed "Hypothetico-Deductive Interaction" (HDI), a framework that describes human-AI interaction as a mutual process of conjecture and refutation in which users test assumptions about an AI system's capabilities while the system infers and updates assumptions about user goals through its responses and clarifying questions. DeVadoss argued that this framing helps explain prompt iteration, weak capability awareness, and trust miscalibration, and suggested design responses such as clearer communication of uncertainty, easier correction, actionable explanations, and safer failure modes. == Research themes == === Human-AI collaboration === Human-AI collaboration occurs when the human and AI supervise the task on the same level and extent to achieve the same goal. Some collaboration occurs in the form of augmenting human capability. AI may help human ability in analysis and decision-making through providing and weighing a volume of information, and learning to defer to the human decision when it recognizes its unreliability. It is especially beneficial when the human can detect a task that AI can be trusted to make few errors so that there is not a lot of excessive checking process required on the human's end. Some findings show signs of human-AI augmentation, or human–AI symbiosis, in which AI enhances human ability in a way that co-working on a task with AI produces better outcomes than a human working alone. For example: the quality and speed of customer service tasks increase when a human agent collaborates with AI, training on specific models allows AI to improve diagnoses in clinical settings, and AI with human-intervention can improve creativity of artwork while fully AI-generated haikus were rated negatively. Human-AI synergy, a concept in which human-AI collaboration would produce more optimal outcomes than either human or AI working alone could explain why AI does not always help with performance. Some AI features and development may accelerate human-AI synergy, while others may stagnate it. For example, when AI updates for better performance, it sometimes worsens the team performance with human and AI by reducing the compatibility with the new model and the mental model a user has developed on the previous version. Research has found that AI often supports human capabilities in the form of human-AI augmentation and not human-AI synergy, potentially because people rely too much on AI and stop thinking on their own. Prompting people to actively engage in analysis and think when to follow AI recommendations reduces their over-reliance, especially for individuals with higher need for cognition. === Human-AI competition === Robots and computers have substituted routine tasks historically completed by humans, but agentic AI has made it possible to also replace cognitive tasks including taking phone calls for appointments and driving a car. At the point of 2016, research has estimated that 45% of paid activities could be replaced by AI by 2030. Perceived autonomy of robots is known to increase people's negative attitude toward them, and worry about the technology taking over leads people to reject it. There has been a consistent tendency of algorithm aversion in which people prefer human advice over AI advice. However, people are not always able to tell apart tasks completed by AI or other humans. See AI takeover for more information. It is also notable that this sentiment is more prominent in the Western cultures as Westerners tend to show less positive views about AI compared to East Asians. == Research on the psychological impacts of AI == === Perception on others who use AI === As much as people perceive and make judgment about AI itself, they also form impressions of themselves and others who use AI. In the workplace, employees who disclose the use of AI in their tasks are more likely to receive feedback that they are not as hardworking as those who are in the same job who receive non-AI help to complete the same tasks. AI use disclosure diminishes the perceived legitimacy in the employee's task and decision making which ultimately leads observers to distrust people who use AI. Although these negative effects of AI use disclosure are weakened by the observers who use AI frequently themselves, the effect is still not attenuated by the observers' positive attitude towards AI. === Bias, AI, and human === Although AI provides a wide range of information and suggestions to its users, AI itself is not free of biases and stereotypes, and it does not always help people reduce their cognitive errors and biases. People are prone to such errors by failing to see other potential ideas and cases that are not listed by AI responses and committing to a decision suggested by AI that directly contradicts the correct information and directions that they are already aware of. Gender bias is also reflected as the female gendering of AI technologies which conceptualizes females as a helpful assistant. == Emotional connection with AI == Human-AI interaction has been theorized in the context of interpersonal relationships mainly in social psychology, communications and media studies, and as a technology interface through the lens of hu

    Read more →
  • Automated journalism

    Automated journalism

    Automated journalism, also known as algorithmic journalism or robot journalism, is a term that attempts to describe modern technological processes that are now in use in the journalistic profession, such as news articles and videos generated by computer programs. There are four main fields of application for automated journalism, namely automated content production, data mining, news dissemination and content optimization. Through generative artificial intelligence, stories are produced automatically by computers rather than human reporters. In the 2020s, generative pre-trained transformers have enabled the generation of articles, simply by providing prompts. Automated journalism is sometimes seen as an opportunity to free journalists from routine reporting, providing them with more time for complex tasks. It also allows efficiency and cost-cutting, alleviating some financial burden that many news organizations face. However, automated journalism is also perceived as a threat to the authorship and quality of news and a threat to the livelihoods of human journalists. == History == Historically, the process involved an algorithm that scanned large amounts of provided data, selected from an assortment of pre-programmed article structures, ordered key points, and inserted details such as names, places, amounts, rankings, statistics, and other figures. These programs interpret, organize, and present data in human-readable ways. The output can also be customized to fit a certain voice, tone, or style. Early implementations were mainly used for stories based on statistics and numerical figures. Common topics include sports recaps, weather, financial reports, real estate analysis, and earnings reviews. Data science and AI companies such as Automated Insights, Narrative Science, United Robots and Monok develop and provide these algorithms to news outlets. In 2016, early adopters included news providers such as the Associated Press, Forbes, ProPublica, and the Los Angeles Times. StatSheet, an online platform covering college basketball, runs entirely on an automated program. In 2006, Thomson Reuters announced their switch to automation to generate financial news stories on its online news platform. Reuters used a tool called Tracer. An algorithm called Quakebot published a story about a 2014 California earthquake on The Los Angeles Times website within three minutes after the shaking had stopped. The Associated Press began using automation to cover 10,000 minor baseball leagues games annually, using a program from Automated Insights and statistics from MLB Advanced Media. Outside of sports, the Associated Press also uses automation to produce stories on corporate earnings. Since 2014, Associated Press has been publishing quarterly financial stories with help from Automated Insights. In May 2020, Microsoft announced that a number of its MSN contract journalists would be replaced by robot journalism. On 8 September 2020, The Guardian published an article entirely written by the neural network GPT-3, although the published fragments were manually picked by a human editor. Agentic Tribune produces all of its news articles automatically using AI. News broadcasters in Kuwait, Greece, South Korea, India, China and Taiwan have presented news with anchors based on generative AI models, prompting concerns about job losses for human anchors and audience trust in news that has historically been influenced by parasocial relationships with broadcasters, content creators or social media influencers. Algorithmically generated anchors have also been used by allies of ISIS for their broadcasts. In 2023, Google reportedly pitched a tool to news outlets that claimed to "produce news stories" based on input data provided, such as "details of current events". Some news company executives who viewed the pitch described it as "[taking] for granted the effort that went into producing accurate and artful news stories." In February 2024, Google launched a program to pay small publishers to write three articles per day using a beta generative AI model. The program does not require the knowledge or consent of the websites that the publishers are using as sources, nor does it require the published articles to be labeled as being created or assisted by these models. Meta AI, a chatbot based on Llama 3 which summarizes news stories, was noted by The Washington Post to copy sentences from those stories without direct attribution and to potentially further decrease the traffic of online news outlets. == Benefits == === Speed === Robot reporters are built to produce large quantities of information at quicker speeds. The Associated Press announced that their use of automation has increased the volume of earnings reports from customers by more than ten times. With software from Automated Insights and data from other companies, they can produce 150 to 300-word articles in the same time it takes journalists to crunch numbers and prepare information. By automating routine stories and tasks, journalists are promised more time for complex jobs such as investigative reporting and in-depth analysis of events. Francesco Marconi of the Associated Press stated that, through automation, the news agency freed up 20 percent of reporters’ time to focus on higher-impact projects. This has also been stated by a spokesperson at Gannett, who stated "By leveraging AI, we are able to expand coverage and enable our journalists to focus on more in-depth reporting." GBH reports that AI tools help increase the reach of news publishers. Mike Carragi, a product manager at Patch, stated that they were able to increase their reach from 1200 communities to 7000 communities in just a few months without the need for new employees solely through the adoption of generative AI. In fact, many communities are served solely by AI generated content, which creates summaries of existing information within the community. === Cost === Automated journalism is cheaper because more content can be produced within less time. It also lowers labour costs for news organizations. Reduced human input means less expenses on wages or salaries, paid leaves, vacations, and employment insurance. Automation serves as a cost-cutting tool for news outlets struggling with tight budgets but still wish to maintain the scope and quality of their coverage. == Concerns == === Authorship === In an automated story, there is often confusion about who should be credited as the author. Several participants of a study on algorithmic authorship attributed the credit to the programmer; others perceived the news organization as the author, emphasizing the collaborative nature of the work. There is also no way for the reader to verify whether an article was written by a robot or human, which raises issues of transparency although such issues also arise with respect to authorship attribution between human authors too. === Credibility and quality === Concerns about the perceived credibility of automated news is similar to concerns about the perceived credibility of news in general. Critics doubt if algorithms are "fair and accurate, free from subjectivity, error, or attempted influence." Again, these issues about fairness, accuracy, subjectivity, error, and attempts at influence or propaganda has also been present in articles written by humans over thousands of years. A common criticism is that machines do not replace human capabilities such as creativity, humour, and critical-thinking. However, as the technology evolves, the aim is to mimic human characteristics. When the UK's Guardian newspaper used an AI to write an entire article in September 2020, commentators pointed out that the AI still relied on human editorial content. Austin Tanney, the head of AI at Kainos said: "The Guardian got three or four different articles and spliced them together. They also gave it the opening paragraph. It doesn’t belittle what it is. It was written by AI, but there was human editorial on that." The largest single study of readers' evaluations of news articles produced with and without the help of automation exposed 3,135 online news consumers to 24 articles. It found articles that had been automated were significantly less comprehensible, in part because they were considered to contain too many numbers. However, the automated articles were evaluated equally on other criteria including tone, narrative flow, and narrative structure. Beyond human evaluation, there are now numerous algorithmic methods to identify machine written articles although some articles may still contain errors that are obvious for a human to identify, they can at times score better with these automatic identifiers than human-written articles. A 2017 Nieman Reports article by Nicola Bruno discusses whether or not machines will replace journalists and addresses concerns around the concept of automated journalism practices. Ultimately, Bruno came to the conclusion that AI would assist journalist

    Read more →
  • Holographic algorithm

    Holographic algorithm

    In computer science, a holographic algorithm is an algorithm that uses a holographic reduction. A holographic reduction is a constant-time reduction that maps solution fragments many-to-many such that the sum of the solution fragments remains unchanged. These concepts were introduced by Leslie Valiant, who called them holographic because "their effect can be viewed as that of producing interference patterns among the solution fragments". The algorithms are unrelated to laser holography, except metaphorically. Their power comes from the mutual cancellation of many contributions to a sum, analogous to the interference patterns in a hologram. Holographic algorithms have been used to find polynomial-time solutions to problems without such previously known solutions for special cases of satisfiability, vertex cover, and other graph problems. They have received notable coverage due to speculation that they are relevant to the P versus NP problem and their impact on computational complexity theory. Although some of the general problems are #P-hard problems, the special cases solved are not themselves #P-hard, and thus do not prove FP = #P. Holographic algorithms have some similarities with quantum computation, but are completely classical. == Holant problems == Holographic algorithms exist in the context of Holant problems, which generalize counting constraint satisfaction problems (#CSP). A #CSP instance is a hypergraph G=(V,E) called the constraint graph. Each hyperedge represents a variable and each vertex v {\displaystyle v} is assigned a constraint f v . {\displaystyle f_{v}.} A vertex is connected to an hyperedge if the constraint on the vertex involves the variable on the hyperedge. The counting problem is to compute ∑ σ : E → { 0 , 1 } ∏ v ∈ V f v ( σ | E ( v ) ) , ( 1 ) {\displaystyle \sum _{\sigma :E\to \{0,1\}}\prod _{v\in V}f_{v}(\sigma |_{E(v)}),~~~~~~~~~~(1)} which is a sum over all variable assignments, the product of every constraint, where the inputs to the constraint f v {\displaystyle f_{v}} are the variables on the incident hyperedges of v {\displaystyle v} . A Holant problem is like a #CSP except the input must be a graph, not a hypergraph. Restricting the class of input graphs in this way is indeed a generalization. Given a #CSP instance, replace each hyperedge e of size s with a vertex v of degree s with edges incident to the vertices contained in e. The constraint on v is the equality function of arity s. This identifies all of the variables on the edges incident to v, which is the same effect as the single variable on the hyperedge e. In the context of Holant problems, the expression in (1) is called the Holant after a related exponential sum introduced by Valiant. == Holographic reduction == A standard technique in complexity theory is a many-one reduction, where an instance of one problem is reduced to an instance of another (hopefully simpler) problem. However, holographic reductions between two computational problems preserve the sum of solutions without necessarily preserving correspondences between solutions. For instance, the total number of solutions in both sets can be preserved, even though individual problems do not have matching solutions. The sum can also be weighted, rather than simply counting the number of solutions, using linear basis vectors. === General example === It is convenient to consider holographic reductions on bipartite graphs. A general graph can always be transformed it into a bipartite graph while preserving the Holant value. This is done by replacing each edge in the graph by a path of length 2, which is also known as the 2-stretch of the graph. To keep the same Holant value, each new vertex is assigned the binary equality constraint. Consider a bipartite graph G=(U,V,E) where the constraint assigned to every vertex u ∈ U {\displaystyle u\in U} is f u {\displaystyle f_{u}} and the constraint assigned to every vertex v ∈ V {\displaystyle v\in V} is f v {\displaystyle f_{v}} . Denote this counting problem by Holant ( G , f u , f v ) . {\displaystyle {\text{Holant}}(G,f_{u},f_{v}).} If the vertices in U are viewed as one large vertex of degree |E|, then the constraint of this vertex is the tensor product of f u {\displaystyle f_{u}} with itself |U| times, which is denoted by f u ⊗ | U | . {\displaystyle f_{u}^{\otimes |U|}.} Likewise, if the vertices in V are viewed as one large vertex of degree |E|, then the constraint of this vertex is f v ⊗ | V | . {\displaystyle f_{v}^{\otimes |V|}.} Let the constraint f u {\displaystyle f_{u}} be represented by its weighted truth table as a row vector and the constraint f v {\displaystyle f_{v}} be represented by its weighted truth table as a column vector. Then the Holant of this constraint graph is simply f u ⊗ | U | f v ⊗ | V | . {\displaystyle f_{u}^{\otimes |U|}f_{v}^{\otimes |V|}.} Now for any complex 2-by-2 invertible matrix T (the columns of which are the linear basis vectors mentioned above), there is a holographic reduction between Holant ( G , f u , f v ) {\displaystyle {\text{Holant}}(G,f_{u},f_{v})} and Holant ( G , f u T ⊗ ( deg ⁡ u ) , ( T − 1 ) ⊗ ( deg ⁡ v ) f v ) . {\displaystyle {\text{Holant}}(G,f_{u}T^{\otimes (\deg u)},(T^{-1})^{\otimes (\deg v)}f_{v}).} To see this, insert the identity matrix T ⊗ | E | ( T − 1 ) ⊗ | E | {\displaystyle T^{\otimes |E|}(T^{-1})^{\otimes |E|}} in between f u ⊗ | U | f v ⊗ | V | {\displaystyle f_{u}^{\otimes |U|}f_{v}^{\otimes |V|}} to get f u ⊗ | U | f v ⊗ | V | {\displaystyle f_{u}^{\otimes |U|}f_{v}^{\otimes |V|}} = f u ⊗ | U | T ⊗ | E | ( T − 1 ) ⊗ | E | f v ⊗ | V | {\displaystyle =f_{u}^{\otimes |U|}T^{\otimes |E|}(T^{-1})^{\otimes |E|}f_{v}^{\otimes |V|}} = ( f u T ⊗ ( deg ⁡ u ) ) ⊗ | U | ( f v ( T − 1 ) ⊗ ( deg ⁡ v ) ) ⊗ | V | . {\displaystyle =\left(f_{u}T^{\otimes (\deg u)}\right)^{\otimes |U|}\left(f_{v}(T^{-1})^{\otimes (\deg v)}\right)^{\otimes |V|}.} Thus, Holant ( G , f u , f v ) {\displaystyle {\text{Holant}}(G,f_{u},f_{v})} and Holant ( G , f u T ⊗ ( deg ⁡ u ) , ( T − 1 ) ⊗ ( deg ⁡ v ) f v ) {\displaystyle {\text{Holant}}(G,f_{u}T^{\otimes (\deg u)},(T^{-1})^{\otimes (\deg v)}f_{v})} have exactly the same Holant value for every constraint graph. They essentially define the same counting problem. === Specific examples === ==== Vertex covers and independent sets ==== Let G be a graph. There is a 1-to-1 correspondence between the vertex covers of G and the independent sets of G. For any set S of vertices of G, S is a vertex cover in G if and only if the complement of S is an independent set in G. Thus, the number of vertex covers in G is exactly the same as the number of independent sets in G. The equivalence of these two counting problems can also be proved using a holographic reduction. For simplicity, let G be a 3-regular graph. The 2-stretch of G gives a bipartite graph H=(U,V,E), where U corresponds to the edges in G and V corresponds to the vertices in G. The Holant problem that naturally corresponds to counting the number of vertex covers in G is Holant ( H , OR 2 , EQUAL 3 ) . {\displaystyle {\text{Holant}}(H,{\text{OR}}_{2},{\text{EQUAL}}_{3}).} The truth table of OR2 as a row vector is (0,1,1,1). The truth table of EQUAL3 as a column vector is ( 1 , 0 , 0 , 0 , 0 , 0 , 0 , 1 ) T = [ 1 0 ] ⊗ 3 + [ 0 1 ] ⊗ 3 {\displaystyle (1,0,0,0,0,0,0,1)^{T}={\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}} . Then under a holographic transformation by [ 0 1 1 0 ] , {\displaystyle {\begin{bmatrix}0&1\\1&0\end{bmatrix}},} OR 2 ⊗ | U | EQUAL 3 ⊗ | V | {\displaystyle {\text{OR}}_{2}^{\otimes |U|}{\text{EQUAL}}_{3}^{\otimes |V|}} = ( 0 , 1 , 1 , 1 ) ⊗ | U | ( [ 1 0 ] ⊗ 3 + [ 0 1 ] ⊗ 3 ) ⊗ | V | {\displaystyle =(0,1,1,1)^{\otimes |U|}\left({\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}\right)^{\otimes |V|}} = ( 0 , 1 , 1 , 1 ) ⊗ | U | [ 0 1 1 0 ] ⊗ | E | [ 0 1 1 0 ] ⊗ | E | ( [ 1 0 ] ⊗ 3 + [ 0 1 ] ⊗ 3 ) ⊗ | V | {\displaystyle =(0,1,1,1)^{\otimes |U|}{\begin{bmatrix}0&1\\1&0\end{bmatrix}}^{\otimes |E|}{\begin{bmatrix}0&1\\1&0\end{bmatrix}}^{\otimes |E|}\left({\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}\right)^{\otimes |V|}} = ( ( 0 , 1 , 1 , 1 ) [ 0 1 1 0 ] ⊗ 2 ) ⊗ | U | ( ( [ 0 1 1 0 ] [ 1 0 ] ) ⊗ 3 + ( [ 0 1 1 0 ] [ 0 1 ] ) ⊗ 3 ) ⊗ | V | {\displaystyle =\left((0,1,1,1){\begin{bmatrix}0&1\\1&0\end{bmatrix}}^{\otimes 2}\right)^{\otimes |U|}\left(\left({\begin{bmatrix}0&1\\1&0\end{bmatrix}}{\begin{bmatrix}1\\0\end{bmatrix}}\right)^{\otimes 3}+\left({\begin{bmatrix}0&1\\1&0\end{bmatrix}}{\begin{bmatrix}0\\1\end{bmatrix}}\right)^{\otimes 3}\right)^{\otimes |V|}} = ( 1 , 1 , 1 , 0 ) ⊗ | U | ( [ 0 1 ] ⊗ 3 + [ 1 0 ] ⊗ 3 ) ⊗ | V | {\displaystyle =(1,1,1,0)^{\otimes |U|}\left({\begin{bmatrix}0\\1\end{bmatrix}}^{\otimes 3}+{\begin{bmatrix}1\\0\end{bmatrix}}^{\otimes 3}\right)^{\otimes |V|}} = NAND 2 ⊗ | U | EQUAL 3 ⊗ | V | , {\displaystyle ={\text{NAND}}_{2}^{\otim

    Read more →
  • Single-source publishing

    Single-source publishing

    Single-source publishing, also known as single-sourcing publishing, is a content management method which allows the same source content to be used across different forms of media and more than one time. The labor-intensive and expensive work of editing need only be carried out once, on only one document; that source document (the single source of truth) can then be stored in one place and reused. This reduces the potential for error, as corrections are only made one time in the source document. The benefits of single-source publishing primarily relate to the editor rather than the user. The user benefits from the consistency that single-sourcing brings to terminology and information. This assumes the content manager has applied an organized conceptualization to the underlying content (A poor conceptualization can make single-source publishing less useful). Single-source publishing is sometimes used synonymously with multi-channel publishing though whether or not the two terms are synonymous is a matter of discussion. == Definition == While there is a general definition of single-source publishing, there is no single official delineation between single-source publishing and multi-channel publishing, nor are there any official governing bodies to provide such a delineation. Single-source publishing is most often understood as the creation of one source document in an authoring tool and converting that document into different file formats or human languages (or both) multiple times with minimal effort. Multi-channel publishing can either be seen as synonymous with single-source publishing, or similar in that there is one source document but the process itself results in more than a mere reproduction of that source. == History == The origins of single-source publishing lie, indirectly, with the release of Windows 3.0 in 1990. With the eclipsing of MS-DOS by graphical user interfaces, help files went from being unreadable text along the bottom of the screen to hypertext systems such as WinHelp. On-screen help interfaces allowed software companies to cease the printing of large, expensive help manuals with their products, reducing costs for both producer and consumer. This system raised opportunities as well, and many developers fundamentally changed the way they thought about publishing. Writers of software documentation did not simply move from being writers of traditional bound books to writers of electronic publishing, but rather they became authors of central documents which could be reused multiple times across multiple formats. The first single-source publishing project was started in 1993 by Cornelia Hofmann at Schneider Electric in Seligenstadt, using software based on Interleaf to automatically create paper documentation in multiple languages based on a single original source file. XML, developed during the mid- to late-1990s, was also significant to the development of single-source publishing as a method. XML, a markup language, allows developers to separate their documentation into two layers: a shell-like layer based on presentation and a core-like layer based on the actual written content. This method allows developers to write the content only one time while switching it in and out of multiple different formats and delivery methods. In the mid-1990s, several firms began creating and using single-source content for technical documentation (Boeing Helicopter, Sikorsky Aviation and Pratt & Whitney Canada) and user manuals (Ford owners manuals) based on tagged SGML and XML content generated using the Arbortext Epic editor with add-on functions developed by a contractor. The concept behind this usage was that complex, hierarchical content that did not lend itself to discrete componentization could be used across a variety of requirements by tagging the differences within a single document using the capabilities built into SGML and XML. Ford, for example, was able to tag its single owner's manual files so that 12 model years could be generated via a resolution script running on the single completed file. Pratt & Whitney, likewise, was able to tag up to 20 subsets of its jet engine manuals in single-source files, calling out the desired version at publication time. World Book Encyclopedia also used the concept to tag its articles for American and British versions of English. Starting from the early 2000s, single-source publishing was used with an increasing frequency in the field of technical translation. It is still regarded as the most efficient method of publishing the same material in different languages. Once a printed manual was translated, for example, the online help for the software program which the manual accompanies could be automatically generated using the method. Metadata could be created for an entire manual and individual pages or files could then be translated from that metadata with only one step, removing the need to recreate information or even database structures. Although single-source publishing is now decades old, its importance has increased urgently as of the 2010s. As consumption of information products rises and the number of target audiences expands, so does the work of developers and content creators. Within the industry of software and its documentation, there is a perception that the choice is to embrace single-source publishing or render one's operations obsolete. == Criticism == Editors using single-source publishing have been criticized for below-standard work quality, leading some critics to describe single-source publishing as the "conveyor belt assembly" of content creation. While heavily used in technical translation, there are risks of error in regard to indexing. While two words might be synonyms in English, they may not be synonyms in another language. In a document produced via single-sourcing, the index will be translated automatically and the two words will be rendered as synonyms. This is because they are synonyms in the source language, while in the target language they are not.

    Read more →
  • Exposure Notification

    Exposure Notification

    The (Google/Apple) Exposure Notification System (GAEN) is a framework and protocol specification developed by Apple Inc. and Google to facilitate digital contact tracing during the COVID-19 pandemic. When used by health authorities, it augments more traditional contact tracing techniques by automatically logging close approaches among notification system users using Android or iOS smartphones. Exposure Notification is a decentralized reporting protocol built on a combination of Bluetooth Low Energy technology and privacy-preserving cryptography. It is an opt-in feature within COVID-19 apps developed and published by authorized health authorities. Unveiled on April 10, 2020, it was made available on iOS on May 20, 2020, as part of the iOS 13.5 update and on December 14, 2020, as part of the iOS 12.5 update for older iPhones. On Android, it was added to devices via a Google Play Services update, supporting all versions since Android Marshmallow. The Apple/Google protocol is similar to the Decentralized Privacy-Preserving Proximity Tracing (DP-3T) protocol created by the European DP-3T consortium and the Temporary Contact Number (TCN) protocol by Covid Watch, but is implemented at the operating system level, which allows for more efficient operation as a background process. Since May 2020, a variant of the DP-3T protocol is supported by the Exposure Notification Interface. Other protocols are constrained in operation because they are not privileged over normal apps. This leads to issues, particularly on iOS devices where digital contact tracing apps running in the background experience significantly degraded performance. The joint approach is also designed to maintain interoperability between Android and iOS devices, which constitute nearly all of the market. The ACLU stated the approach "appears to mitigate the worst privacy and centralization risks, but there is still room for improvement". In late April, Google and Apple shifted the emphasis of the naming of the system, describing it as an "exposure notification service", rather than "contact tracing" system. == Technical specification == Digital contact tracing protocols typically have two major responsibilities: encounter logging and infection reporting. Exposure Notification only involves encounter logging which is a decentralized architecture. The majority of infection reporting is centralized in individual app implementations. To handle encounter logging, the system uses Bluetooth Low Energy to send tracking messages to nearby devices running the protocol to discover encounters with other people. The tracking messages contain unique identifiers that are encrypted with a secret daily key held by the sending device. These identifiers change every 15–20 minutes as well as Bluetooth MAC address in order to prevent tracking of clients by malicious third parties through observing static identifiers over time. The sender's daily encryption keys are generated using a random number generator. Devices record received messages, retaining them locally for 14 days. If a user tests positive for infection, the last 14 days of their daily encryption keys can be uploaded to a central server, where it is then broadcast to all devices on the network. The method through which daily encryption keys are transmitted to the central server and broadcast is defined by individual app developers. The Google-developed reference implementation calls for a health official to request a one-time verification code (VC) from a verification server, which the user enters into the encounter logging app. This causes the app to obtain a cryptographically signed certificate, which is used to authorize the submission of keys to the central reporting server. The received keys are then provided to the protocol, where each client individually searches for matches in their local encounter history. If a match meeting certain risk parameters is found, the app notifies the user of potential exposure to the infection. Google and Apple intend to use the received signal strength (RSSI) of the beacon messages as a source to infer proximity. RSSI and other signal metadata will also be encrypted to resist deanonymization attacks. === Version 1.0 === To generate encounter identifiers, first a persistent 32-byte private Tracing Key ( t k {\displaystyle tk} ) is generated by a client. From this a 16 byte Daily Tracing Key is derived using the algorithm d t k i = H K D F ( t k , N U L L , 'CT-DTK' | | D i , 16 ) {\displaystyle dtk_{i}=HKDF(tk,NULL,{\text{'CT-DTK'}}||D_{i},16)} , where H K D F ( Key, Salt, Data, OutputLength ) {\displaystyle HKDF({\text{Key, Salt, Data, OutputLength}})} is a HKDF function using SHA-256, and D i {\displaystyle D_{i}} is the day number for the 24-hour window the broadcast is in starting from Unix Epoch Time. These generated keys are later sent to the central reporting server should a user become infected. From the daily tracing key a 16-byte temporary Rolling Proximity Identifier is generated every 10 minutes with the algorithm R P I i , j = Truncate ( H M A C ( d t k i , 'CT-RPI' | | T I N j ) , 16 ) {\displaystyle RPI_{i,j}={\text{Truncate}}(HMAC(dtk_{i},{\text{'CT-RPI'}}||TIN_{j}),16)} , where H M A C ( Key, Data ) {\displaystyle HMAC({\text{Key, Data}})} is a HMAC function using SHA-256, and T I N j {\displaystyle TIN_{j}} is the time interval number, representing a unique index for every 10 minute period in a 24-hour day. The Truncate function returns the first 16 bytes of the HMAC value. When two clients come within proximity of each other they exchange and locally store the current R P I i , j {\displaystyle RPI_{i,j}} as the encounter identifier. Once a registered health authority has confirmed the infection of a user, the user's Daily Tracing Key for the past 14 days is uploaded to the central reporting server. Clients then download this report and individually recalculate every Rolling Proximity Identifier used in the report period, matching it against the user's local encounter log. If a matching entry is found, then contact has been established and the app presents a notification to the user warning them of potential infection. === Version 1.1 === Unlike version 1.0 of the protocol, version 1.1 does not use a persistent tracing key, rather every day a new random 16-byte Temporary Exposure Key ( t e k i {\displaystyle tek_{i}} ) is generated. This is analogous to the daily tracing key from version 1.0. Here i {\displaystyle i} denotes the time is discretized in 10 minute intervals starting from Unix Epoch Time. From this two 128-bit keys are calculated, the Rolling Proximity Identifier Key ( R P I K i {\displaystyle RPIK_{i}} ) and the Associated Encrypted Metadata Key ( A E M K i {\displaystyle AEMK_{i}} ). R P I K i {\displaystyle RPIK_{i}} is calculated with the algorithm R P I K i = H K D F ( t e k i , N U L L , 'EN-RPIK' , 16 ) {\displaystyle RPIK_{i}=HKDF(tek_{i},NULL,{\text{'EN-RPIK'}},16)} , and A E M K i {\displaystyle AEMK_{i}} using the algorithm A E M K i = H K D F ( t e k i , N U L L , 'EN-AEMK' , 16 ) {\displaystyle AEMK_{i}=HKDF(tek_{i},NULL,{\text{'EN-AEMK'}},16)} . From these values a temporary Rolling Proximity Identifier ( R P I i , j {\displaystyle RPI_{i,j}} ) is generated every time the BLE MAC address changes, roughly every 15–20 minutes. The following algorithm is used: R P I i , j = A E S 128 ( R P I K i , 'EN-RPI' | | 0 x 000000000000 | | E N I N j ) {\displaystyle RPI_{i,j}=AES128(RPIK_{i},{\text{'EN-RPI'}}||{\mathtt {0x000000000000}}||ENIN_{j})} , where A E S 128 ( Key, Data ) {\displaystyle AES128({\text{Key, Data}})} is an AES cryptography function with a 128-bit key, the data is one 16-byte block, j {\displaystyle j} denotes the Unix Epoch Time at the moment the roll occurs, and E N I N j {\displaystyle ENIN_{j}} is the corresponding 10-minute interval number. Next, additional Associated Encrypted Metadata is encrypted. What the metadata represents is not specified, likely to allow the later expansion of the protocol. The following algorithm is used: Associated Encrypted Metadata i , j = A E S 128 _ C T R ( A E M K i , R P I i , j , Metadata ) {\displaystyle {\text{Associated Encrypted Metadata}}_{i,j}=AES128\_CTR(AEMK_{i},RPI_{i,j},{\text{Metadata}})} , where A E S 128 _ C T R ( Key, IV, Data ) {\displaystyle AES128\_CTR({\text{Key, IV, Data}})} denotes AES encryption with a 128-bit key in CTR mode. The Rolling Proximity Identifier and the Associated Encrypted Metadata are then combined and broadcast using BLE. Clients exchange and log these payloads. Once a registered health authority has confirmed the infection of a user, the user's Temporary Exposure Keys t e k i {\displaystyle tek_{i}} and their respective interval numbers i {\displaystyle i} for the past 14 days are uploaded to the central reporting server. Clients then download this report and individually recalculate every Rolling Proximity Identifier starting from interval number i {\displaystyle i} ,

    Read more →
  • Outline of artificial intelligence

    Outline of artificial intelligence

    The following outline is provided as an overview of and topical guide to artificial intelligence: Artificial intelligence (AI) is intelligence exhibited by machines or software. It is also the name of the scientific field which studies how to create computers and computer software that are capable of intelligent behavior. == AI terminology == Glossary of artificial intelligence == Goals and applications == === General intelligence === Artificial general intelligence AI-complete === Reasoning and problem solving === Automated reasoning Mathematics Automated theorem prover Computer-assisted proof – Computer algebra General Problem Solver Expert system – Decision support system – Clinical decision support system – === Knowledge representation === Knowledge representation Knowledge management Cyc === Planning === Automated planning and scheduling Strategic planning Sussman anomaly – === Learning === Machine learning – Constrained Conditional Models – Deep learning – Neural modeling fields – Supervised learning – Weak supervision (semi-supervised learning) – Unsupervised learning – === Natural language processing === Natural language processing (outline) – Chatterbots – Language identification – Large language model – Retrieval-augmented generation – Natural language user interface – Natural language understanding – Machine translation – Statistical semantics – Question answering – Semantic translation – Concept mining – Data mining – Text mining – Process mining – E-mail spam filtering – Information extraction – Named-entity extraction – Coreference resolution – Named-entity recognition – Relationship extraction – Terminology extraction – === Perception === Machine perception Pattern recognition – Computer Audition – Speech recognition – Speaker recognition – Computer vision (outline) – Image processing Intelligent word recognition – Object recognition – Optical mark recognition – Handwriting recognition – Optical character recognition – Automatic number plate recognition – Information extraction – Image retrieval – Automatic image annotation – Facial recognition systems – Silent speech interface – Activity recognition – Percept (artificial intelligence) === Robotics === Robotics – Behavior-based robotics – Cognitive – Cybernetics – Developmental robotics – Evolutionary robotics – === Control === Intelligent control Self-management (computer science) – Autonomic Computing – Autonomic Networking – === Social intelligence === Affective computing Kismet === Game playing === Game artificial intelligence – Computer game bot – computer replacement for human players. Video game AI – Computer chess – Computer Go – General game playing – General video game playing – === Creativity, art and entertainment === Artificial creativity Artificial life Artificial intelligence art AI anthropomorphism AI agent AI web browser AI boom AI slop Creative computing Generative artificial intelligence Generative pre trained transformer Uncanny valley Music and artificial intelligence Computational humor Chatbot === Integrated AI systems === AIBO – Sony's robot dog. It integrates vision, hearing and motorskills. Asimo (2000 to present) – humanoid robot developed by Honda, capable of walking, running, negotiating through pedestrian traffic, climbing and descending stairs, recognizing speech commands and the faces of specific individuals, among a growing set of capabilities. MIRAGE – A.I. embodied humanoid in an augmented reality environment. Cog – M.I.T. humanoid robot project under the direction of Rodney Brooks. QRIO – Sony's version of a humanoid robot. TOPIO, TOSY's humanoid robot that can play ping-pong with humans. Watson (2011) – computer developed by IBM that played and won the game show Jeopardy! It is now being used to guide nurses in medical procedures. Purpose: Open domain question answering Technologies employed: Natural language processing Information retrieval Knowledge representation Automated reasoning Machine learning Project Debater (2018) – artificially intelligent computer system, designed to make coherent arguments, developed at IBM's lab in Haifa, Israel. === Intelligent personal assistants === Intelligent personal assistant – Amazon Alexa – Assistant – Braina – Cortana – Google Assistant – Google Now – Mycroft – Siri – Viv – === Other applications === Artificial life – simulation of natural life through the means of computers, robotics, or biochemistry. Automatic target recognition – Diagnosis (artificial intelligence) – Speech generating device – Vehicle infrastructure integration – Virtual Intelligence – == History == History of artificial intelligence Progress in artificial intelligence Timeline of artificial intelligence AI effect – as soon as AI successfully solves a problem, the problem is no longer considered by the public to be a part of AI. This phenomenon has occurred in relation to every AI application produced, so far, throughout the history of development of AI. AI winter – a period of disappointment and funding reductions occurring after a wave of high expectations and funding in AI. Such funding cuts occurred in the 1970s, for instance. Moore's law === History by period === 2017 in artificial intelligence 2018 in artificial intelligence 2019 in artificial intelligence 2020 in artificial intelligence 2021 in artificial intelligence 2022 in artificial intelligence 2023 in artificial intelligence 2024 in artificial intelligence 2025 in artificial intelligence 2026 in artificial intelligence 2027 in artificial intelligence 2028 in artificial intelligence 2029 in artificial intelligence === History by subject === History of logic (formal reasoning is an important precursor of AI) History of machine learning (timeline) History of machine translation (timeline) History of natural language processing History of optical character recognition (timeline) == AI algorithms and techniques == === Search === Discrete search algorithms Uninformed search Brute force search – Problem-solving technique and algorithmic paradigmPages displaying short descriptions of redirect targets Search tree – Data structure in tree form sorted for fast lookup Breadth-first search – Algorithm to search the nodes of a graph Depth-first search – Algorithm to search the nodes of a graph State space search – Class of search algorithmsPages displaying short descriptions of redirect targets Informed search Best-first search – Graph exploring search algorithm A search algorithm – Algorithm used for pathfinding and graph traversal Heuristics – Problem-solving methodPages displaying short descriptions of redirect targets Pruning (algorithm) – Data compression techniquePages displaying short descriptions of redirect targets Adversarial search Minmax algorithm – Decision rule used for minimizing the possible loss for a worst-case scenarioPages displaying short descriptions of redirect targets Logic as search Production system (computer science) – Computer program used to provide artificial intelligence Rule based system – Type of computer systemPages displaying short descriptions of redirect targets Production rule – Computer program used to provide artificial intelligence Inference rule – Method of deriving conclusionsPages displaying short descriptions of redirect targets Horn clause – Type of logical formula Forward chaining – Inference engine in an expert system Backward chaining – Method of forming inferences Planning as search State space search – Class of search algorithmsPages displaying short descriptions of redirect targets Means–ends analysis – Problem solving technique === Optimization search === Optimization (mathematics) algorithms Hill climbing – Optimization algorithm Simulated annealing – Probabilistic optimization technique and metaheuristic Beam search – Heuristic search algorithm Random optimization – Optimization technique in mathematics Evolutionary computation Genetic algorithms – Competitive algorithm for searching a problem spacePages displaying short descriptions of redirect targets Gene expression programming – Evolutionary algorithm Genetic programming – Evolving computer programs with techniques analogous to natural genetic processes Differential evolution – Method of mathematical optimization Society based learning algorithms. Swarm intelligence – Collective behavior of decentralized, self-organized systems Particle swarm optimization – Iterative simulation method Ant colony optimization – Optimization algorithmPages displaying short descriptions of redirect targets Metaheuristic – Optimization technique === Logic === Logic and automated reasoning Programming using logic Logic programming – Programming paradigm based on formal logic See "Logic as search" above. Forms of Logic Propositional logic First-order logic First-order logic with equality Constraint satisfaction – Process in artificial intelligence and operations research Fuzzy logic Fuzzy set theory – Sets whose elements have degrees of membershipPages displaying short descriptions

    Read more →
  • Irish logarithm

    Irish logarithm

    The Irish logarithm was a system of number manipulation invented by Percy Ludgate for machine multiplication. The system used a combination of mechanical cams as lookup tables and mechanical addition to sum pseudo-logarithmic indices to produce partial products, which were then added to produce results. The technique is similar to Zech logarithms (also known as Jacobi logarithms), but uses a system of indices original to Ludgate. == Concept == Ludgate's algorithm compresses the multiplication of two single decimal numbers into two table lookups (to convert the digits into indices), the addition of the two indices to create a new index which is input to a second lookup table that generates the output product. Because both lookup tables are one-dimensional, and the addition of linear movements is simple to implement mechanically, this allows a less complex mechanism than would be needed to implement a two-dimensional 10×10 multiplication lookup table. Ludgate stated that he deliberately chose the values in his tables to be as small as he could make them; given this, Ludgate's tables can be simply constructed from first principles, either via pen-and-paper methods, or a systematic search using only a few tens of lines of program code. They do not correspond to either Zech logarithms, Remak indexes or Korn indexes. == Pseudocode == The following is an implementation of Ludgate's Irish logarithm algorithm in the Python programming language: Table 1 is taken from Ludgate's original paper; given the first table, the contents of Table 2 can be trivially derived from Table 1 and the definition of the algorithm. Note since that the last third of the second table is entirely zeros, this could be exploited to further simplify a mechanical implementation of the algorithm.

    Read more →
  • NCSA Brown Dog

    NCSA Brown Dog

    NCSA Brown Dog is a research project to develop a method for easily accessing historic research data stored in order to maintain the long-term viability of large bodies of scientific research. It is supported by the National Center for Supercomputing Applications (NCSA) that is funded by the National Science Foundation (NSF). == History == Brown Dog is part of the DataNet partners program funded by NSF in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science, engineering and education. Brown Dog is part of a follow-on effort called Data Infrastructure Building Blocks (DIBBs), focused on building software to support DataNet. The project was proposed by researchers at NCSA and the University of Illinois Urbana-Champaign as well as researchers from Boston University and the University of North Carolina at Chapel Hill. == Unstructured, uncurated, long tail data == Much scientific data is smaller, unstructured and uncurated and thus not easily shared. Such data is sometimes referred to as "long tail" data. This borrows a term from statistics and refers to the tail of the distribution of project sizes. The majority of smaller projects lack the resources to properly steward the data they produce. This so-called "long tail" data, both past and present, has the potential to inform future research in many study areas. Much of this data has become inaccessible due to obsolete software and file formats. The resulting impossibility of reviewing data from older research disrupts the overall scientific research project. == Approach == Brown Dog describes itself as the "super mutt" of software (thus the name "Brown Dog"), serving as a low-level data infrastructure to interface digital data content across the internet. Its approach is to use every possible source of automated help (i.e., software) in existence in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. The project sees the broader impact of its work in its potential to serve the general public as a sort of "DNS for data", with the goal of making all data and all file formats as accessible as webpages are today. == Technology == Brown Dog seeks to address problems involving the use of uncurated and unstructured data collections through the development of two services: the Data Access Proxy (DAP) to aid in the conversion of file formats and the Data Tilling Services (DTS) for the automatic extraction of metadata from file contents. Once developed, researchers and general public users will be able to download browser plugins and other tools from the Brown Dog tool catalog. === Data Tilling Service === Data Tilling Service (DTS) will allow users to search data collections using an existing file to discover other similar files in a collection. A DTS search field will be appended to configured browsers where example files can be dropped. This tells DTS to search all the files under a given URL for files similar to the dropped file. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If DTS encounters a foreign file format, it will utilize DAP to make the file accessible. DTS also indexes the data and extract and appends metadata to files and collections enabling users to gain some sense of the type of data they are encountering. This service runs on port 9443. === Data Access Proxy === Data Access Proxy (DAP) allows users to access data files that would otherwise be unreadable. Similar to an internet gateway or Domain Name Service, the DAP configuration would be entered into a user's machine and browser settings. Data requests over HTTP would first be examined by DAP to determine if the native file format is readable on the client device. If not, DAP converts the file into the best available format readable by the client machine. Alternatively, the user could specify the desired format themselves. This service runs on port 8184. == Use cases == Brown Dog targets three use cases proposed by groups within the EarthCube research communities. Developers and researchers from these communities will work together on use cases that span geoscience, engineering, biology and social science. === Long tail vegetation data in ecology and global change biology === This use case is led by Michael Dietze, Boston University Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team in cooperation with researches from Dietze's lab will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades and historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets and for model calibration and validation. === Designing green infrastructure considering storm water and human requirements === This use case is led by Barbara Minsker], University of Illinois at Urbana-Champaign]; William Sullivan, University of Illinois at Urbana-Champaign; Arthur Schmidt, University of Illinois at Urbana-Champaign. This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management and ecosystem and human health and well being. To address the scientific and social problems associated with the design of green spaces, data accessibility and availability is a major challenge. This study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would be beneficial to under served neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked together with a terrestrial biosphere model and a storm water model using Brown Dog technology. === Development and application for critical zone studies === This use case is led by Praveen Kumar, University of Illinois at Urbana-Champaign Critical Zone (CZ) is the "skin" of the earth that extends from the treetops to the bedrock that is created by life processes working at scales from microbes to biomes. The Critical Zone supports all terrestrial living systems. Its upper part is the bio-mantle. This is where terrestrial biota live, reproduce, use and expend energy, and where their wastes and remains accumulate and decompose. It encompasses the soil, which acts as a geomembrane through which water and solutes, energy, gases, solids, and organisms interact with the atmosphere, biosphere, hydrosphere, and lithosphere. A variety of drivers affect this bio-dynamic zone, ranging from climate and deforestation to agriculture, grazing and human development. Understanding and predicting these effects is central to managing and sustaining vital ecosystem services such as soil fertility, water purification, and production of food resources, and, at larger scales, global carbon cycling and carbon sequestration. The CZ provides a unifying framework for integrating terrestrial surface and near-surface environments, and reflects an intricate web of biological and chemical processes and human impacts occurring at vastly different temporal and spatial scales. The nature of these data create significant challenges for inter-disciplinary studies of the CZ because integration of the variety and number of data products and models has been a barrier. On the other hand, CZ data provides an excellent opportunity for defining, testing and implementing Brown Dog technologies. In this context "unstructured" data is viewed broadly as consisting of a collection of heterogeneous data with formats that reflect temporal and disciplinary legacies, data from emerging low cost open hardware based sensors and embedded sensor networks that lack well defined metadata and sensor characteristics, as well as data that are available as maps, images and text. == NSF Award == CIF21 DIBBs: Brown Dog was awarded in the winter of 2013 with a start date of October 1, 2013. Estimated expiration date is September 30, 2018. The award amount was $10,519,716.00, the largest DIBB award. The principal investigator is Kenton McHenry of NCSA at the University of Illinois at Urbana-Champaign. Coleaders are Jong Lee NCSA/UIU

    Read more →
  • GNU toolchain

    GNU toolchain

    The GNU toolchain is a broad collection of programming tools produced by the GNU Project. These tools form a toolchain (a suite of tools used in a serial manner) used for developing software applications and operating systems. The GNU toolchain plays a vital role in development of Linux, some BSD systems, and software for embedded systems. Parts of the GNU toolchain are also directly used with or ported to other platforms such as Solaris, macOS, Microsoft Windows (via Cygwin and MinGW/MSYS/WSL2), Sony PlayStation Portable (used by PSP modding scene) and Sony PlayStation 3. == Components == Projects in the GNU toolchain are: GNU Autotools (build system) – Software build toolset from GNU GNU Binutils – GNU software development tools for executable code GNU Bison – Yacc-compatible parser generator program GNU C Library – GNU implementation of the standard C libraryPages displaying short descriptions of redirect targets GNU Compiler Collection – Free and open-source compiler for various programming languages GNU Debugger – Source-level debugger GNU m4 – General-purpose macro processor GNU make – Software build automation tool

    Read more →
  • Collision problem

    Collision problem

    The r-to-1 collision problem is an important theoretical problem in complexity theory, quantum computing, and computational mathematics. The collision problem most often refers to the 2-to-1 version: given n {\displaystyle n} even and a function f : { 1 , … , n } → { 1 , … , n } {\displaystyle f:\,\{1,\ldots ,n\}\rightarrow \{1,\ldots ,n\}} , we are promised that f is either 1-to-1 or 2-to-1. We are only allowed to make queries about the value of f ( i ) {\displaystyle f(i)} for any i ∈ { 1 , … , n } {\displaystyle i\in \{1,\ldots ,n\}} . The problem then asks how many such queries we need to make to determine with certainty whether f is 1-to-1 or 2-to-1. == Classical solutions == === Deterministic === Solving the 2-to-1 version deterministically requires n 2 + 1 {\textstyle {\frac {n}{2}}+1} queries, and in general distinguishing r-to-1 functions from 1-to-1 functions requires n r + 1 {\textstyle {\frac {n}{r}}+1} queries. This is a straightforward application of the pigeonhole principle: if a function is r-to-1, then after n r + 1 {\textstyle {\frac {n}{r}}+1} queries we are guaranteed to have found a collision. If a function is 1-to-1, then no collision exists. Thus, n r + 1 {\textstyle {\frac {n}{r}}+1} queries suffice. If we are unlucky, then the first n / r {\displaystyle n/r} queries could return distinct answers, so n r + 1 {\textstyle {\frac {n}{r}}+1} queries is also necessary. === Randomized === If we allow randomness, the problem is easier. By the birthday paradox, if we choose (distinct) queries at random, then with high probability we find a collision in any fixed 2-to-1 function after Θ ( n ) {\displaystyle \Theta ({\sqrt {n}})} queries. == Quantum solution == The BHT algorithm, which uses Grover's algorithm, solves this problem optimally by only making O ( n 1 / 3 ) {\displaystyle O(n^{1/3})} queries to f. The matching lower bound of Ω ( n 1 / 3 ) {\displaystyle \Omega (n^{1/3})} was proved by Aaronson and Shi using the polynomial method.

    Read more →
  • Dependency network (graphical model)

    Dependency network (graphical model)

    Dependency networks (DNs) are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables. Unlike Bayesian networks, DNs may contain cycles. Each node is associated to a conditional probability table, which determines the realization of the random variable given its parents. == Markov blanket == In a Bayesian network, the Markov blanket of a node is the set of parents and children of that node, together with the children's parents. The values of the parents and children of a node evidently give information about that node. However, its children's parents also have to be included in the Markov blanket, because they can be used to explain away the node in question. In a Markov random field, the Markov blanket for a node is simply its adjacent (or neighboring) nodes. In a dependency network, the Markov blanket for a node is simply the set of its parents. == Dependency network versus Bayesian networks == Dependency networks have advantages and disadvantages with respect to Bayesian networks. In particular, they are easier to parameterize from data, as there are efficient algorithms for learning both the structure and probabilities of a dependency network from data. Such algorithms are not available for Bayesian networks, for which the problem of determining the optimal structure is NP-hard. Nonetheless, a dependency network may be more difficult to construct using a knowledge-based approach driven by expert-knowledge. == Dependency networks versus Markov networks == Consistent dependency networks and Markov networks have the same representational power. Nonetheless, it is possible to construct non-consistent dependency networks, i.e., dependency networks for which there is no compatible valid joint probability distribution. Markov networks, in contrast, are always consistent. == Definition == A consistent dependency network for a set of random variables X = ( X 1 , … , X n ) {\textstyle \mathbf {X} =(X_{1},\ldots ,X_{n})} with joint distribution p ( x ) {\displaystyle p(\mathbf {x} )} is a pair ( G , P ) {\displaystyle (G,P)} where G {\displaystyle G} is a cyclic directed graph, where each of its nodes corresponds to a variable in X {\displaystyle \mathbf {X} } , and P {\displaystyle P} is a set of conditional probability distributions. The parents of node X i {\displaystyle X_{i}} , denoted P a i {\displaystyle \mathbf {Pa_{i}} } , correspond to those variables P a i ⊆ ( X 1 , … , X i − 1 , X i + 1 , … , X n ) {\displaystyle \mathbf {Pa_{i}} \subseteq (X_{1},\ldots ,X_{i-1},X_{i+1},\ldots ,X_{n})} that satisfy the following independence relationships p ( x i ∣ p a i ) = p ( x i ∣ x 1 , … , x i − 1 , x i + 1 , … , x n ) = p ( x i ∣ x − x i ) . {\displaystyle p(x_{i}\mid \mathbf {pa_{i}} )=p(x_{i}\mid x_{1},\ldots ,x_{i-1},x_{i+1},\ldots ,x_{n})=p(x_{i}\mid \mathbf {x} -{x_{i}}).} The dependency network is consistent in the sense that each local distribution can be obtained from the joint distribution p ( x ) {\displaystyle p(\mathbf {x} )} . Dependency networks learned using large data sets with large sample sizes will almost always be consistent. A non-consistent network is a network for which there is no joint probability distribution compatible with the pair ( G , P ) {\displaystyle (G,P)} . In that case, there is no joint probability distribution that satisfies the independence relationships subsumed by that pair. == Structure and parameters learning == Two important tasks in a dependency network are to learn its structure and probabilities from data. Essentially, the learning algorithm consists of independently performing a probabilistic regression or classification for each variable in the domain. It comes from observation that the local distribution for variable X i {\displaystyle X_{i}} in a dependency network is the conditional distribution p ( x i | x − x i ) {\displaystyle p(x_{i}|\mathbf {x} -{x_{i}})} , which can be estimated by any number of classification or regression techniques, such as methods using a probabilistic decision tree, a neural network or a probabilistic support-vector machine. Hence, for each variable X i {\displaystyle X_{i}} in domain X {\displaystyle X} , we independently estimate its local distribution from data using a classification algorithm, even though it is a distinct method for each variable. Here, we will briefly show how probabilistic decision trees are used to estimate the local distributions. For each variable X i {\displaystyle X_{i}} in X {\displaystyle \mathbf {X} } , a probabilistic decision tree is learned where X i {\displaystyle X_{i}} is the target variable and X − X i {\displaystyle \mathbf {X} -X_{i}} are the input variables. To learn a decision tree structure for X i {\displaystyle X_{i}} , the search algorithm begins with a singleton root node without children. Then, each leaf node in the tree is replaced with a binary split on some variable X j {\displaystyle X_{j}} in X − X i {\displaystyle \mathbf {X} -X_{i}} , until no more replacements increase the score of the tree. == Probabilistic Inference == A probabilistic inference is the task in which we wish to answer probabilistic queries of the form p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} , given a graphical model for X {\displaystyle \mathbf {X} } , where Y {\displaystyle \mathbf {Y} } (the 'target' variables) Z {\displaystyle \mathbf {Z} } (the 'input' variables) are disjoint subsets of X {\displaystyle \mathbf {X} } . One of the alternatives for performing probabilistic inference is using Gibbs sampling. A naive approach for this uses an ordered Gibbs sampler, an important difficulty of which is that if either p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} or p ( z ) {\displaystyle p(\mathbf {z} )} is small, then many iterations are required for an accurate probability estimate. Another approach for estimating p ( y ∣ z ) {\displaystyle p(\mathbf {y\mid z} )} when p ( z ) {\displaystyle p(\mathbf {z} )} is small is to use modified ordered Gibbs sampler, where Z = z {\displaystyle \mathbf {Z=z} } is fixed during Gibbs sampling. It may also happen that y {\displaystyle \mathbf {y} } is rare, e.g. when Y {\displaystyle \mathbf {Y} } has many variables. So, the law of total probability along with the independencies encoded in a dependency network can be used to decompose the inference task into a set of inference tasks on single variables. This approach comes with the advantage that some terms may be obtained by direct lookup, thereby avoiding some Gibbs sampling. You can see below an algorithm that can be used for obtain p ( y | z ) {\displaystyle p(\mathbf {y|z} )} for a particular instance of y ∈ Y {\displaystyle \mathbf {y} \in \mathbf {Y} } and z ∈ Z {\displaystyle \mathbf {z} \in \mathbf {Z} } , where Y {\displaystyle \mathbf {Y} } and Z {\displaystyle \mathbf {Z} } are disjoint subsets. Algorithm 1: U := Y {\displaystyle \mathbf {U:=Y} } ( the unprocessed variables ) P := Z {\displaystyle \mathbf {P:=Z} } ( the processed and conditioning variables ) p := z {\displaystyle \mathbf {p:=z} } ( the values for P {\displaystyle \mathbf {P} } ) While U ≠ ∅ {\displaystyle \mathbf {U} \neq \emptyset } : Choose X i ∈ U {\displaystyle X_{i}\in \mathbf {U} } such that X i {\displaystyle X_{i}} has no more parents in U {\displaystyle U} than any variable in U {\displaystyle U} If all the parents of X {\displaystyle X} are in P {\displaystyle \mathbf {P} } p ( x i | p ) := p ( x i | p a i ) {\displaystyle p(x_{i}|\mathbf {p} ):=p(x_{i}|\mathbf {pa_{i}} )} Else Use a modified ordered Gibbs sampler to determine p ( x i | p ) {\displaystyle p(x_{i}|\mathbf {p} )} U := U − X i {\displaystyle \mathbf {U:=U} -X_{i}} P := P + X i {\displaystyle \mathbf {P:=P} +X_{i}} p := p + x i {\displaystyle \mathbf {p:=p} +x_{i}} Returns the product of the conditionals p ( x i | p ) {\displaystyle p(x_{i}|\mathbf {p} )} == Applications == In addition to the applications to probabilistic inference, the following applications are in the category of Collaborative Filtering (CF), which is the task of predicting preferences. Dependency networks are a natural model class on which to base CF predictions, once an algorithm for this task only needs estimation of p ( x i = 1 | x − x i = 0 ) {\displaystyle p(x_{i}=1|\mathbf {x} -{x_{i}}=0)} to produce recommendations. In particular, these estimates may be obtained by a direct lookup in a dependency network. Predicting what movies a person will like based on his or her ratings of movies seen; Predicting what web pages a person will access based on his or her history on the site; Predicting what news stories a person is interested in based on other stories he or she read; Predicting what product a person will buy based on products he or she has already purchased and/or dropped into his or her shopping basket. Another class of useful applications for dependency networks is related to data visualization, that is

    Read more →