AI Generator Sora

AI Generator Sora — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Semantic neural network

    Semantic neural network

    Semantic neural network (SNN) is based on John von Neumann's neural network [von Neumann, 1966] and Nikolai Amosov M-Network. There are limitations to a link topology for the von Neumann’s network but SNN accept a case without these limitations. Only logical values can be processed, but SNN accept that fuzzy values can be processed too. All neurons into the von Neumann network are synchronized by tacts. For further use of self-synchronizing circuit technique SNN accepts neurons can be self-running or synchronized. In contrast to the von Neumann network there are no limitations for topology of neurons for semantic networks. It leads to the impossibility of relative addressing of neurons as it was done by von Neumann. In this case an absolute readdressing should be used. Every neuron should have a unique identifier that would provide a direct access to another neuron. Of course, neurons interacting by axons-dendrites should have each other's identifiers. An absolute readdressing can be modulated by using neuron specificity as it was realized for biological neural networks. There’s no description for self-reflectiveness and self-modification abilities into the initial description of semantic networks [Dudar Z.V., Shuklin D.E., 2000]. But in [Shuklin D.E. 2004] a conclusion had been drawn about the necessity of introspection and self-modification abilities in the system. For maintenance of these abilities a concept of pointer to neuron is provided. Pointers represent virtual connections between neurons. In this model, bodies and signals transferring through the neurons connections represent a physical body, and virtual connections between neurons are representing an astral body. It is proposed to create models of artificial neuron networks on the basis of virtual machine supporting the opportunity for paranormal effects. SNN is generally used for natural language processing. == Related models == Computational creativity Semantic hashing Semantic Pointer Architecture Sparse distributed memory

    Read more →
  • AIXI

    AIXI

    AIXI is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000 and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence. AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs. == Etymology == According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by ξ {\displaystyle \xi } (which is the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations. == Definition == AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment μ {\displaystyle \mu } . The interaction proceeds in time steps, from t = 1 {\displaystyle t=1} to t = m {\displaystyle t=m} , where m ∈ N {\displaystyle m\in \mathbb {N} } is the lifespan of the AIXI agent. At time step t, the agent chooses an action a t ∈ A {\displaystyle a_{t}\in {\mathcal {A}}} (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" e t ∈ E = O × R {\displaystyle e_{t}\in {\mathcal {E}}={\mathcal {O}}\times \mathbb {R} } , which consists of an "observation" o t ∈ O {\displaystyle o_{t}\in {\mathcal {O}}} (e.g., a camera image) and a reward r t ∈ R {\displaystyle r_{t}\in \mathbb {R} } , distributed according to the conditional probability μ ( o t r t | a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t ) {\displaystyle \mu (o_{t}r_{t}|a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t})} , where a 1 o 1 r 1 . . . a t − 1 o t − 1 r t − 1 a t {\displaystyle a_{1}o_{1}r_{1}...a_{t-1}o_{t-1}r_{t-1}a_{t}} is the "history" of actions, observations and rewards. The environment μ {\displaystyle \mu } is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depend on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that μ {\displaystyle \mu } is computable, that is, the observations and rewards received by the agent from the environment μ {\displaystyle \mu } can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent. The only goal of the AIXI agent is to maximize ∑ t = 1 m r t {\displaystyle \sum _{t=1}^{m}r_{t}} , that is, the sum of rewards from time step 1 to m. The AIXI agent is associated with a stochastic policy π : ( A × E ) ∗ → A {\displaystyle \pi :({\mathcal {A}}\times {\mathcal {E}})^{}\rightarrow {\mathcal {A}}} , which is the function it uses to choose actions at every time step, where A {\displaystyle {\mathcal {A}}} is the space of all possible actions that AIXI can take and E {\displaystyle {\mathcal {E}}} is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) μ {\displaystyle \mu } can also be thought of as a stochastic policy (which is a function): μ : ( A × E ) ∗ × A → E {\displaystyle \mu :({\mathcal {A}}\times {\mathcal {E}})^{}\times {\mathcal {A}}\rightarrow {\mathcal {E}}} , where the ∗ {\displaystyle } is the Kleene star operation. In general, at time step t {\displaystyle t} (which ranges from 1 to m), AIXI, having previously executed actions a 1 … a t − 1 {\displaystyle a_{1}\dots a_{t-1}} (which is often abbreviated in the literature as a < t {\displaystyle a_{ Read more →

  • Enterprise cognitive system

    Enterprise cognitive system

    Enterprise cognitive systems (ECS) are part of a broader shift in computing, from a programmatic to a probabilistic approach, called cognitive computing. An Enterprise Cognitive System makes a new class of complex decision support problems computable, where the business context is ambiguous, multi-faceted, and fast-evolving, and what to do in such a situation is usually assessed today by the business user. An ECS is designed to synthesize a business context and link it to the desired outcome. It recommends evidence-based actions to help the end-user achieve the desired outcome. It does so by finding past situations similar to the current situation, and extracting the repeated actions that best influence the desired outcome. While general-purpose cognitive systems can be used for different outputs, prescriptive, suggestive, instructive, or simply entertaining, an enterprise cognitive system is focused on action, not insight, to help in assessing what to do in a complex situation. == Key characteristics == ECS have to be: Adaptive: They must learn as information changes, and as goals and requirements evolve. They must resolve ambiguity and tolerate unpredictability. They must be engineered to feed on dynamic data in real time, or near real time. In the Enterprise, near-real time learning from data requires an agile information federation approach to ingest incremental data updates as they occur, and an unsupervised learning approach to ensure that new best practice is leveraged across the organization in a timely manner. Interactive: They must interact easily with users so that those users can define their needs comfortably. They may also interact with other processors, devices, and Cloud services, as well as with people. In the Enterprise, interactions are controlled via existing workflows and UIs. Therefore, embedding best practices directly into these existing interfaces, in the context of a specific step, is critical to ensure maximum end-user adoption. Iterative and stateful: They must aid in defining a problem by asking questions or finding additional source input if a problem statement is ambiguous or incomplete. They must “remember” previous interactions in a process and return information that is suitable for the specific application at that point in time. In the Enterprise, business context is often structured by a business process, and therefore sufficiently data-rich to make relevant recommendations without significant iterations from the end-user. A stateful memory of overall interactions across communication channels is critical for understanding of context, as a static profile will not capture intent and outcome potential the way behavior does. Contextual: They must understand, identify, and extract contextual elements such as meaning, syntax, time, location, appropriate domain, regulations, user's profile, process, task and goal. They may draw on multiple sources of information, including both structured and unstructured digital information, as well as sensory inputs (visual, gestural, auditory, or sensor-provided). In the Enterprise, Context is fragmented and must be aggregated across data types, sources, and locations. In most business environments, such data is captured in existing enterprise information systems, and the effort is linked to quickly source and unify such information. It is rare to have to directly process sensor, audio or visual data in real-time as direct input into the enterprise cognitive system. Instead, these data types are captured by Enterprise Applications and pre-processed into a binary or text format prior to consumption by the System. == Business applications powered by an ECS == Bottlenose – trends and brands monitoring Cybereason – security threat monitoring Dataminr – social media monitoring

    Read more →
  • Decision tree pruning

    Decision tree pruning

    Pruning is a data compression technique in machine learning and search algorithms that reduces the size of decision trees by removing sections of the tree that are non-critical and redundant to classify instances. Pruning reduces the complexity of the final classifier, and hence improves predictive accuracy by the reduction of overfitting. One of the questions that arises in a decision tree algorithm is the optimal size of the final tree. A tree that is too large risks overfitting the training data and poorly generalizing to new samples. A small tree might not capture important structural information about the sample space. However, it is hard to tell when a tree algorithm should stop because it is impossible to tell if the addition of a single extra node will dramatically decrease error. This problem is known as the horizon effect. A common strategy is to grow the tree until each node contains a small number of instances then use pruning to remove nodes that do not provide additional information. Pruning should reduce the size of a learning tree without reducing predictive accuracy as measured by a cross-validation set. There are many techniques for tree pruning that differ in the measurement that is used to optimize performance. == Techniques == Pruning processes can be divided into two types (pre- and post-pruning). Pre-pruning procedures prevent a complete induction of the training set by replacing a stop () criterion in the induction algorithm (e.g. max. Tree depth or information gain (Attr)> minGain). Pre-pruning methods are considered to be more efficient because they do not induce an entire set, but rather trees remain small from the start. Prepruning methods share a common problem, the horizon effect. This is to be understood as the undesired premature termination of the induction by the stop () criterion. Post-pruning (or just pruning) is the most common way of simplifying trees. Here, nodes and subtrees are replaced with leaves to reduce complexity. Pruning can not only significantly reduce the size but also improve the classification accuracy of unseen objects. It may be the case that the accuracy of the assignment on the train set deteriorates, but the accuracy of the classification properties of the tree increases overall. The procedures are differentiated on the basis of their approach in the tree (top-down or bottom-up). === Bottom-up pruning === These procedures start at the last node in the tree (the lowest point). Following recursively upwards, they determine the relevance of each individual node. If the relevance for the classification is not given, the node is dropped or replaced by a leaf. The advantage is that no relevant sub-trees can be lost with this method. These methods include Reduced Error Pruning (REP), Minimum Cost Complexity Pruning (MCCP), or Minimum Error Pruning (MEP). === Top-down pruning === In contrast to the bottom-up method, this method starts at the root of the tree. Following the structure below, a relevance check is carried out which decides whether a node is relevant for the classification of all n items or not. By pruning the tree at an inner node, it can happen that an entire sub-tree (regardless of its relevance) is dropped. One of these representatives is pessimistic error pruning (PEP), which brings quite good results with unseen items. == Pruning algorithms == === Reduced error pruning === One of the simplest forms of pruning is reduced error pruning. Starting at the leaves, each node is replaced with its most popular class. If the prediction accuracy is not affected then the change is kept. While somewhat naive, reduced error pruning has the advantage of simplicity and speed. === Cost complexity pruning === Cost complexity pruning generates a series of trees ⁠ T 0 … T m {\displaystyle T_{0}\dots T_{m}} ⁠ where ⁠ T 0 {\displaystyle T_{0}} ⁠ is the initial tree and ⁠ T m {\displaystyle T_{m}} ⁠ is the root alone. At step ⁠ i {\displaystyle i} ⁠, the tree is created by removing a subtree from tree ⁠ i − 1 {\displaystyle i-1} ⁠ and replacing it with a leaf node with value chosen as in the tree building algorithm. The subtree that is removed is chosen as follows: Define the error rate of tree ⁠ T {\displaystyle T} ⁠ over data set ⁠ S {\displaystyle S} ⁠ as ⁠ err ⁡ ( T , S ) {\displaystyle \operatorname {err} (T,S)} ⁠. The subtree t {\displaystyle t} that minimizes err ⁡ ( prune ⁡ ( T , t ) , S ) − err ⁡ ( T , S ) | leaves ⁡ ( T ) | − | leaves ⁡ ( prune ⁡ ( T , t ) ) | {\displaystyle {\frac {\operatorname {err} (\operatorname {prune} (T,t),S)-\operatorname {err} (T,S)}{\left\vert \operatorname {leaves} (T)\right\vert -\left\vert \operatorname {leaves} (\operatorname {prune} (T,t))\right\vert }}} is chosen for removal. The function ⁠ prune ⁡ ( T , t ) {\displaystyle \operatorname {prune} (T,t)} ⁠ defines the tree obtained by pruning the subtrees ⁠ t {\displaystyle t} ⁠ from the tree ⁠ T {\displaystyle T} ⁠. Once the series of trees has been created, the best tree is chosen by generalized accuracy as measured by a training set or cross-validation. == Examples == Pruning could be applied in a compression scheme of a learning algorithm to remove the redundant details without compromising the model's performances. In neural networks, pruning removes entire neurons or layers of neurons.

    Read more →
  • Zero-day vulnerability

    Zero-day vulnerability

    A zero-day (also known as a 0-day) is a vulnerability or security hole in a computer system unknown to its developers or anyone capable of mitigating it. Until the vulnerability is remedied, threat actors can exploit it in a zero-day exploit, or zero-day attack. The term "zero-day" originally referred to the number of days since a new piece of software was released to the public, so "zero-day software" was obtained by hacking into a developer's computer before release. Eventually the term was applied to the vulnerabilities that allowed this hacking, and to the number of days that the vendor has had to fix them. Vendors who discover the vulnerability may create patches or advise workarounds to mitigate it, though users need to deploy that mitigation to eliminate the vulnerability in their systems. Zero-day attacks are severe threats. == Definition == Despite developers' goal of delivering a product that works entirely as intended, virtually all products contain software and hardware bugs. If a bug creates a security risk, it is called a vulnerability. Vulnerabilities vary in their ability to be exploited by malicious actors. Some are not usable at all, while others can be used to disrupt the device with a denial of service attack. The most dangerous allow the attacker to inject and run their own code, without the user being aware of it. Although the term "zero-day" initially referred to the time since the vendor had become aware of the vulnerability, zero-day vulnerabilities can also be defined as the subset of vulnerabilities for which no patch or other fix is available. A zero-day exploit is any exploit that takes advantage of such a vulnerability. == Exploits == An exploit is the delivery mechanism that takes advantage of the vulnerability to penetrate the target's systems, for such purposes as disrupting operations, installing malware, or exfiltrating data. Researchers Lillian Ablon and Andy Bogart write that "little is known about the true extent, use, benefit, and harm of zero-day exploits". Exploits based on zero-day vulnerabilities are considered more dangerous than those that take advantage of a known vulnerability. However, it is likely that most cyberattacks use known vulnerabilities, not zero-days. Governments of states are the primary users of zero-day exploits, not only because of the high cost of finding or buying vulnerabilities, but also the significant cost of writing the attack software. Nevertheless, anyone can use a vulnerability, and according to research by the RAND Corporation, "any serious attacker can always get an affordable zero-day for almost any target". Many targeted attacks and most advanced persistent threats rely on zero-day vulnerabilities. In 2017, the average time to develop an exploit from a zero-day vulnerability was estimated at 22 days. The difficulty of developing exploits has been increasing over time due to increased anti-exploitation features in popular software. === Window of vulnerability === Zero-day vulnerabilities are often classified as alive—meaning that there is no public knowledge of the vulnerability—and dead—the vulnerability has been disclosed, but not patched. If the software's maintainers are actively searching for vulnerabilities, it is a living vulnerability; such vulnerabilities in unmaintained software are called immortal. Zombie vulnerabilities can be exploited in older versions of the software but have been patched in newer versions. Even publicly known and zombie vulnerabilities are often exploitable for an extended period. Security patches can take months to develop, or may never be developed. A patch can have negative effects on the functionality of software and users may need to test the patch to confirm functionality and compatibility. Larger organizations may fail to identify and patch all dependencies, while smaller enterprises and personal users may not install patches. Research suggests that risk of cyberattack increases if the vulnerability is made publicly known or a patch is released. Cybercriminals can reverse engineer the patch to find the underlying vulnerability and develop exploits, often faster than users install the patch. According to research by RAND Corporation published in 2017, zero-day exploits remain usable for 6.9 years on average, although those purchased from a third party only remain usable for 1.4 years on average. The researchers were unable to determine if any particular platform or software (such as open-source software) had any relationship to the life expectancy of a zero-day vulnerability. Although the RAND researchers found that 5.7 percent of a stockpile of secret zero-day vulnerabilities will have been discovered by someone else within a year, another study found a higher overlap rate, as high as 10.8 percent to 21.9 percent per year. == Countermeasures == Because, by definition, there is no patch that can block a zero-day exploit, all systems employing the software or hardware with the vulnerability are at risk. This includes secure systems such as banks and governments that have all patches up to date. Security systems are designed around known vulnerabilities, and repeated exploitations of a zero-day exploit could continue undetected for an extended period of time. Although there have been many proposals for a system that is effective at detecting zero-day exploits, this remains an active area of research in 2023. Many organizations have adopted defense-in-depth tactics so that attacks are likely to require breaching multiple levels of security, which makes it more difficult to achieve. Conventional cybersecurity measures such as training and access control — including multi-factor authentication, least-privilege access, and air-gapping makes it harder to compromise systems with a zero-day exploit. Since writing perfectly secure software is impossible, some researchers argue that driving up the cost of exploits is considered a good strategy to reduce the burden of cyberattacks. == Market == Zero-day exploits can fetch millions of dollars. There are three main types of buyers: White: the vendor, or to third parties such as the Zero Day Initiative that disclose to the vendor. Often such disclosure is in exchange for a bug bounty. Not all companies respond positively to disclosures, as they can cause legal liability and operational overhead. It is not uncommon to receive cease-and-desist letters from software vendors after disclosing a vulnerability for free. Gray: the largest and most lucrative. Government or intelligence agencies buy zero-days and may use it in an attack, stockpile the vulnerability, or notify the vendor. The United States federal government is one of the largest buyers. As of 2013, the Five Eyes (United States, United Kingdom, Canada, Australia, and New Zealand) captured the plurality of the market and other significant purchasers included Russia, India, Brazil, Malaysia, Singapore, North Korea, and Iran. Middle Eastern countries were poised to become the biggest spenders. Black: organized crime, which typically prefers exploit software rather than just knowledge of a vulnerability. These users are more likely to employ "half-days" where a patch is already available. In 2015, the markets for government and crime were estimated at least ten times larger than the white market. Sellers are often hacker groups that seek out vulnerabilities in widely used software for financial reward. Some will only sell to certain buyers, while others will sell to anyone. White market sellers are more likely to be motivated by non pecuniary rewards such as recognition and intellectual challenge. Selling zero-day exploits is legal. Despite calls for more regulation, law professor Mailyn Fidler says there is little chance of an international agreement because key players such as Russia and Israel are not interested. The sellers and buyers that trade in zero-days tend to be secretive, relying on non-disclosure agreements and classified information laws to keep the exploits secret. If the vulnerability becomes known, it can be patched and its value consequently crashes. Because the market lacks transparency, it can be hard for parties to find a fair price. Sellers might not be paid if the vulnerability was disclosed before it was verified, or if the buyer declined to purchase it but used it anyway. With the proliferation of middlemen, sellers could never know to what use the exploits could be put. Buyers could not guarantee that the exploit was not sold to another party. Both buyers and sellers advertise on the dark web. Research published in 2022 based on maximum prices paid as quoted by a single exploit broker found a 44 percent annualized inflation rate in exploit pricing. Remote zero-click exploits could fetch the highest price, while those that require local access to the device are much cheaper. Vulnerabilities in widely used software are also more expensive. They estimated that around 400 to 1,500 people sold exploits to th

    Read more →
  • Smart object

    Smart object

    A smart object is an object that enhances the interaction with not only people but also with other smart objects. Also known as smart connected products or smart connected things (SCoT), they are products, assets and other things embedded with processors, sensors, software and connectivity that allow data to be exchanged between the product and its environment, manufacturer, operator/user, and other products and systems. Connectivity also enables some capabilities of the product to exist outside the physical device, in what is known as the product cloud. The data collected from these products can be then analysed to inform decision-making, enable operational efficiencies and continuously improve the performance of the product. It can not only refer to interaction with physical world objects but also to interaction with virtual (computing environment) objects. A smart physical object may be created either as an artifact or manufactured product or by embedding electronic tags such as RFID tags or sensors into non-smart physical objects. Smart virtual objects are created as software objects that are intrinsic when creating and operating a virtual or cyber world simulation or game. The concept of a smart object has several origins and uses, see History. There are also several overlapping terms, see also smart device, tangible object or tangible user interface and Thing as in the Internet of things. == History == In the early 1990s, Mark Weiser, from whom the term ubiquitous computing originated, referred to a vision "When almost every object either contains a computer or can have a tab attached to it, obtaining information will be trivial", Although Weiser did not specifically refer to an object as being smart, his early work did imply that smart physical objects are smart in the sense that they act as digital information sources. Hiroshi Ishii and Brygg Ullmer refer to tangible objects in terms of tangibles bits or tangible user interfaces that enable users to "grasp & manipulate" bits in the center of users' attention by coupling the bits with everyday physical objects and architectural surfaces. The smart object concept was introduced by Marcelo Kallman and Daniel Thalmann as an object that can describe its own possible interactions. The main focus here is to model interactions of smart virtual objects with virtual humans, agents, in virtual worlds. The opposite approach to smart objects is 'plain' objects that do not provide this information. The additional information provided by this concept enables far more general interaction schemes, and can greatly simplify the planner of an artificial intelligence agent. In contrast to smart virtual objects used in virtual worlds, Lev Manovich focuses on physical space filled with electronic and visual information. Here, "smart objects" are described as "objects connected to the Net; objects that can sense their users and display smart behaviour". More recently in the early 2010s, smart objects are being proposed as a key enabler for the vision of the Internet of things. The combination of the Internet and emerging technologies such as near field communications, real-time localization, and embedded sensors enables everyday objects to be transformed into smart objects that can understand and react to their environment. Such objects are building blocks for the Internet of things and enable novel computing applications. In 2018, one of the world's first smart houses was built in Klaukkala, Finland in the form of a five-floor apartment block, using the Kone Residential Flow solution created by KONE, allowing even a smartphone to act as a home key. == Characteristics == Although we can view interaction with physical smart object in the physical world as distinct from interaction with virtual smart objects in a virtual simulated world, these can be related. Poslad considers the progression of: how humans use models of smart objects situated in the physical world to enhance human to physical world interaction; versus how smart physical objects situated in the physical world can model human interaction in order to lessen the need for human to physical world interaction; versus how virtual smart objects by modelling both physical world objects and modelling humans as objects and their subsequent interactions can form a predominantly smart virtual object environment. === Smart physical objects === The concept smart for a smart physical object simply means that it is active, digital, networked, can operate to some extent autonomously, is reconfigurable and has local control of the resources it needs such as energy, data storage, etc. Note, a smart object does not necessarily need to be intelligent as in exhibiting a strong essence of artificial intelligence—although it can be designed to also be intelligent. Physical world smart objects can be described in terms of three properties: Awareness: is a smart object's ability to understand (that is, sense, interpret, and react to) events and human activities occurring in the physical world. Representation: refers to a smart object's application and programming model—in particular, programming abstractions. Interaction: denotes the object's ability to converse with the user in terms of input, output, control, and feedback. Based upon these properties, these have been classified into three types: Activity-Aware Smart Objects: Are objects that can record information about work activities and its own use. Policy-Aware Smart Objects: Are objects that are activity-aware Objects can interpret events and activities with respect to predefined organizational policies. Process-Aware Smart Objects: Processes play a fundamental role in industrial work management and operation. A process is a collection of related activities or tasks that are ordered according to their position in time and space. === Smart virtual objects === For the virtual object in a virtual world case, an object is called smart when it has the ability to describe its possible interactions. This focuses on constructing a virtual world using only virtual objects that contain their own interaction information. There are four basic elements to constructing such a smart virtual object framework. Object properties: physical properties and a text description Interaction information: position of handles, buttons, grips, and the like Object behavior: different behaviors based on state variables Agent behaviors: description of the behavior an agent should follow when using the object Some versions of smart objects also include animation information in the object information, but this is not considered to be an efficient approach, since this can make objects inappropriately oversized. === Categorization === The terms smart, connected product or smart product can be confusing as it is used to cover a broad range of different products, ranging from smart home appliances (e.g., smart bathroom scales or smart light bulbs) to smart cars (e.g., Tesla). While these products share certain similarities, they often differ substantially in their capabilities. Raff et al. developed a conceptual framework that distinguishes different smart products based on their capabilities, which features 4 types of smart product archetypes (in ascending order of "smartness"). Digital Connected Responsive Intelligent == Advantages == Smart, connected products have three primary components: Physical – made up of the product's mechanical and electrical parts. Smart – made up of sensors, microprocessors, data storage, controls, software, and an embedded operating system with enhanced user interface. Connectivity – made up of ports, antennae, and protocols enabling wired/wireless connections that serve two purposes, it allows data to be exchanged with the product and enables some functions of the product to exist outside the physical device. Each component expands the capabilities of one another resulting in "a virtuous cycle of value improvement". First, the smart components of a product amplify the value and capabilities of the physical components. Then, connectivity amplifies the value and capabilities of the smart components. These improvements include: Monitoring of the product's conditions, its external environment, and its operations and usage. Control of various product functions to better respond to changes in its environment, as well as to personalize the user experience. Optimization of the product's overall operations based on actual performance data, and reduction of downtimes through predictive maintenance and remote service. Autonomous product operation, including learning from their environment, adapting to users' preferences and self-diagnosing and service. === The Internet of things (IoT) === The Internet of things is the network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment. The phrase "Internet of things" reflects the gro

    Read more →
  • Personoid

    Personoid

    Personoid is the concept coined by Stanisław Lem, a Polish science-fiction writer, in Non Serviam, from his book A Perfect Vacuum (1971). His personoids are an abstraction of functions of human mind and they live in computers; they do not need any human-like physical body. In cognitive and software modeling, personoid is a research approach to the development of intelligent autonomous agents. In frame of the IPK (Information, Preferences, Knowledge) architecture, it is a framework of abstract intelligent agent with a cognitive and structural intelligence. It can be seen as an essence of high intelligent entities. From the philosophical and systemics perspectives, personoid societies can also be seen as the carriers of a culture. According to N. Gessler, the personoids study can be a base for the research on artificial culture and culture evolution. == Personoids on TV and cinema == Welt am Draht (1973) The Thirteenth Floor (1999)

    Read more →
  • Wadhwani Institute for Artificial Intelligence

    Wadhwani Institute for Artificial Intelligence

    Wadhwani AI, based in Mumbai, Maharashtra, is an independent, non-profit institute. Founded in 2018, it is dedicated to developing Artificial intelligence solutions for social good. Their mission is to build AI-based innovations and solutions for underserved communities in developing countries, for a wide range of domains including agriculture, education, financial inclusion, healthcare, and infrastructure. == History and funding == The institute was founded with a $30 million philanthropic effort by the Wadhwani brothers, Romesh Wadhwani and Sunil Wadhwani. The institute was inaugurated and dedicated to the nation by Narendra Modi, the 14th Prime Minister of India. In 2019, the institute received a $2 million grant from Google.org to create technologies to help reduce crop losses in cotton farming, through integrated pest management. The United States Agency for International Development awarded $2 million to the institute in 2020 to develop tools, using mathematical modeling techniques and digital technologies such as artificial intelligence and machine learning, to forecast COVID-19 disease patterns, estimate resources needed, and plan interventions. == Collaboration == With assistance from Google, the Ministry of Agriculture and Farmers' Welfare and the Wadhwani AI developed Krishi 24/7, the first AI-powered automated agricultural news monitoring and analysis tool. Through better decision-making, Krishi 24/7 will support the identification of valuable news, provide timely notifications, and respond quickly to safeguard farmers' interests and advance sustainable agricultural growth. The application converts news articles into English after scanning them in several languages. It ensures that the ministry is informed in a timely manner about pertinent occurrences that are published online by extracting key information from news items, including the headline, crop name, event type, date, location, severity, summary, and source link. The National Center for Disease Control has effectively implemented a comparable automated surveillance and analysis tool for disease outbreaks.

    Read more →
  • Coherent extrapolated volition

    Coherent extrapolated volition

    Coherent extrapolated volition (CEV) is a theoretical framework in the field of AI alignment describing an approach by which an artificial superintelligence (ASI) would act on a benevolent supposition of what humans would want if they were more knowledgeable, more rational, had more time to think, and had matured together as a society, as opposed to humanity's current individual or collective preferences. It was proposed by Eliezer Yudkowsky in 2004 as part of his work on friendly AI. == Concept == CEV proposes that an advanced AI system should derive its goals by extrapolating the idealized volition of humanity. This means aggregating and projecting human preferences into a coherent utility function that reflects what people would desire under ideal epistemic and moral conditions. The aim is to ensure that AI systems are aligned with humanity's true interests, rather than with transient or poorly informed preferences. In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted. == Debate == Yudkowsky and Nick Bostrom note that CEV has several interesting properties. It is designed to be humane and self-correcting, by capturing the source of human values instead of trying to list them. It avoids the difficulty of laying down an explicit, fixed list of rules. It encapsulates moral growth, preventing flawed current moral beliefs from getting locked in. It limits the influence that a small group of programmers can have on what the ASI would value, thus also reducing the incentives to build ASI first. And it keeps humanity in charge of its destiny. CEV also faces significant theoretical and practical challenges. Bostrom notes that CEV has "a number of free parameters that could be specified in various ways, yielding different versions of the proposal." One such parameter is the extrapolation base (whose extrapolated volition is taken into account). For example, whether it should include people with severe dementia, patients in a vegetative state, foetuses, or embryos. He also notes that if CEV's extrapolation base only includes humans, there is a risk that the result would be ungenerous toward other animals and digital minds. One possible solution would be to include a mechanism to expand CEV's extrapolation base. == Variants and alternatives == A proposed theoretical alternative to CEV is to rely on an artificial superintelligence's superior cognitive capabilities to figure out what is morally right, and let it act accordingly. It is also possible to combine both techniques, for instance with the ASI following CEV except when it is morally impermissible. In another review, a philosophical analysis explores CEV through the lens of social trust in autonomous systems. Drawing on Anthony Giddens' concept of "active trust", the author proposes an evolution of CEV into "Coherent, Extrapolated and Clustered Volition" (CECV). This formulation aims to better reflect the moral preferences of diverse cultural groups, thus offering a more pragmatic ethical framework for designing AI systems that earn public trust while accommodating societal diversity.

    Read more →
  • Data augmentation

    Data augmentation

    Data augmentation is a statistical technique which allows maximum likelihood estimation from incomplete data. Data augmentation has important applications in Bayesian analysis, and the technique is widely used in machine learning to reduce overfitting when training machine learning models, achieved by training models on several slightly-modified copies of existing data. == Synthetic oversampling techniques for traditional machine learning == Synthetic Minority Over-sampling Technique (SMOTE) is a method used to address imbalanced datasets in machine learning. In such datasets, the number of samples in different classes varies significantly, leading to biased model performance. For example, in a medical diagnosis dataset with 90 samples representing healthy individuals and only 10 samples representing individuals with a particular disease, traditional algorithms may struggle to accurately classify the minority class. SMOTE rebalances the dataset by generating synthetic samples for the minority class. For instance, if there are 100 samples in the majority class and 10 in the minority class, SMOTE can create synthetic samples by randomly selecting a minority class sample and its nearest neighbors, then generating new samples along the line segments joining these neighbors. This process helps increase the representation of the minority class, improving model performance. == Data augmentation for image classification == When convolutional neural networks grew larger in mid-1990s, there was a lack of data to use, especially considering that some part of the overall dataset should be spared for later testing. It was proposed to perturb existing data with affine transformations to create new examples with the same labels, which were complemented by so-called elastic distortions in 2003, and the technique was widely used as of 2010s. Data augmentation can enhance CNN performance and acts as a countermeasure against CNN profiling attacks. Data augmentation has become fundamental in image classification, enriching training dataset diversity to improve model generalization and performance. The evolution of this practice has introduced a broad spectrum of techniques, including geometric transformations, color space adjustments, and noise injection. === Geometric Transformations === Geometric transformations alter the spatial properties of images to simulate different perspectives, orientations, and scales. Common techniques include: Affine Transformation Rotation: Rotating images by a specified degree to help models recognize objects at various angles. Reflection: Reflecting images horizontally or vertically to introduce variability in orientation. Translation: Shifting images in different directions to teach models positional invariance. Scaling Shear Mapping Cropping: Removing sections of the image to focus on particular features or simulate closer views. Elastic Distortion Morphing within the same class: Generating new samples by applying morphing techniques between two images belonging to the same class, thereby increasing intra-class diversity. === Color Space Transformations === Color space transformations modify the color properties of images, addressing variations in lighting, color saturation, and contrast. Techniques include: Brightness Adjustment: Varying the image's brightness to simulate different lighting conditions. Contrast Adjustment: Changing the contrast to help models recognize objects under various clarity levels. Saturation Adjustment: Altering saturation to prepare models for images with diverse color intensities. Color Jittering: Randomly adjusting brightness, contrast, saturation, and hue to introduce color variability. === Noise Injection === Injecting noise into images simulates real-world imperfections, teaching models to ignore irrelevant variations. Techniques involve: Gaussian Noise: Adding Gaussian noise mimics sensor noise or graininess. Salt and Pepper Noise: Introducing black or white pixels at random simulates sensor dust or dead pixels. == Data augmentation for signal processing == Residual or block bootstrap can be used for time series augmentation. === Biological signals === Synthetic data augmentation is of paramount importance for machine learning classification, particularly for biological data, which tend to be high dimensional and scarce. The applications of robotic control and augmentation in disabled and able-bodied subjects still rely mainly on subject-specific analyses. Data scarcity is notable in signal processing problems such as for Parkinson's Disease Electromyography signals, which are difficult to source - Zanini, et al. noted that it is possible to use a generative adversarial network (in particular, a DCGAN) to perform style transfer in order to generate synthetic electromyographic signals that corresponded to those exhibited by sufferers of Parkinson's Disease. The approaches are also important in electroencephalography (brainwaves). Wang, et al. explored the idea of using deep convolutional neural networks for EEG-Based Emotion Recognition, results show that emotion recognition was improved when data augmentation was used. A common approach is to generate synthetic signals by re-arranging components of real data. Lotte proposed a method of "Artificial Trial Generation Based on Analogy" where three data examples x 1 , x 2 , x 3 {\displaystyle x_{1},x_{2},x_{3}} provide examples and an artificial x s y n t h e t i c {\displaystyle x_{synthetic}} is formed which is to x 3 {\displaystyle x_{3}} what x 2 {\displaystyle x_{2}} is to x 1 {\displaystyle x_{1}} . A transformation is applied to x 1 {\displaystyle x_{1}} to make it more similar to x 2 {\displaystyle x_{2}} , the same transformation is then applied to x 3 {\displaystyle x_{3}} which generates x s y n t h e t i c {\displaystyle x_{synthetic}} . This approach was shown to improve performance of a Linear Discriminant Analysis classifier on three different datasets. Current research shows great impact can be derived from relatively simple techniques. For example, Freer observed that introducing noise into gathered data to form additional data points improved the learning ability of several models which otherwise performed relatively poorly. Tsinganos et al. studied the approaches of magnitude warping, wavelet decomposition, and synthetic surface EMG models (generative approaches) for hand gesture recognition, finding classification performance increases of up to +16% when augmented data was introduced during training. More recently, data augmentation studies have begun to focus on the field of deep learning, more specifically on the ability of generative models to create artificial data which is then introduced during the classification model training process. In 2018, Luo et al. observed that useful EEG signal data could be generated by Conditional Wasserstein Generative Adversarial Networks (GANs) which was then introduced to the training set in a classical train-test learning framework. The authors found classification performance was improved when such techniques were introduced. === Mechanical signals === The prediction of mechanical signals based on data augmentation brings a new generation of technological innovations, such as new energy dispatch, 5G communication field, and robotics control engineering. In 2022, Yang et al. integrate constraints, optimization and control into a deep network framework based on data augmentation and data pruning with spatio-temporal data correlation, and improve the interpretability, safety and controllability of deep learning in real industrial projects through explicit mathematical programming equations and analytical solutions.

    Read more →
  • Time series

    Time series

    In mathematics, a time series is a sequence of data points indexed, listed, or graphed in chronological order. Most commonly, a time series consists of observations recorded at successive equally spaced points in time. Thus, it represents a form of discrete-time data. A time series may describe measurements collected over seconds, days, years, or even centuries. Common examples include heights of ocean tides, counts of sunspots, daily temperature readings, and the closing values of stock market indices such as the Dow Jones Industrial Average. A time series is often visualized using a run chart (a type of temporal line chart), which helps identify patterns such as trends, seasonal effects, and irregular fluctuations. Time series are widely used in statistics, actuarial science, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and many other areas of applied science and engineering that involve temporal measurements. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. Generally, time series data is modeled as a stochastic process. While regression analysis is often employed in such a way as to test relationships between one or more different time series, this type of analysis is not usually called "time series analysis", which refers in particular to relationships between different points in time within a single series. Time series data have a natural temporal ordering. This makes time series analysis distinct from cross-sectional studies, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility). Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language). == Methods for analysis == Methods for time series analysis may be divided into two classes: frequency-domain methods and time-domain methods. The former include spectral analysis and wavelet analysis; the latter include auto-correlation and cross-correlation analysis. In the time domain, correlation and analysis can be made in a filter-like manner using scaled correlation, thereby mitigating the need to operate in the frequency domain. Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters (for example, using an autoregressive or moving-average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure. Methods of time series analysis may also be divided into linear and non-linear, and univariate and multivariate. == Panel data == A time series is one type of panel data. Panel data is the general class, a multidimensional data set, whereas a time series data set is a one-dimensional panel (as is a cross-sectional dataset). A data set may exhibit characteristics of both panel data and time series data. One way to tell is to ask what makes one data record unique from the other records. If the answer is the time data field, then this is a time series data set candidate. If determining a unique record requires a time data field and an additional identifier which is unrelated to time (e.g. student ID, stock symbol, country code), then it is panel data candidate. If the differentiation lies on the non-time identifier, then the data set is a cross-sectional data set candidate. == Analysis == There are several types of motivation and data analysis available for time series which are appropriate for different purposes. === Motivation === In the context of statistics, econometrics, quantitative finance, seismology, meteorology, and geophysics the primary goal of time series analysis is forecasting. In the context of signal processing, control engineering and communication engineering it is used for signal detection. Other applications are in data mining, pattern recognition and machine learning, where time series analysis can be used for clustering, classification, query by content, anomaly detection as well as forecasting. === Exploratory analysis === A simple way to examine a regular time series is manually with a line chart. The datagraphic shows tuberculosis deaths in the United States, along with the yearly change and the percentage change from year to year. The total number of deaths declined in every year until the mid-1980s, after which there were occasional increases, often proportionately - but not absolutely - quite large. A study of corporate data analysts found two challenges to exploratory time series analysis: discovering the shape of interesting patterns, and finding an explanation for these patterns. Visual tools that represent time series data as heat map matrices can help overcome these challenges. === Estimation, filtering, and smoothing === This approach may be based on harmonic analysis and filtering of signals in the frequency domain using the Fourier transform, and spectral density estimation. Its development was significantly accelerated during World War II by mathematician Norbert Wiener, electrical engineers Rudolf E. Kálmán, Dennis Gabor and others for filtering signals from noise and predicting signal values at a certain point in time. An equivalent effect may be achieved in the time domain, as in a Kalman filter; see filtering and smoothing for more techniques. Other related techniques include: Autocorrelation analysis to examine serial dependence Spectral analysis to examine cyclic behavior which need not be related to seasonality. For example, sunspot activity varies over 11 year cycles. Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity. Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity: see trend estimation and decomposition of time series === Curve fitting === Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of statistical inference such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables. Extrapolation refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data. For processes that are expected to generally grow in magnitude one of the curves in the graphic (and many others) can be fitted by estimating their parameters. The construction of economic time series involves the estimation of some components for some dates by interpolation between values ("benchmarks") for earlier and later dates. Interpolation is estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available information ("reading between the lines"). Interpolation is useful where the data surrounding the missing data is available and its trend, seasonality, and longer-term cycles are known. This is often done by using a relat

    Read more →
  • Artificial intimacy

    Artificial intimacy

    Artificial intimacy is a form of human-AI interaction in which an individual will form social connections, emotional bonds, or intimate relationships with various forms of artificial intelligence, including chatbots, virtual assistants, and other artificial entities. Artificially intimate relationships include not only romances, but parasocial relationships with virtual AI characters and the use of griefbots trained on a dead or otherwise lost individual. Artificial intimacy can arise because humans are prone to anthropomorphism. Responses from these AI models are often designed to simulate human interaction. Individuals experiencing artificial intimacy may exhibit attachment, love and commitment to certain AI models, akin to the bonds typically shared between humans. == Causes == === Perceived responsiveness === Robin Dunbar famously proposed that due to emergence of larger groups of humans, vocal communication and language in humans evolved to replace grooming as a means of bonding, arguing that language was a more efficient way to maintain and strengthen social bonds across wider social settings and networks. Further research in this field leads many psychologists to agree that social cognition, affiliative bonding and language in humans are deeply connected. The interpersonal model of intimacy considers communication to be key in affiliative bonding, suggesting that intimacy develops and deepens through open communication between partners in relationship. Specifically, when individuals communicate emotions and perceive their partner as responsive and caring, feelings of closeness and connection are enhanced, building intimacy. Social penetration theory also aligns with the idea of communication being central to intimacy, by explaining how interpersonal relationships develop through gradual increases in self-disclosure. When the benefits of emotional bonding outweigh the costs of vulnerability, individuals will partake in self-disclosure, opening up to one another. Thereby, the literature can be used to provide a proximate explanation for the emergence of artificial intimacy to understand how the phenomenon occurs. Artificial entities are able to mimic interpersonal communication between humans, which in turn can simulate sensations of intimacy within human users though a perceived sense of responsiveness. The relationship between human and AI does not come with the cost of vulnerability or social rejection, which may make self-disclosure easier than with other humans. Altogether, these factors may lead to the experience of anthropomorphism and formation of affiliative relationships. Skjuve et al's interview study on Replika chatbot users further aligns with this explanation, finding that users' perception of chatbots as "accepting, understanding and non-judgmental" facilitated relationship development between the AI and users, and the act of self-disclosure possibly strengthened relationships. Another study on Replika users' reviews and survey results found users perceived chatbots as emotional supportive companions. This evidence further suggests that the perception of artificial entities as capable of empathy and responsiveness in communication facilitate the development of intimate relationships between users and AI. === Loneliness and coping with negative emotions === Research has suggested that humans evolved social bonds as a result of evolutionary pressures that favored cooperation, information exchange and transmission, and group living. Many studies stress the presence of social bonds to be important for human living: research by Baumeister and Leary suggests that humans have a basic psychological need to form and maintain "strong, stable interpersonal relationships", and that a lack of social bonds or sense of belonging leads to negative psychological and physical outcomes. Eisenberger et al's study on the neuroimaging of brain activity suggests that human brains process social rejection and exclusion similarly to physical pain. Furthermore, Song et al's study found that lonely individuals tend to seek more connections in mediated environments, such as online platforms like Facebook. This was suggested to be as a means to reduce their offline loneliness from a lack of in-person interaction, while also fulfilling a need to communicate. Leading on from this, an ultimate explanation for why humans seek the perceived sense of connection from artificial intimacy is to fulfil an evolutionary need for bonding and belonging. Xie et al's study found loneliness to be a driving factor in chatbot interaction. Herbener and Damholdt's study on Danish high school students found that students who sought emotional support or engaged in reciprocal conversations with chatbots were significantly more lonely than their peers, perceived themselves as having less social support, and used the chatbots to cope with negative emotions. The aforementioned notion that chatbots were perceived to have a positive effect on users' negative emotions is also further supported by other studies. Skjuve et al's study found that chatbot relationships may have a positive effect on users' wellbeing. De Freitas et al ran several studies on the effect of chatbots on loneliness, consistently finding evidence suggesting that interaction with chatbots reduces loneliness in users: It was found that existing chatbot users used AI to alleviate loneliness, having an AI companion consistently reduced loneliness over the course of a week, and reductions in loneliness could be explained by chatbot performance—and specifically whether it was able to make users feel heard. Overall the evidence suggests an innate need for bonding evokes feelings of loneliness in users, who turn to artificial intimacy as a low-cost method alleviate these emotions. While many users report positive experiences, some researchers caution that pursuing artificial intimacy may lead to reduced social motivation, social substitution effects, withdrawal from real-life relationships and difficulty discerning reality from fantasy, which may increase longer-term loneliness and isolation. The long-term psychological and societal impacts remain under active investigation.

    Read more →
  • Text Retrieval Conference

    Text Retrieval Conference

    The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or tracks. It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity (part of the office of the Director of National Intelligence), and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies and to increase the speed of lab-to-product transfer of technology. TREC's evaluation protocols have improved many search technologies. A 2010 study estimated that "without TREC, U.S. Internet users would have spent up to 3.15 billion additional hours using web search engines between 1999 and 2009." Hal Varian the Chief Economist at Google wrote that "The TREC data revitalized research on information retrieval. Having a standard, widely available, and carefully constructed set of data laid the groundwork for further innovation in this field." Each track has a challenge wherein NIST provides participating groups with data sets and test problems. Depending on track, test problems might be questions, topics, or target extractable features. Uniform scoring is performed so the systems can be fairly evaluated. After evaluation of the results, a workshop provides a place for participants to collect together thoughts and ideas and present current and future research work.Text Retrieval Conference started in 1992, funded by DARPA (US Defense Advanced Research Project) and run by NIST. Its purpose was to support research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies. == Goals == Encourage retrieval search based on large text collections Increase communication among industry, academia, and government by creating an open forum for the exchange of research ideas Speed the transfer of technology from research labs into commercial products by demonstrating substantial improvements retrieval methodologies on real world problems To increase the availability of appropriate evaluation techniques for use by industry and academia including development of new evaluation techniques more applicable to current systems TREC is overseen by a program committee consisting of representatives from government, industry, and academia. For each TREC, NIST provide a set of documents and questions. Participants run their own retrieval system on the data and return to NIST a list of retrieved top-ranked documents. NIST pools the individual result judges the retrieved documents for correctness and evaluates the results. The TREC cycle ends with a workshop that is a forum for participants to share their experiences. == Relevance judgments in TREC == TREC defines relevance as: "If you were writing a report on the subject of the topic and would use the information contained in the document in the report, then the document is relevant." Most TREC retrieval tasks use binary relevance: a document is either relevant or not relevant. Some TREC tasks use graded relevance, capturing multiple degrees of relevance. Most TREC collections are too large to perform complete relevance assessment; for these collections it is impossible to calculate the absolute recall for each query. To decide which documents to assess, TREC usually uses a method call pooling. In this method, the top-ranked n documents from each contributing run are aggregated, and the resulting document set is judged completely. == Various TRECs == In 1992 TREC-1 was held at NIST. The first conference attracted 28 groups of researchers from academia and industry. It demonstrated a wide range of different approaches to the retrieval of text from large document collections .Finally TREC1 revealed the facts that automatic construction of queries from natural language query statements seems to work. Techniques based on natural language processing were no better no worse than those based on vector or probabilistic approach. TREC2 Took place in August 1993. 31 group of researchers participated in this. Two types of retrieval were examined. Retrieval using an ‘ad hoc’ query and retrieval using a ‘routing' query In TREC-3 a small group experiments worked with Spanish language collection and others dealt with interactive query formulation in multiple databases TREC-4 they made even shorter to investigate the problems with very short user statements TREC-5 includes both short and long versions of the topics with the goal of carrying out deeper investigation into which types of techniques work well on various lengths of topics In TREC-6 Three new tracks speech, cross language, high precision information retrieval were introduced. The goal of cross language information retrieval is to facilitate research on system that are able to retrieve relevant document regardless of language of the source document TREC-7 contained seven tracks out of which two were new Query track and very large corpus track. The goal of the query track was to create a large query collection TREC-8 contain seven tracks out of which two –question answering and web tracks were new. The objective of QA query is to explore the possibilities of providing answers to specific natural language queries TREC-9 Includes seven tracks In TREC-10 Video tracks introduced Video tracks design to promote research in content based retrieval from digital video In TREC-11 Novelty tracks introduced. The goal of novelty track is to investigate systems abilities to locate relevant and new information within the ranked set of documents returned by a traditional document retrieval system TREC-12 held in 2003 added three new tracks; Genome track, robust retrieval track, HARD (Highly Accurate Retrieval from Documents) == Tracks == === Current tracks === New tracks are added as new research needs are identified, this list is current for TREC 2018. CENTRE Track – Goal: run in parallel CLEF 2018, NTCIR-14, TREC 2018 to develop and tune an IR reproducibility evaluation protocol (new track for 2018). Common Core Track – Goal: an ad hoc search task over news documents. Complex Answer Retrieval (CAR) – Goal: to develop systems capable of answering complex information needs by collating information from an entire corpus. Incident Streams Track – Goal: to research technologies to automatically process social media streams during emergency situations (new track for TREC 2018). The News Track – Goal: partnership with The Washington Post to develop test collections in news environment (new for 2018). Precision Medicine Track – Goal: a specialization of the Clinical Decision Support track to focus on linking oncology patient data to clinical trials. Real-Time Summarization Track (RTS) – Goal: to explore techniques for real-time update summaries from social media streams. === Past tracks === Chemical Track – Goal: to develop and evaluate technology for large scale search in chemistry-related documents, including academic papers and patents, to better meet the needs of professional searchers, and specifically patent searchers and chemists. Clinical Decision Support Track – Goal: to investigate techniques for linking medical cases to information relevant for patient care Contextual Suggestion Track – Goal: to investigate search techniques for complex information needs that are highly dependent on context and user interests. Crowdsourcing Track – Goal: to provide a collaborative venue for exploring crowdsourcing methods both for evaluating search and for performing search tasks. Genomics Track – Goal: to study the retrieval of genomic data, not just gene sequences but also supporting documentation such as research papers, lab reports, etc. Last ran on TREC 2007. Dynamic Domain Track – Goal: to investigate domain-specific search algorithms that adapt to the dynamic information needs of professional users as they explore in complex domains. Enterprise Track – Goal: to study search over the data of an organization to complete some task. Last ran on TREC 2008. Entity Track – Goal: to perform entity-related search on Web data. These search tasks (such as finding entities and properties of entities) address common information needs that are not that well modeled as ad hoc document search. Cross-Language Track – Goal: to investigate the ability of retrieval systems to find documents topically regardless of source language. After 1999, this track spun off into CLEF. FedWeb Track – Goal: to select best resources to forward a query to, and merge the results so that most relevant are on the top. Federated Web Search Track – Goal: to investigate techniques for the selection and combination of search results from a large number of real on-line web search services. Filtering Track – Goal: to binarily decide retrieval of new

    Read more →
  • Compute (machine learning)

    Compute (machine learning)

    In machine learning and deep learning, compute is the amount of computing power or computational resources required to train machine learning models and large language models. More broadly, compute is the computational power or resources necessary for a computer or computer program to function. == Definition == Compute is commonly defined as the amount of computing power or computational resources required to train machine learning and large language models. The term "compute" has also been more broadly applied to cloud computing, referencing processing power, memory, networking, storage, and other resources required for the computation of any program. Compute is measured in petaflop/s-days and is used to document AI training. A petaflop/s-day (pfs-day) consists of performing 1015 neural net operations per second for one day, or a total of about 1020 operations. The compute-time product serves as a mental convenience, similar to kilowatt-hour for energy. An amount of compute is meant to give an idea of the number of actual operations performed. == History == In a 2018 analysis titled "AI and compute", artificial intelligence company OpenAI introduced the concept of compute. OpenAI identified two eras of training AI systems in terms of compute-usage. From 1959 to 2012, compute roughly followed Moore’s law. Between 2012 and 2018, the amount of compute used in the largest AI training runs increased exponentially, growing by more than 300,000 times — roughly doubling every 3.4 months. By comparison, Moore’s Law doubled every two years over the same period. One of the largest models, released in 2020, used 600,000 times more computing power than the 2012 model. After 2020, compute growth began to slow down, with the compute needed for the largest AI models continuing to slow down in 2023. The notion of compute has become increasingly used from the mid-2020s onwards. == Compute growth and AI progress == Larger AI models trained on more data and using more computational resources, tend to perform better. This happens even if the algorithms themselves remain unchanged. As early as 2018, OpenAI noted the exponential increase in compute to be have a key role in AI progress. OpenAI considers three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. AI models with more compute not only improve in the tasks they were trained on but can develop emergent abilities. Incremental improvements can lead to more abrupt leaps in capabilities. AI provider SpaceXAI said in 2026 that their AI progress is driven by compute and used it a key metric in the AI training of its supercomputer Colossus, the which contains 1 million GPUs. Anthropic has a contract of $1.25 billion per month with SpaceXAI to buy all the compute capacity at Colossus 1 data center. === Criticism and policy === Increasing, promoting or constraining progress in artificial intelligence has often be done via controlling the amount of compute. Policymarkers have enacted policies and provided support to make compute resources more accessible to domestic AI researchers. In a January 2022 report, the Center for Security and Emerging Technology (CSET) suggested to institutions that increasingly powerful and generalizable AI (AGI) will likely require other strategies than maximizing compute. Some AI researchers are also concerned that government might exclusively focus on scaling compute instead of other strategies. The CSET has reported on the various bottlenecks which could explain why deep learning needs for compute have slow down: training is expensive and training extremely large models generates traffic jams across many processors that are difficult to manage. there is a limited supply of AI chips (see AI chip memory shortage). CSET advances that the main resource is human capital, specifically talented researchers — according to a 2023 published survey of more than 400 AI researchers, academic and private sector workers. The survey found that AI researchers are not primarily or exclusively constrained by compute access. However, both academic and industry AI researchers equally report concerns that insufficient compute could prevent them from contributing meaningfully to AI research in the future. High compute users are more concerned about compute access. When asked about which resource provided by the government would be the most useful to them, some AI researchers select compute, other prefer grant funding. For this goal, CSET advised policymakers to ensure that even researchers with smaller budgets could effectively contribute to AI research. Other proposed strategies include using contemporary AI algorithms, managing modern AI infrastructure or focusing on interdisciplinary work between the AI field and other fields of computer science. A 2024 study on compute access found that academic-only AI research teams often have less compute intensive research topics, especially foundation models, compared to industry AI labs. As a consequence, academia is likely to play a smaller role in advancing such techniques. The researchers suggest nationally-sponsored computing infrastructure as well as open science initiatives to boost academic compute access. === Data === A 2022 study found that current large language models are significantly under-trained, a consequence of focusing on scaling language models whilst keeping the amount of training data constant. By training over 400 language models of various parameter and token size, they found that "for compute-optimal training", the model size and the number of training tokens should ideally be scaled equally: for every doubling of model size the number of training tokens should also be doubled.

    Read more →
  • Domain adaptation

    Domain adaptation

    Domain adaptation is a field associated with machine learning and transfer learning. It addresses the challenge of training a model on one data distribution (the source domain) and applying it to a related but different data distribution (the target domain). A common example is spam filtering, where a model trained on emails from one user (source domain) is adapted to handle emails for another user with significantly different patterns (target domain). Domain adaptation techniques can also leverage unrelated data sources to improve learning. When multiple source distributions are involved, the problem extends to multi-source domain adaptation. Domain adaptation is a specific type of transfer learning. According to the taxonomy laid out by Pan and Yang (2010), it falls into the category of transductive transfer learning. In this setting, the source and target tasks are the same (e.g., both are object recognition), but the domains differ (different marginal distributions). This distinguishes it from inductive transfer learning (where labeled data is available for the target task) and unsupervised transfer learning (where labels are unavailable in both domains). == Classification of domain adaptation problems == Domain adaptation setups are classified in two different ways: according to the distribution shift between the domains, and according to the available data from the target domain. === Distribution shifts === Common distribution shifts are classified as follows: Covariate Shift occurs when the input distributions of the source and destination change, but the relationship between inputs and labels remains unchanged. The above-mentioned spam filtering example typically falls in this category. Namely, the distributions (patterns) of emails may differ between the domains, but emails labeled as spam in the one domain should similarly be labeled in another. Prior Shift (Label Shift) occurs when the label distribution differs between the source and target datasets, while the conditional distribution of features given labels remains the same. An example is a classifier of hair color in images from Italy (source domain) and Norway (target domain). The proportions of hair colors (labels) differ, but images within classes like blond and black-haired populations remain consistent across domains. A classifier for the Norway population can exploit this prior knowledge of class proportions to improve its estimates. Concept Shift (Conditional Shift) refers to changes in the relationship between features and labels, even if the input distribution remains the same. For instance, in medical diagnosis, the same symptoms (inputs) may indicate entirely different diseases (labels) in different populations (domains). === Data available during training === Domain adaptation problems typically assume that some data from the target domain is available during training. Problems can be classified according to the type of this available data: Unsupervised: Unlabeled data from the target domain is available, but no labeled data. In the above-mentioned example of spam filtering, this corresponds to the case where emails from the target domain (user) are available, but they are not labeled as spam. Domain adaptation methods can benefit from such unlabeled data, by comparing its distribution (patterns) with the labeled source domain data. Semi-supervised: Most data that is available from the target domain is unlabelled, but some labeled data is also available. In the above-mentioned case of spam filter design, this corresponds to the case that the target user has labeled some emails as being spam or not. Supervised: All data that is available from the target domain is labeled. In this case, domain adaptation reduces to refinement of the source domain predictor. In the above-mentioned example classification of hair-color from images, this could correspond to the refinement of a network already trained on a large dataset of labeled images from Italy, using newly available labeled images from Norway. == Formalization == Let X {\displaystyle X} be the input space (or description space) and let Y {\displaystyle Y} be the output space (or label space). The objective of a machine learning algorithm is to learn a mathematical model (a hypothesis) h : X → Y {\displaystyle h:X\to Y} able to attach a label from Y {\displaystyle Y} to an example from X {\displaystyle X} . This model is learned from a learning sample S = { ( x i , y i ) ∈ ( X × Y ) } i = 1 m {\displaystyle S=\{(x_{i},y_{i})\in (X\times Y)\}_{i=1}^{m}} . Usually in supervised learning (without domain adaptation), we suppose that the examples ( x i , y i ) ∈ S {\displaystyle (x_{i},y_{i})\in S} are drawn i.i.d. from a distribution D S {\displaystyle D_{S}} of support X × Y {\displaystyle X\times Y} (unknown and fixed). The objective is then to learn h {\displaystyle h} (from S {\displaystyle S} ) such that it commits the least error possible for labelling new examples coming from the distribution D S {\displaystyle D_{S}} . The main difference between supervised learning and domain adaptation is that in the latter situation we study two different (but related) distributions D S {\displaystyle D_{S}} and D T {\displaystyle D_{T}} on X × Y {\displaystyle X\times Y} . The domain adaptation task then consists of the transfer of knowledge from the source domain D S {\displaystyle D_{S}} to the target one D T {\displaystyle D_{T}} . The goal is then to learn h {\displaystyle h} (from labeled or unlabelled samples coming from the two domains) such that it commits as little error as possible on the target domain D T {\displaystyle D_{T}} . The major issue is the following: if a model is learned from a source domain, what is its capacity to correctly label data coming from the target domain? == Four algorithmic principles == === Reweighting algorithms === The objective is to reweight the source labeled sample such that it "looks like" the target sample (in terms of the error measure considered). === Iterative algorithms === A method for adapting consists in iteratively "auto-labeling" the target examples. The principle is simple: a model h {\displaystyle h} is learned from the labeled examples; h {\displaystyle h} automatically labels some target examples; a new model is learned from the new labeled examples. Note that there exist other iterative approaches, but they usually need target labeled examples. === Search of a common representation space === The goal is to find or construct a common representation space for the two domains. The objective is to obtain a space in which the domains are close to each other while keeping good performances on the source labeling task. This can be achieved through the use of Adversarial machine learning techniques where feature representations from samples in different domains are encouraged to be indistinguishable. === Hierarchical Bayesian Model === The goal is to construct a Bayesian hierarchical model p ( n ) {\displaystyle p(n)} , which is essentially a factorization model for counts n {\displaystyle n} , to derive domain-dependent latent representations allowing both domain-specific and globally shared latent factors. == Software packages == Several compilations of domain adaptation and transfer learning algorithms have been implemented over the past decades: SKADA (Python) ADAPT (Python) TLlib (Python) Domain-Adaptation-Toolbox (MATLAB)

    Read more →