AI Chatbot Social Network

AI Chatbot Social Network — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cloud management

    Cloud management

    Cloud management refers to the administration and oversight of cloud computing products and services. Public clouds are managed by cloud service providers, which operate the underlying infrastructure such as servers, storage, networking, and data center facilities. Users may also opt to manage their public cloud services with a third-party cloud management tool. Users of public cloud services can generally select from three basic cloud provisioning categories: User self-provisioning: Customers purchase cloud services directly from the provider, typically through a web form or console interface. The customer pays on a per-transaction basis. Advanced provisioning: Customers contract in advance a predetermined amount of resources, which are prepared in advance of service. The customer pays a flat fee or a monthly fee. Dynamic provisioning: The provider allocates resources when the customer needs them, then decommissions them when they are no longer needed. The customer is charged on a pay-per-use basis. Managing a private cloud requires software tools to help create a virtualized pool of compute resources, provide a self-service portal for end users and handle security, resource allocation, tracking and billing. Management tools for private clouds tend to be service driven, as opposed to resource driven, because cloud environments are typically highly virtualized and organized in terms of portable workloads. In hybrid cloud environments, compute, network and storage resources must be managed across multiple domains, so a good management strategy should start by defining what needs to be managed, and where and how to do it. Policies to help govern these domains should include configuration and installation of images, access control, and budgeting and reporting. Access control often includes the use of Single sign-on (SSO), in which a user logs in once and gains access to all systems without being prompted to log in again at each of them. == Characteristics of Cloud Management == Cloud management combines software and technologies in a design for managing cloud environments. Software developers have responded to the management challenges of cloud computing with a variety of cloud management platforms and tools. These tools include native tools offered by public cloud providers as well as third-party tools designed to provide consistent functionality across multiple cloud providers. Administrators must balance the competing requirements of efficient consistency across different cloud platforms with access to different native functionality within individual cloud platforms. The growing acceptance of public cloud and increased multicloud usage is driving the need for consistent cross-platform management. Rapid adoption of cloud services is introducing a new set of management challenges for those technical professionals responsible for managing IT systems and services. Cloud-management platforms and tools should have the ability to provide minimum functionality in the following categories. Functionality can be both natively provided or orchestrated via third-party integration. Provisioning and orchestration: create, modify, and delete resources as well as orchestrate workflows and management of workloads Automation: Enable cloud consumption and deployment of app services via infrastructure-as-code and other DevOps concepts Security and compliance: manage role-based access of cloud services and enforce security configurations Service request: collect and fulfill requests from users to access and deploy cloud resources. Monitoring and logging: collect performance and availability metrics as well as automate incident management and log aggregation Inventory and classification: discover and maintain pre-existing brownfield cloud resources plus monitor and manage changes Cost management and optimization: track and rightsize cloud spend and align capacity and performance to actual demand Migration, backup, and DR: enable data protection, disaster recovery, and data mobility via snapshots and/or data replication Organizations may group these criteria into key use cases including Cloud Brokerage, DevOps Automation, Governance, and Day-2 Life Cycle Operations. Enterprises with large-scale cloud implementations may require more robust cloud management tools which include specific characteristics, such as the ability to manage multiple platforms from a single point of reference, or intelligent analytics to automate processes like application lifecycle management. High-end cloud management tools should also have the ability to handle system failures automatically with capabilities such as self-monitoring, an explicit notification mechanism, and include failover and self-healing capabilities. == Multi-Cloud and Hybrid Cloud Management Challenges == Legacy management infrastructures, which are based on the concept of dedicated system relationships and architecture constructs, are not well suited to cloud environments where instances are continually launched and decommissioned. Instead, the dynamic nature of cloud computing requires monitoring and management tools that are adaptable, extensible and customizable. Cloud computing presents a number of management challenges. Companies using public clouds do not have ownership of the equipment hosting the cloud environment, and because the environment is not contained within their own networks, public cloud customers do not have full visibility or control. Users of public cloud services must also integrate with an architecture defined by the cloud provider, using its specific parameters for working with cloud components. Integration includes tying into the cloud APIs for configuring IP addresses, subnets, firewalls and data service functions for storage. Because control of these functions is based on the cloud provider’s infrastructure and services, public cloud users must integrate with the cloud infrastructure management. Capacity management is a challenge for both public and private cloud environments because end users have the ability to deploy applications using self-service portals. Applications of all sizes may appear in the environment, consume an unpredictable amount of resources, then disappear at any time. A possible solution is profiling the applications impact on computational resources. As result, the performance models allow the prediction of how resource utilization changes according to application patterns. Thus, resources can be dynamically scaled to meet the expected demand. This is critical to cloud providers that need to provision resources quickly to meet a growing demand by their applications. Charge-back—or, pricing resource use on a granular basis—is a challenge for both public and private cloud environments. Charge-back is a challenge for public cloud service providers because they must price their services competitively while still creating profit. Users of public cloud services may find charge-back challenging because it is difficult for IT groups to assess actual resource costs on a granular basis due to overlapping resources within an organization that may be paid for by an individual business unit, such as electrical power. For private cloud operators, charge-back is fairly straightforward, but the challenge lies in guessing how to allocate resources as closely as possible to actual resource usage to achieve the greatest operational efficiency. Exceeding budgets can be a risk. Hybrid cloud environments, which combine public and private cloud services, sometimes with traditional infrastructure elements, present their own set of management challenges. These include security concerns if sensitive data lands on public cloud servers, budget concerns around overuse of storage or bandwidth and proliferation of mismanaged images. Managing the information flow in a hybrid cloud environment is also a significant challenge. On-premises clouds must share information with applications hosted off-premises by public cloud providers, and this information may change constantly. Hybrid cloud environments also typically include a complex mix of policies, permissions and limits that must be managed consistently across both public and private clouds. == Cloud Management Platforms (CMP) == CMPs provide a means for a cloud service customer to manage the deployment and operation of applications and associated datasets across multiple cloud service infrastructures, including both on-premises cloud infrastructure and public cloud service provider infrastructure. In other words, CMPs provide management capabilities for hybrid cloud and multi-cloud environments. A cloud management platform (CMP) provides broad cloud management functionality atop both public cloud provider platforms and private cloud platforms. CMPs manage cloud services and resources that are distributed across multiple cloud platforms. The value of CMPs stands in delivering the maximum level of consistency between platforms without comp

    Read more →
  • Feature (machine learning)

    Feature (machine learning)

    In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a data set. Choosing informative, discriminating, and independent features is crucial to producing effective algorithms for pattern recognition, classification, and regression tasks. Features are usually numeric, but other types such as strings and graphs are used in syntactic pattern recognition, after some pre-processing step such as one-hot encoding. The concept of "features" is related to that of explanatory variables used in statistical techniques such as linear regression. == Feature types == In feature engineering, two types of features are commonly used: numerical and categorical. Numerical features are continuous values that can be measured on a scale. Examples of numerical features include age, height, weight, and income. Numerical features can be used in machine learning algorithms directly. Categorical features are discrete values that can be grouped into categories. Examples of categorical features include gender, color, and zip code. Categorical features typically need to be converted to numerical features before they can be used in machine learning algorithms. This can be done using a variety of techniques, such as one-hot encoding, label encoding, and ordinal encoding. The type of feature that is used in feature engineering depends on the specific machine learning algorithm that is being used. Some machine learning algorithms, such as decision trees, can handle both numerical and categorical features. Other machine learning algorithms, such as linear regression, can only handle numerical features. == Classification == A numeric feature can be conveniently described by a feature vector. One way to achieve binary classification is using a linear predictor function (related to the perceptron) with a feature vector as input. The method consists of calculating the scalar product between the feature vector and a vector of weights, qualifying those observations whose result exceeds a threshold. Algorithms for classification from a feature vector include nearest neighbor classification, neural networks, and statistical techniques such as Bayesian approaches. == Examples == In character recognition, features may include histograms counting the number of black pixels along horizontal and vertical directions, number of internal holes, stroke detection and many others. In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches, logarithmic Mel-scale spectral vectors and Mel-frequency cepstral coefficients, which represent the frequency characteristics of audio signals. In spam detection algorithms, features may include the presence or absence of certain email headers, the email structure, the language, the frequency of specific terms, the grammatical correctness of the text. In computer vision, there are a large number of possible features, such as edges and objects. == Feature vectors == In pattern recognition and machine learning, a feature vector is an n-dimensional vector of numerical features that represent some object. Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis. When representing images, the feature values might correspond to the pixels of an image, while when representing texts the features might be the frequencies of occurrence of textual terms. Feature vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear regression. Feature vectors are often combined with weights using a dot product in order to construct a linear predictor function that is used to determine a score for making a prediction. The vector space associated with these vectors is often called the feature space. In order to reduce the dimensionality of the feature space, a number of dimensionality reduction techniques can be employed. Higher-level features can be obtained from already available features and added to the feature vector; for example, for the study of diseases the feature 'Age' is useful and is defined as Age = 'Year of death' minus 'Year of birth' . This process is referred to as feature construction. Feature construction is the application of a set of constructive operators to a set of existing features resulting in construction of new features. Examples of such constructive operators include checking for the equality conditions {=, ≠}, the arithmetic operators {+,−,×, /}, the array operators {max(S), min(S), average(S)} as well as other more sophisticated operators, for example count(S, C) that counts the number of features in the feature vector S satisfying some condition C or, for example, distances to other recognition classes generalized by some accepting device. Feature construction has long been considered a powerful tool for increasing both accuracy and understanding of structure, particularly in high-dimensional problems. Applications include studies of disease and emotion recognition from speech. == Selection and extraction == The initial set of raw features can be redundant and large enough that estimation and optimization is made difficult or ineffective. Therefore, a preliminary step in many applications of machine learning and pattern recognition consists of selecting a subset of features, or constructing a new and reduced set of features to facilitate learning, and to improve generalization and interpretability. Extracting or selecting features is a combination of art and science; developing systems to do so is known as feature engineering. It requires the experimentation of multiple possibilities and the combination of automated techniques with the intuition and knowledge of the domain expert. Automating this process is feature learning, where a machine not only uses features for learning, but learns the features itself.

    Read more →
  • Symbol level

    Symbol level

    In knowledge-based systems, agents choose actions based on the principle of rationality to move closer to a desired goal. The agent is able to make decisions based on knowledge it has about the world (see knowledge level). But for the agent to actually change its state, it must use whatever means it has available. This level of description for the agent's behavior is the symbol level. The term was coined by Allen Newell in 1982. For example, in a computer program, the knowledge level consists of the information contained in its data structures that it uses to perform certain actions. The symbol level consists of the program's algorithms, the data structures themselves, and so on.

    Read more →
  • A Logical Calculus of the Ideas Immanent in Nervous Activity

    A Logical Calculus of the Ideas Immanent in Nervous Activity

    "A Logical Calculus of the Ideas Immanent in Nervous Activity" is a 1943 paper written by Warren Sturgis McCulloch and Walter Pitts, published in the journal The Bulletin of Mathematical Biophysics. The paper proposed a mathematical model of the nervous system as a network of simple logical elements, later known as artificial neurons, or McCulloch–Pitts neurons. These neurons receive inputs, perform a weighted sum, and fire an output signal based on a threshold function. By connecting these units in various configurations, McCulloch and Pitts demonstrated that their model could perform all logical functions. It is a seminal work in cognitive science, computational neuroscience, computer science, and artificial intelligence. It was a foundational result in automata theory. John von Neumann cited it as a significant result. == Mathematics == The artificial neuron used in the original paper is slightly different from the modern version. They considered neural networks that operate in discrete steps of time t = 0 , 1 , … {\displaystyle t=0,1,\dots } . The neural network contains a number of neurons. Let the state of a neuron i {\displaystyle i} at time t {\displaystyle t} be N i ( t ) {\displaystyle N_{i}(t)} . The state of a neuron can either be 0 or 1, standing for "not firing" and "firing". Each neuron also has a firing threshold θ {\displaystyle \theta } , such that it fires if the total input exceeds the threshold. Each neuron can connect to any other neuron (including itself) with positive synapses (excitatory) or negative synapses (inhibitory). That is, each neuron can connect to another neuron with a weight w {\displaystyle w} taking an integer value. A peripheral afferent is a neuron with no incoming synapses. We can regard each neural network as a directed graph, with the nodes being the neurons, and the directed edges being the synapses. A neural network has a circle or a circuit if there exists a directed circle in the graph. Let w i j ( t ) {\displaystyle w_{ij}(t)} be the connection weight from neuron j {\displaystyle j} to neuron i {\displaystyle i} at time t {\displaystyle t} , then its next state is N i ( t + 1 ) = H ( ∑ j = 1 n w i j ( t ) N j ( t ) − θ i ( t ) ) , {\displaystyle N_{i}(t+1)=H\left(\sum _{j=1}^{n}w_{ij}(t)N_{j}(t)-\theta _{i}(t)\right),} where H {\displaystyle H} is the Heaviside step function (outputting 1 if the input is greater than or equal to 0, and 0 otherwise). === Symbolic logic === The paper used, as a logical language for describing neural networks, "Language II" from The Logical Syntax of Language by Rudolf Carnap with some notations taken from Principia Mathematica by Alfred North Whitehead and Bertrand Russell. Language II covers substantial parts of classical mathematics, including real analysis and portions of set theory. To describe a neural network with peripheral afferents N 1 , N 2 , … , N p {\displaystyle N_{1},N_{2},\dots ,N_{p}} and non-peripheral afferents N p + 1 , N p + 2 , … , N n {\displaystyle N_{p+1},N_{p+2},\dots ,N_{n}} they considered logical predicate of form P r ( N 1 , N 2 , … , N p , t ) {\displaystyle Pr(N_{1},N_{2},\dots ,N_{p},t)} where P r {\displaystyle Pr} is a first-order logic predicate function (a function that outputs a boolean), N 1 , … , N p {\displaystyle N_{1},\dots ,N_{p}} are predicates that take t {\displaystyle t} as an argument, and t {\displaystyle t} is the only free variable in the predicate. Intuitively speaking, N 1 , … , N p {\displaystyle N_{1},\dots ,N_{p}} specifies the binary input patterns going into the neural network over all time, and P r ( N 1 , N 2 , … , N n , t ) {\displaystyle Pr(N_{1},N_{2},\dots ,N_{n},t)} is a function that takes some binary input patterns, and constructs an output binary pattern P r ( N 1 , N 2 , … , N n , 0 ) , P r ( N 1 , N 2 , … , N n , 1 ) , … {\displaystyle Pr(N_{1},N_{2},\dots ,N_{n},0),Pr(N_{1},N_{2},\dots ,N_{n},1),\dots } . A logical sentence P r ( N 1 , N 2 , … , N n , t ) {\displaystyle Pr(N_{1},N_{2},\dots ,N_{n},t)} is realized by a neural network iff there exists a time-delay T ≥ 0 {\displaystyle T\geq 0} , a neuron i {\displaystyle i} in the network, and an initial state for the non-peripheral neurons N p + 1 ( 0 ) , … , N n ( 0 ) {\displaystyle N_{p+1}(0),\dots ,N_{n}(0)} , such that for any time t {\displaystyle t} , the truth-value of the logical sentence is equal to the state of the neuron i {\displaystyle i} at time t + T {\displaystyle t+T} . That is, ∀ t = 0 , 1 , 2 , … , P r ( N 1 , N 2 , … , N p , t ) = N i ( t + T ) {\displaystyle \forall t=0,1,2,\dots ,\quad Pr(N_{1},N_{2},\dots ,N_{p},t)=N_{i}(t+T)} === Equivalence === In the paper, they considered some alternative definitions of artificial neural networks, and have shown them to be equivalent, that is, neural networks under one definition realizes precisely the same logical sentences as neural networks under another definition. They considered three forms of inhibition: relative inhibition, absolute inhibition, and extinction. The definition above is relative inhibition. By "absolute inhibition" they meant that if any negative synapse fires, then the neuron will not fire. By "extinction" they meant that if at time t {\displaystyle t} , any inhibitory synapse fires on a neuron i {\displaystyle i} , then θ i ( t + j ) = θ i ( 0 ) + b j {\displaystyle \theta _{i}(t+j)=\theta _{i}(0)+b_{j}} for j = 1 , 2 , 3 , … {\displaystyle j=1,2,3,\dots } , until the next time an inhibitory synapse fires on i {\displaystyle i} . It is required that b j = 0 {\displaystyle b_{j}=0} for all large j {\displaystyle j} . Theorem 4 and 5 state that these are equivalent. They considered three forms of excitation: spatial summation, temporal summation, and facilitation. The definition above is spatial summation (which they pictured as having multiple synapses placed close together, so that the effect of their firing sums up). By "temporal summation" they meant that the total incoming signal is ∑ τ = 0 T ∑ j = 1 n w i j ( t ) N j ( t − τ ) {\displaystyle \sum _{\tau =0}^{T}\sum _{j=1}^{n}w_{ij}(t)N_{j}(t-\tau )} for some T ≥ 1 {\displaystyle T\geq 1} . By "facilitation" they meant the same as extinction, except that b j ≤ 0 {\displaystyle b_{j}\leq 0} . Theorem 6 states that these are equivalent. They considered neural networks that do not change, and those that change by Hebbian learning. That is, they assume that at t = 0 {\displaystyle t=0} , some excitatory synaptic connections are not active. If at any t {\displaystyle t} , both N i ( t ) = 1 , N j ( t ) = 1 {\displaystyle N_{i}(t)=1,N_{j}(t)=1} , then any latent excitatory synapse between i , j {\displaystyle i,j} becomes active. Theorem 7 states that these are equivalent. === Logical expressivity === They considered "temporal propositional expressions" (TPE), which are propositional formulas with one free variable t {\displaystyle t} . For example, N 1 ( t ) ∨ N 2 ( t ) ∧ ¬ N 3 ( t ) {\displaystyle N_{1}(t)\vee N_{2}(t)\wedge \neg N_{3}(t)} is such an expression. Theorem 1 and 2 together showed that neural nets without circles are equivalent to TPE. For neural nets with loops, they noted that "realizable P r {\displaystyle Pr} may involve reference to past events of an indefinite degree of remoteness". These then encodes for sentences like "There was some x such that x was a ψ" or ( ∃ x ) ( ψ x ) {\displaystyle (\exists x)(\psi x)} . Theorems 8 to 10 showed that neural nets with loops can encode all first-order logic with equality and conversely, any looped neural networks is equivalent to a sentence in first-order logic with equality, thus showing that they are equivalent in logical expressiveness. As a remark, they noted that a neural network, if furnished with a tape, scanners, and write-heads, is equivalent to a Turing machine, and conversely, every Turing machine is equivalent to some such neural network. Thus, these neural networks are equivalent to Turing computability and Church's lambda-definability. == Context == === Previous work === The paper built upon several previous strands of work. In the symbolic logic side, it built on the previous work by Carnap, Whitehead, and Russell. This was contributed by Walter Pitts, who had a strong proficiency with symbolic logic. Pitts provided mathematical and logical rigor to McCulloch’s vague ideas on psychons (atoms of psychological events) and circular causality. In the neuroscience side, it built on previous work by the mathematical biology research group centered around Nicolas Rashevsky, of which McCulloch was a member. The paper was published in the Bulletin of Mathematical Biophysics, which was founded by Rashevsky in 1939. During the late 1930s, Rashevsky's research group was producing papers that had difficulty publishing in other journals at the time, so Rashevsky decided to found a new journal exclusively devoted to mathematical biophysics. Also in the Rashevsky's group was Alston Scott Householder, who in 1941 published an abstract model

    Read more →
  • Customer support

    Customer support

    Customer support is a range of services to assist customers in making cost effective and correct use of a product. It includes assistance in planning, installation, training, troubleshooting, maintenance, upgrading, and disposal of a product. Regarding technology products such as mobile phones, televisions, computers, software products or other electronic or mechanical goods, it is termed technical support. It aims to ensure users can effectively operate the product and resolve any issues that may arise throughout its lifecycle. Support is delivered through various channels, including telephone, email, live chat, self-service knowledge bases, and social media. Research indicates that most customers attempt to resolve issues through self-service before contacting a representative. For products sold across multiple regions, support may be provided in several languages, as consumers tend to prefer assistance in their native language. Requirements for customer contact centres are defined in international standards such as ISO 18295.

    Read more →
  • AI browser

    AI browser

    An AI browser is a web browser with integrated artificial intelligence capabilities, such as automatically summarizing web page content or answering questions about it. A more specialized type is an agentic browser, based on the concept of agentic AI, which can take actions – such as navigating webpages or filling out forms – on behalf of the user. Several agentic browsers emerged in 2025, including ChatGPT Atlas (macOS only), Comet, and Dia. As of 2025, this is a recent development in the browser market, including new entrants from OpenAI, Opera and Perplexity. The designation of 'AI browser' also includes established browsers that later added non-agentic AI features, such as Microsoft Edge with the Copilot chatbot, Google Chrome with the Gemini chatbot (for Windows desktop users in the US with their language set to English), and Firefox with multiple chatbot providers (such as ChatGPT, Claude, Copilot, Gemini, and Le Chat). AI browsers have been noted to be susceptible to prompt injection attacks. == Browser extensions and integrations == Rather than creating entirely new browsers, some AI browsing solutions integrate with existing browsers through extensions or companion applications. These tools add agentic capabilities to established browsers without requiring users to switch platforms. Examples include Composite, which functions as a cross-browser agent that works with Chrome, Edge, and other browsers to automate web-based tasks for workers. == Cloud-based implementations == Cloud-based implementations of AI browsers allow users to run automated browsing agents without local installation. These systems operate on remote servers using frameworks such as Puppeteer or Playwright. Examples include Browserbase, Browser-use and AI Browser. The AI typically parses the Document Object Model (DOM) to locate and interact with page elements, and may also analyze browser screenshots to interpret layout and structure. == Criticisms and dangers == AI browsers have been noted to be susceptible to being vulnerable to prompt injection attacks, in which the content of websites can be used to hijack the control of the browser. Multiple organisations have argued against using AI browsers due to this vulnerability. The United Kingdom national cyber security centre and Gartner consider them to be too risky for adoption by most organisations. A study by the CISPA Helmholtz Center and Saarland University concluded that this vulnerability makes them easy targets for malware, fraud, automated defamation, disinformation and biased outputs.

    Read more →
  • Algorithmic probability

    Algorithmic probability

    In algorithmic information theory, algorithmic probability, also known as Solomonoff probability, is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses the method together with Bayes' rule to obtain probabilities of prediction for an algorithm's future outputs. In the mathematical formalism used, the observations have the form of finite binary strings viewed as outputs of Turing machines, and the universal prior is a probability distribution over the set of finite binary strings calculated from a probability distribution over programs (that is, inputs to a universal Turing machine). The prior is universal in the Turing-computability sense, i.e. no string has zero probability. It is not computable, but it can be approximated. Formally, the probability P {\displaystyle P} is not a probability and it is not computable. It is only "lower semi-computable" and a "semi-measure". By "semi-measure", it means that 0 ≤ ∑ x P ( x ) < 1 {\displaystyle 0\leq \sum _{x}P(x)<1} . That is, the "probability" does not actually sum up to one, unlike actual probabilities. This is because some inputs to the Turing machine causes it to never halt, which means the probability mass allocated to those inputs is lost. By "lower semi-computable", it means there is a Turing machine that, given an input string x {\displaystyle x} , can print out a sequence y 1 < y 2 < ⋯ {\displaystyle y_{1} Read more →

  • Statistical relational learning

    Statistical relational learning

    Statistical relational learning (SRL) is a subdiscipline of artificial intelligence and machine learning that is concerned with domain models that exhibit both uncertainty (which can be dealt with using statistical methods) and complex, relational structure. Typically, the knowledge representation formalisms developed in SRL use (a subset of) first-order logic to describe relational properties of a domain in a general manner (universal quantification) and draw upon probabilistic graphical models (such as Bayesian networks or Markov networks) to model the uncertainty; some also build upon the methods of inductive logic programming. Significant contributions to the field have been made since the late 1990s. As is evident from the characterization above, the field is not strictly limited to learning aspects; it is equally concerned with reasoning (specifically probabilistic inference) and knowledge representation. Therefore, alternative terms that reflect the main foci of the field include statistical relational learning and reasoning (emphasizing the importance of reasoning) and first-order probabilistic languages (emphasizing the key properties of the languages with which models are represented). Another term that is sometimes used in the literature is relational machine learning (RML). == Canonical tasks == A number of canonical tasks are associated with statistical relational learning, the most common ones being. collective classification, i.e. the (simultaneous) prediction of the class of several objects given objects' attributes and their relations link prediction, i.e. predicting whether or not two or more objects are related link-based clustering, i.e. the grouping of similar objects, where similarity is determined according to the links of an object, and the related task of collaborative filtering, i.e. the filtering for information that is relevant to an entity (where a piece of information is considered relevant to an entity if it is known to be relevant to a similar entity) social network modelling object identification/entity resolution/record linkage, i.e. the identification of equivalent entries in two or more separate databases/datasets == Representation formalisms == One of the fundamental design goals of the representation formalisms developed in SRL is to abstract away from concrete entities and to represent instead general principles that are intended to be universally applicable. Since there are countless ways in which such principles can be represented, many representation formalisms have been proposed in recent years. In the following, some of the more common ones are listed in alphabetical order: Bayesian logic program BLOG model Markov logic networks Multi-entity Bayesian network Probabilistic logic programs Probabilistic relational model – a Probabilistic Relational Model (PRM) is the counterpart of a Bayesian network in statistical relational learning. Probabilistic soft logic Recursive random field Relational Bayesian network Relational dependency network Relational Markov network Relational Kalman filtering

    Read more →
  • Self-management (computer science)

    Self-management (computer science)

    Self-management is the process by which computer systems manage their own operation without human intervention. Self-management technologies are expected to pervade the next generation of network management systems. The growing complexity of modern networked computer systems is a limiting factor in their expansion. The increasing heterogeneity of corporate computer systems, the inclusion of mobile computing devices, and the combination of different networking technologies like WLAN, cellular phone networks, and mobile ad hoc networks make the conventional, manual management difficult, time-consuming, and error-prone. More recently, self-management has been suggested as a solution to increasing complexity in cloud computing. An industrial initiative towards realizing self-management is the Autonomic Computing Initiative (ACI) started by IBM in 2001. The ACI defines the following four functional areas: Self-configuration Auto-configuration of components Self-healing Automatic discovery, and correction of faults; automatically applying all necessary actions to bring system back to normal operation Self-optimization Automatic monitoring and control of resources to ensure the optimal functioning with respect to the defined requirements Self-protection Proactive identification and protection from arbitrary attacks

    Read more →
  • Highway network

    Highway network

    In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by long short-term memory (LSTM) recurrent neural networks. The advantage of the Highway Network over other deep learning architectures is its ability to overcome or partially prevent the vanishing gradient problem, thus improving its optimization. Gating mechanisms are used to facilitate information flow across the many layers ("information highways"). Highway Networks have found use in text sequence labeling and speech recognition tasks. In 2014, the state of the art was training deep neural networks with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train such networks: the Highway Network (published in May), and the residual neural network, or ResNet (December). ResNet behaves like an open-gated Highway Net. == Model == The model has two gates in addition to the H ( W H , x ) {\displaystyle H(W_{H},x)} gate: the transform gate T ( W T , x ) {\displaystyle T(W_{T},x)} and the carry gate C ( W C , x ) {\displaystyle C(W_{C},x)} . The latter two gates are non-linear transfer functions (specifically sigmoid by convention). The function H {\displaystyle H} can be any desired transfer function. The carry gate is defined as: C ( W C , x ) = 1 − T ( W T , x ) {\displaystyle C(W_{C},x)=1-T(W_{T},x)} while the transform gate is just a gate with a sigmoid transfer function. == Structure == The structure of a hidden layer in the Highway Network follows the equation: y = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ C ( x , W C ) = H ( x , W H ) ⋅ T ( x , W T ) + x ⋅ ( 1 − T ( x , W T ) ) {\displaystyle {\begin{aligned}y=H(x,W_{H})\cdot T(x,W_{T})+x\cdot C(x,W_{C})\\=H(x,W_{H})\cdot T(x,W_{T})+x\cdot (1-T(x,W_{T}))\end{aligned}}} == Related work == Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and attributed to it the reason why deep learning did not work well. To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute y t + 1 = F ( x t ) + x t {\textstyle y_{t+1}=F(x_{t})+x_{t}} . During backpropagation through time, this becomes the residual formula y = F ( x ) + x {\textstyle y=F(x)+x} for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span t. A later LSTM version published in 2000 modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights, thus being opened, addressing the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM. The Highway Network of May 2015 applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers". It is like a 2000 LSTM with forget gates unfolded in time, while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM. If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks. The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961) and Lang & Witbrock (1988) which has the form x ↦ F ( x ) + A x {\displaystyle x\mapsto F(x)+Ax} . Here the randomly initialized weight matrix A does not have to be the identity mapping. Every residual connection is a skip connection, but almost all skip connections are not residual connections. The original Highway Network paper not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20, 50, and 100 layers networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20 layers counterpart (on the MNIST dataset, Figure 1 in ). No improvement on test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1 in ). The ResNet paper, however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3 in ). This is also why the forget gates of the 2000 LSTM were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are dominantly identity mappings.

    Read more →
  • Argumentation framework

    Argumentation framework

    In artificial intelligence and related fields, an argumentation framework is a way to deal with contentious information and draw conclusions from it using formalized arguments. In an abstract argumentation framework, entry-level information is a set of abstract arguments that, for instance, represent data or a proposition. Conflicts between arguments are represented by a binary relation on the set of arguments. In concrete terms, an argumentation framework is represented with a directed graph such that the nodes are the arguments, and the arrows represent the attack relation. There exist some extensions of the Dung's framework, like the logic-based argumentation frameworks or the value-based argumentation frameworks. == Abstract argumentation frameworks == === Formal framework === Abstract argumentation frameworks, also called argumentation frameworks à la Dung, are defined formally as a pair: A set of abstract elements called arguments, denoted A {\displaystyle A} A binary relation on A {\displaystyle A} , called attack relation, denoted R {\displaystyle R} For instance, the argumentation system S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } with A = { a , b , c , d } {\displaystyle A=\{a,b,c,d\}} and R = { ( a , b ) , ( b , c ) , ( d , c ) } {\displaystyle R=\{(a,b),(b,c),(d,c)\}} contains four arguments ( a , b , c {\displaystyle a,b,c} and d {\displaystyle d} ) and three attacks ( a {\displaystyle a} attacks b {\displaystyle b} , b {\displaystyle b} attacks c {\displaystyle c} and d {\displaystyle d} attacks c {\displaystyle c} ). Dung defines some notions : an argument a ∈ A {\displaystyle a\in A} is acceptable with respect to E ⊆ A {\displaystyle E\subseteq A} if and only if E {\displaystyle E} defends a {\displaystyle a} , that is ∀ b ∈ A {\displaystyle \forall b\in A} such that ( b , a ) ∈ R , ∃ c ∈ E {\displaystyle (b,a)\in R,\exists c\in E} such that ( c , b ) ∈ R {\displaystyle (c,b)\in R} , a set of arguments E {\displaystyle E} is conflict-free if there is no attack between its arguments, formally : ∀ a , b ∈ E , ( a , b ) ∉ R {\displaystyle \forall a,b\in E,(a,b)\not \in R} , a set of arguments E {\displaystyle E} is admissible if and only if it is conflict-free and all its arguments are acceptable with respect to E {\displaystyle E} . === Different semantics of acceptance === ==== Extensions ==== To decide if an argument can be accepted or not, or if several arguments can be accepted together, Dung defines several semantics of acceptance that allows, given an argumentation system, sets of arguments (called extensions) to be computed. For instance, given S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } , E {\displaystyle E} is a complete extension of S {\displaystyle S} only if it is an admissible set and every acceptable argument with respect to E {\displaystyle E} belongs to E {\displaystyle E} , E {\displaystyle E} is a preferred extension of S {\displaystyle S} only if it is a maximal element (with respect to the set-theoretical inclusion) among the admissible sets with respect to S {\displaystyle S} , E {\displaystyle E} is a stable extension of S {\displaystyle S} only if it is a conflict-free set that attacks every argument that does not belong in E {\displaystyle E} (formally, ∀ a ∈ A ∖ E , ∃ b ∈ E {\displaystyle \forall a\in A\backslash E,\exists b\in E} such that ( b , a ) ∈ R {\displaystyle (b,a)\in R} , E {\displaystyle E} is the (unique) grounded extension of S {\displaystyle S} only if it is the smallest element (with respect to set inclusion) among the complete extensions of S {\displaystyle S} . There exists some inclusions between the sets of extensions built with these semantics : Every stable extension is preferred, Every preferred extension is complete, The grounded extension is complete, If the system is well-founded (there exists no infinite sequence a 0 , a 1 , … , a n , … {\displaystyle a_{0},a_{1},\dots ,a_{n},\dots } such that ∀ i > 0 , ( a i + 1 , a i ) ∈ R {\displaystyle \forall i>0,(a_{i+1},a_{i})\in R} ), all these semantics coincide—only one extension is grounded, stable, preferred, and complete. Some other semantics have been defined. One introduce the notation E x t σ ( S ) {\displaystyle Ext_{\sigma }(S)} to note the set of σ {\displaystyle \sigma } -extensions of the system S {\displaystyle S} . In the case of the system S {\displaystyle S} in the figure above, E x t σ ( S ) = { { a , d } } {\displaystyle Ext_{\sigma }(S)=\{\{a,d\}\}} for every Dung's semantic—the system is well-founded. That explains why the semantics coincide, and the accepted arguments are: a {\displaystyle a} and d {\displaystyle d} . ==== Labellings ==== Labellings are a more expressive way than extensions to express the acceptance of the arguments. Concretely, a labelling is a mapping that associates every argument with a label in (the argument is accepted), out (the argument is rejected), or undec (the argument is undefined—not accepted or refused). One can also note a labelling as a set of pairs ( a r g u m e n t , l a b e l ) {\displaystyle ({\mathit {argument}},{\mathit {label}})} . Such a mapping does not make sense without additional constraint. The notion of reinstatement labelling guarantees the sense of the mapping. L {\displaystyle L} is a reinstatement labelling on the system S = ⟨ A , R ⟩ {\displaystyle S=\langle A,R\rangle } if and only if : ∀ a ∈ A , L ( a ) = i n {\displaystyle \forall a\in A,L(a)={\mathit {in}}} if and only if ∀ b ∈ A {\displaystyle \forall b\in A} such that ( b , a ) ∈ R , L ( b ) = o u t {\displaystyle (b,a)\in R,L(b)={\mathit {out}}} ∀ a ∈ A , L ( a ) = o u t {\displaystyle \forall a\in A,L(a)={\mathit {out}}} if and only if ∃ b ∈ A {\displaystyle \exists b\in A} such that ( b , a ) ∈ R {\displaystyle (b,a)\in R} and L ( b ) = i n {\displaystyle L(b)={\mathit {in}}} ∀ a ∈ A , L ( a ) = u n d e c {\displaystyle \forall a\in A,L(a)={\mathit {undec}}} if and only if L ( a ) ≠ i n {\displaystyle L(a)\neq {\mathit {in}}} and L ( a ) ≠ o u t {\displaystyle L(a)\neq {\mathit {out}}} One can convert every extension into a reinstatement labelling: the arguments of the extension are in, those attacked by an argument of the extension are out, and the others are undec. Conversely, one can build an extension from a reinstatement labelling just by keeping the arguments in. Indeed, Caminada proved that the reinstatement labellings and the complete extensions can be mapped in a bijective way. Moreover, the other Datung's semantics can be associated to some particular sets of reinstatement labellings. Reinstatement labellings distinguish arguments not accepted because they are attacked by accepted arguments from undefined arguments—that is, those that are not defended cannot defend themselves. An argument is undec if it is attacked by at least another undec. If it is attacked only by arguments out, it must be in, and if it is attacked some argument in, then it is out. The unique reinstatement labelling that corresponds to the system S {\displaystyle S} above is L = { ( a , i n ) , ( b , o u t ) , ( c , o u t ) , ( d , i n ) } {\displaystyle L=\{(a,{\mathit {in}}),(b,{\mathit {out}}),(c,{\mathit {out}}),(d,{\mathit {in}})\}} . === Inference from an argumentation system === In the general case when several extensions are computed for a given semantic σ {\displaystyle \sigma } , the agent that reasons from the system can use several mechanisms to infer information: Credulous inference: the agent accepts an argument if it belongs to at least one of the σ {\displaystyle \sigma } -extensions—in which case, the agent risks accepting some arguments that are not acceptable together ( a {\displaystyle a} attacks b {\displaystyle b} , and a {\displaystyle a} and b {\displaystyle b} each belongs to an extension) Skeptical inference: the agent accepts an argument only if it belongs to every σ {\displaystyle \sigma } -extension. In this case, the agent risks deducing too little information (if the intersection of the extensions is empty or has a very small cardinal). For these two methods to infer information, one can identify the set of accepted arguments, respectively C r σ ( S ) {\displaystyle Cr_{\sigma }(S)} the set of the arguments credulously accepted under the semantic σ {\displaystyle \sigma } , and S c σ ( S ) {\displaystyle Sc_{\sigma }(S)} the set of arguments accepted skeptically under the semantic σ {\displaystyle \sigma } (the σ {\displaystyle \sigma } can be missed if there is no possible ambiguity about the semantic). Of course, when there is only one extension (for instance, when the system is well-founded), this problem is very simple: the agent accepts arguments of the unique extension and rejects others. The same reasoning can be done with labellings that correspond to the chosen semantic : an argument can be accepted if it is in for each labelling and refused if it is out for each labelling, the others being in an undecided state (the status of the arguments can remind the

    Read more →
  • Evaluation of binary classifiers

    Evaluation of binary classifiers

    Evaluation of a binary classifier typically assigns a numerical value, or values, to a classifier that represent its accuracy. An example is error rate, which measures how frequently the classifier makes a mistake. There are many metrics that can be used; different fields have different preferences. For example, in medicine sensitivity and specificity are often used, while in computer science precision and recall are preferred. An important distinction is between metrics that are independent of the prevalence or skew (how often each class occurs in the population), and metrics that depend on the prevalence – both types are useful, but they have very different properties. Often, evaluation is used to compare two methods of classification, so that one can be adopted and the other discarded. Such comparisons are more directly achieved by a form of evaluation that results in a single unitary metric rather than a pair of metrics. == Contingency table == Given a data set, a classification (the output of a classifier on that set) gives two numbers: the number of positives and the number of negatives, which add up to the total size of the set. To evaluate a classifier, one compares its output to another reference classification – ideally a perfect classification, but in practice the output of another gold standard test – and cross tabulates the data into a 2×2 contingency table, comparing the two classifications. One then evaluates the classifier relative to the gold standard by computing summary statistics of these 4 numbers. Generally these statistics will be scale invariant (scaling all the numbers by the same factor does not change the output), to make them independent of population size, which is achieved by using ratios of homogeneous functions, most simply homogeneous linear or homogeneous quadratic functions. Say we test some people for the presence of a disease. Some of these people have the disease, and our test correctly says they are positive. They are called true positives (TP). Some have the disease, but the test incorrectly claims they don't. They are called false negatives (FN). Some don't have the disease, and the test says they don't – true negatives (TN). Finally, there might be healthy people who have a positive test result – false positives (FP). These can be arranged into a 2×2 contingency table (confusion matrix), conventionally with the test result on the vertical axis and the actual condition on the horizontal axis. These numbers can then be totaled, yielding both a grand total and marginal totals. Totaling the entire table, the number of true positives, false negatives, true negatives, and false positives add up to 100% of the set. Totaling the columns (adding vertically) the number of true positives and false positives add up to 100% of the test positives, and likewise for negatives. Totaling the rows (adding horizontally), the number of true positives and false negatives add up to 100% of the condition positives (conversely for negatives). The basic marginal ratio statistics are obtained by dividing the 2×2=4 values in the table by the marginal totals (either rows or columns), yielding 2 auxiliary 2×2 tables, for a total of 8 ratios. These ratios come in 4 complementary pairs, each pair summing to 1, and so each of these derived 2×2 tables can be summarized as a pair of 2 numbers, together with their complements. Further statistics can be obtained by taking ratios of these ratios, ratios of ratios, or more complicated functions. The contingency table and the most common derived ratios are summarized below; see sequel for details. Note that the rows correspond to the condition actually being positive or negative (or classified as such by the gold standard), as indicated by the color-coding, and the associated statistics are prevalence-independent, while the columns correspond to the test being positive or negative, and the associated statistics are prevalence-dependent. There are analogous likelihood ratios for prediction values, but these are less commonly used, and not depicted above. == Pairs of metrics == Often accuracy is evaluated with a pair of metrics composed in a standard pattern. === Sensitivity and specificity === The fundamental prevalence-independent statistics are sensitivity and specificity. Sensitivity or True Positive Rate (TPR), also known as recall, is the proportion of people that tested positive and are positive (True Positive, TP) of all the people that actually are positive (Condition Positive, CP = TP + FN). It can be seen as the probability that the test is positive given that the patient is sick. With higher sensitivity, fewer actual cases of disease go undetected (or, in the case of the factory quality control, fewer faulty products go to the market). Specificity (SPC) or True Negative Rate (TNR) is the proportion of people that tested negative and are negative (True Negative, TN) of all the people that actually are negative (Condition Negative, CN = TN + FP). As with sensitivity, it can be looked at as the probability that the test result is negative given that the patient is not sick. With higher specificity, fewer healthy people are labeled as sick (or, in the factory case, fewer good products are discarded). The relationship between sensitivity and specificity, as well as the performance of the classifier, can be visualized and studied using the Receiver Operating Characteristic (ROC) curve. In theory, sensitivity and specificity are independent in the sense that it is possible to achieve 100% in both (such as in the red/blue ball example given above). In more practical, less contrived instances, however, there is usually a trade-off, such that they are inversely proportional to one another to some extent. This is because we rarely measure the actual thing we would like to classify; rather, we generally measure an indicator of the thing we would like to classify, referred to as a surrogate marker. The reason why 100% is achievable in the ball example is because redness and blueness is determined by directly detecting redness and blueness. However, indicators are sometimes compromised, such as when non-indicators mimic indicators or when indicators are time-dependent, only becoming evident after a certain lag time. The following example of a pregnancy test will make use of such an indicator. Modern pregnancy tests do not use the pregnancy itself to determine pregnancy status; rather, human chorionic gonadotropin is used, or hCG, present in the urine of gravid females, as a surrogate marker to indicate that a woman is pregnant. Because hCG can also be produced by a tumor, the specificity of modern pregnancy tests cannot be 100% (because false positives are possible). Also, because hCG is present in the urine in such small concentrations after fertilization and early embryogenesis, the sensitivity of modern pregnancy tests cannot be 100% (because false negatives are possible). === Positive and negative predictive values === In addition to sensitivity and specificity, the performance of a binary classification test can be measured with positive predictive value (PPV), also known as precision, and negative predictive value (NPV). The positive prediction value answers the question "If the test result is positive, how well does that predict an actual presence of disease?". It is calculated as TP/(TP + FP); that is, it is the proportion of true positives out of all positive results. The negative prediction value is the same, but for negatives, naturally. ==== Impact of prevalence on predictive values ==== Prevalence has a significant impact on prediction values. As an example, suppose there is a test for a disease with 99% sensitivity and 99% specificity. If 2000 people are tested and the prevalence (in the sample) is 50%, 1000 of them are sick and 1000 of them are healthy. Thus about 990 true positives and 990 true negatives are likely, with 10 false positives and 10 false negatives. The positive and negative prediction values would be 99%, so there can be high confidence in the result. However, if the prevalence is only 5%, so of the 2000 people only 100 are really sick, then the prediction values change significantly. The likely result is 99 true positives, 1 false negative, 1881 true negatives and 19 false positives. Of the 19+99 people tested positive, only 99 really have the disease – that means, intuitively, that given that a patient's test result is positive, there is only 84% chance that they really have the disease. On the other hand, given that the patient's test result is negative, there is only 1 chance in 1882, or 0.05% probability, that the patient has the disease despite the test result. === Precision and recall === Precision and recall can be interpreted as (estimated) conditional probabilities: Precision is given by P ( C = P | C ^ = P ) {\displaystyle P(C=P|{\hat {C}}=P)} while recall is given by P ( C ^ = P | C = P ) {\displaystyle P({\hat {C}}=P|C=P)} , where C ^ {\

    Read more →
  • Scene text

    Scene text

    Scene text is text that appears in an image captured by a camera in an outdoor environment. The detection and recognition of scene text from camera captured images are computer vision tasks which became important after smart phones with good cameras became ubiquitous. The text in scene images varies in shape, font, colour and position. The recognition of scene text is further complicated sometimes by non-uniform illumination and focus. To improve scene text recognition, the International Conference on Document Analysis and Recognition (ICDAR) conducts a robust reading competition once in two years. The competition was held in 2003, 2005 and during every ICDAR conference. International association for pattern recognition (IAPR) has created a list of datasets as Reading systems. == Text detection == Text detection is the process of detecting the text present in the image, followed by surrounding it with a rectangular bounding box. Text detection can be carried out using image based techniques or frequency based techniques. In image based techniques, an image is segmented into multiple segments. Each segment is a connected component of pixels with similar characteristics. The statistical features of connected components are utilised to group them and form the text. Machine learning approaches such as support vector machine and convolutional neural networks are used to classify the components into text and non-text. In frequency based techniques, discrete Fourier transform (DFT) or discrete wavelet transform (DWT) are used to extract the high frequency coefficients. It is assumed that the text present in an image has high frequency components and selecting only the high frequency coefficients filters the text from the non-text regions in an image. == Word recognition == In word recognition, the text is assumed to be already detected and located and the rectangular bounding box containing the text is available. The word present in the bounding box needs to be recognized. The methods available to perform word recognition can be broadly classified into top-down and bottom-up approaches. In the top-down approaches, a set of words from a dictionary is used to identify which word suits the given image. Images are not segmented in most of these methods. Hence, the top-down approach is sometimes referred as segmentation free recognition. In the bottom-up approaches, the image is segmented into multiple components and the segmented image is passed through a recognition engine. Either an off the shelf Optical character recognition (OCR) engine or a custom-trained one is used to recognise the text.

    Read more →
  • Time series

    Time series

    In mathematics, a time series is a sequence of data points indexed, listed, or graphed in chronological order. Most commonly, a time series consists of observations recorded at successive equally spaced points in time. Thus, it represents a form of discrete-time data. A time series may describe measurements collected over seconds, days, years, or even centuries. Common examples include heights of ocean tides, counts of sunspots, daily temperature readings, and the closing values of stock market indices such as the Dow Jones Industrial Average. A time series is often visualized using a run chart (a type of temporal line chart), which helps identify patterns such as trends, seasonal effects, and irregular fluctuations. Time series are widely used in statistics, actuarial science, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and many other areas of applied science and engineering that involve temporal measurements. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. Generally, time series data is modeled as a stochastic process. While regression analysis is often employed in such a way as to test relationships between one or more different time series, this type of analysis is not usually called "time series analysis", which refers in particular to relationships between different points in time within a single series. Time series data have a natural temporal ordering. This makes time series analysis distinct from cross-sectional studies, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility). Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language). == Methods for analysis == Methods for time series analysis may be divided into two classes: frequency-domain methods and time-domain methods. The former include spectral analysis and wavelet analysis; the latter include auto-correlation and cross-correlation analysis. In the time domain, correlation and analysis can be made in a filter-like manner using scaled correlation, thereby mitigating the need to operate in the frequency domain. Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters (for example, using an autoregressive or moving-average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure. Methods of time series analysis may also be divided into linear and non-linear, and univariate and multivariate. == Panel data == A time series is one type of panel data. Panel data is the general class, a multidimensional data set, whereas a time series data set is a one-dimensional panel (as is a cross-sectional dataset). A data set may exhibit characteristics of both panel data and time series data. One way to tell is to ask what makes one data record unique from the other records. If the answer is the time data field, then this is a time series data set candidate. If determining a unique record requires a time data field and an additional identifier which is unrelated to time (e.g. student ID, stock symbol, country code), then it is panel data candidate. If the differentiation lies on the non-time identifier, then the data set is a cross-sectional data set candidate. == Analysis == There are several types of motivation and data analysis available for time series which are appropriate for different purposes. === Motivation === In the context of statistics, econometrics, quantitative finance, seismology, meteorology, and geophysics the primary goal of time series analysis is forecasting. In the context of signal processing, control engineering and communication engineering it is used for signal detection. Other applications are in data mining, pattern recognition and machine learning, where time series analysis can be used for clustering, classification, query by content, anomaly detection as well as forecasting. === Exploratory analysis === A simple way to examine a regular time series is manually with a line chart. The datagraphic shows tuberculosis deaths in the United States, along with the yearly change and the percentage change from year to year. The total number of deaths declined in every year until the mid-1980s, after which there were occasional increases, often proportionately - but not absolutely - quite large. A study of corporate data analysts found two challenges to exploratory time series analysis: discovering the shape of interesting patterns, and finding an explanation for these patterns. Visual tools that represent time series data as heat map matrices can help overcome these challenges. === Estimation, filtering, and smoothing === This approach may be based on harmonic analysis and filtering of signals in the frequency domain using the Fourier transform, and spectral density estimation. Its development was significantly accelerated during World War II by mathematician Norbert Wiener, electrical engineers Rudolf E. Kálmán, Dennis Gabor and others for filtering signals from noise and predicting signal values at a certain point in time. An equivalent effect may be achieved in the time domain, as in a Kalman filter; see filtering and smoothing for more techniques. Other related techniques include: Autocorrelation analysis to examine serial dependence Spectral analysis to examine cyclic behavior which need not be related to seasonality. For example, sunspot activity varies over 11 year cycles. Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity. Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity: see trend estimation and decomposition of time series === Curve fitting === Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of statistical inference such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables. Extrapolation refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data. For processes that are expected to generally grow in magnitude one of the curves in the graphic (and many others) can be fitted by estimating their parameters. The construction of economic time series involves the estimation of some components for some dates by interpolation between values ("benchmarks") for earlier and later dates. Interpolation is estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available information ("reading between the lines"). Interpolation is useful where the data surrounding the missing data is available and its trend, seasonality, and longer-term cycles are known. This is often done by using a relat

    Read more →
  • Environmental impact of AI

    Environmental impact of AI

    The environmental impact of the design, training, deployment and use of artificial intelligence includes the greenhouse gas emissions from generating electricity for data centres and computing hardware, operational and upstream water use, and material impacts from hardware manufacturing, mining and electronic waste. Estimating AI's environmental effects can be difficult because results depend on how impacts are measured, including whether accounting includes only model computation or also data-centre overhead, idle capacity, hardware manufacture, and local electricity supply. As these issues have received greater attention, governments and regulators have increasingly considered data-centre reporting requirements, energy-efficiency standards, and broader transparency measures for AI-related resource use. == Carbon footprint and energy use == AI-related energy use arises at multiple stages, including model training, fine-tuning, inference, storage, networking, and supporting infrastructure such as cooling and power conversion. === Individual level === Published estimates of energy use per AI request vary widely across models, tasks and measurement methods. A benchmark study presented at the 2024 ACM Conference on Fairness, Accountability, and Transparency found substantial differences between task types, with lower energy use for some text tasks and much higher energy use for image generation in the study's test conditions. In that benchmark, simple classification tasks consumed about 0.002–0.007 Wh per prompt on average (about 9% of a smartphone charge for 1,000 prompts), while text generation and text summarisation each used about 0.05 Wh per prompt; image generation averaged 2.91 Wh per prompt, and the least efficient image model in the study used 11.49 Wh per image (roughly equivalent to half a smartphone charge). First-party measurements in production environments have also been published. A 2025 Google study on Gemini assistant serving reported median per-prompt energy, emissions, and water-use estimates under the authors' accounting framework, while noting that different system boundaries can produce substantially different results. The study reported a median text-prompt estimate of about 0.24 Wh, which is roughly as much energy as watching nine seconds of television. The study also stated that software and infrastructure improvements reduced energy use by a factor of 33 and carbon emissions by a factor of 44 for a typical prompt over one year within the authors' framework. Researchers at the University of Michigan measured the energy consumption of various Meta Llama 3.1 models released in 2024 and found that smaller language models (8 billion parameters) use about 114 joules (0.03167 Wh) per response, while larger models (405 billion parameters) require up to 6,700 joules (1.861 Wh) per response. This corresponds to the energy needed to run a microwave oven for roughly one-tenth of a second and eight seconds, respectively. Comparisons between AI systems and human labour for specific tasks have produced mixed results and remain sensitive to assumptions about output quality, workload and system boundaries. A 2024 study in Scientific Reports reported 130 to 2900 times lower estimated carbon emissions for selected AI systems than for human writers and illustrators under its assumptions. A later Scientific Reports paper reported a counterexample for programming tasks under its assumptions, finding 5 to 19 times higher estimated emissions for the evaluated AI system than for human programmers on the benchmark used in that study. === System level === ==== Energy use and efficiency ==== AI electricity intensity depends not only on model architecture but also on hardware and facility efficiency. Data-centre operators commonly report Power usage effectiveness (PUE), which measures the ratio of total facility energy to IT equipment energy; a lower PUE indicates less overhead energy for cooling and other supporting infrastructure. Operators may also publish metrics and case studies on hardware efficiency, cooling systems and power sourcing. In its 2024 environmental report, Google stated that its 2023 total greenhouse gas emissions increased 13% year over year, primarily because of increased data-centre energy consumption and supply-chain emissions, while also reporting lower PUE than industry averages for its own facilities. The International Energy Agency has also reported that data centres remain a relatively small share of global electricity use overall, but that their local effects can be much more pronounced because demand is geographically concentrated. ==== Carbon footprint ==== At system level, AI contributes to rising electricity demand in data centres and related infrastructure. The International Energy Agency estimated that data centres used about 415 TWh of electricity in 2024, or around 1.5% of global electricity consumption, and projected that data-centre electricity use could rise to about 945 TWh by 2030, with AI identified as the main driver of that growth alongside other digital services. The carbon footprint of AI systems depends strongly on electricity sources, hardware efficiency, utilisation rates, and what stages are included in the accounting. Training large models can require substantial electricity, while total lifecycle impacts also depend on deployment scale and the amount of inference performed after training. Early analyses of frontier-model development reported rapid historical growth in training compute for selected systems, although later trends have depended on changes in model design, hardware and efficiency gains. Accounting methods that include upstream or embodied impacts, such as hardware manufacture and facilities construction, can materially affect estimates of AI-related emissions. === Decisions and strategies by individual companies === Large technology companies have reported that the expansion of AI and cloud infrastructure affects their sustainability targets, electricity demand, and resource use. Google, for example, attributed part of its emissions growth in 2023 to increased data-centre energy consumption and supply-chain emissions in its 2024 environmental report. Cloud and AI companies have also announced measures intended to reduce environmental impacts, including investment in more efficient hardware, low-carbon electricity procurement, alternative cooling systems, and water stewardship programmes. The extent, comparability, and third-party verification of such disclosures vary between firms and jurisdictions. == Water usage == Data centres can use water directly for cooling and indirectly through the water used in electricity generation, depending on the local energy mix. Public reporting on data-centre water use has often been inconsistent, making comparisons between operators and regions difficult. To standardise operational reporting, The Green Grid proposed the metric water usage effectiveness (WUE), defined as annual site water use divided by IT equipment energy use. WUE does not by itself measure local water stress, source sustainability, or all upstream water impacts. Studies of AI water use also distinguish between water withdrawal and water consumption. Research on AI-specific water use has argued that the water footprint of AI systems can be difficult to observe and may vary substantially by location, cooling design, and electricity source. A 2025 Communications of the ACM article summarised methods for estimating AI water footprints and emphasised the distinction between water withdrawal and water consumption. Li and colleagues estimated that global AI water withdrawal could reach 4.2–6.6 billion cubic metres in 2027 under the scenarios examined in their article. Using GPT-3, released by OpenAI in 2020, as an example, they estimated that training the model in Microsoft's U.S. data centres could consume about 700,000 litres of onsite water and about 5.4 million litres in total when offsite electricity-related water use was included; they also estimated that 10–50 medium-length GPT-3 responses could consume about 500 mL of water, depending on when and where the model was deployed. Published prompt-level estimates have also varied by system and accounting framework: the 2025 Google study on Gemini assistant serving reported a median text-prompt estimate of about 0.26 mL under its framework. Location can materially affect the significance of data-centre water use. Research on U.S. data centres found that one-fifth of servers' direct water footprint came from moderately to highly water-stressed watersheds, while nearly half of servers were fully or partially powered by plants located in water-stressed regions. A 2025 Reuters report, citing data from Verisk Maplecroft and NatureFinance, said that an average mid-sized data centre uses about 1.4 million litres of water per day for cooling and that Phoenix would experience a 32% increase in annual water stress if currently pl

    Read more →